The same task, run on 27 models, with the outputs compared side by side.
Top result: claude-opus-4-8 (low reasoning) at 100.0% composite. Lowest: deepseek-v4-flash, also at 100.0%.
Draw a pipeline flow diagram in SVG using a 240x170 viewBox. Write it to `graph.svg`. There are four nodes, each a `<circle>` with the listed id, and four directed edges, each a `<line>` carrying class="edge":

Nodes (id : label):
- node-start : Start
- node-parse : Parse
- node-build : Build
- node-test : Test

Edges (exactly four `<line class="edge">` elements):
- Start -> Parse
- Parse -> Build
- Parse -> Test
- Build -> Test

Layout rules:
- Start is to the left of Parse; Parse is to the left of Build.
- Build is above Test (Build has the smaller y).
- No two node circles overlap.
- Put each node's label in a `<text>` element (Start, Parse, Build, Test).
- Vector primitives only: no raster images, no data: URIs, no base64, no <foreignObject>, no <script>, no external references.
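One way to satisfy the task above can be sketched in Python. This is an illustration only, not the benchmark's reference solution or its grader: the coordinates, the radius `R`, the styling attributes, and the helper names `build_svg` and `check_layout` are all assumptions chosen for the example. Edge direction is encoded only by endpoint order (source at `x1,y1`, target at `x2,y2`), which is also an assumption about how direction would be read.

```python
import itertools
import math
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
R = 16  # assumed circle radius; any value that avoids overlap would do

# id -> (label, cx, cy). Coordinates are illustrative choices inside 240x170.
NODES = {
    "node-start": ("Start", 30, 85),
    "node-parse": ("Parse", 100, 85),
    "node-build": ("Build", 180, 40),
    "node-test": ("Test", 180, 130),
}
EDGES = [
    ("node-start", "node-parse"),
    ("node-parse", "node-build"),
    ("node-parse", "node-test"),
    ("node-build", "node-test"),
]


def build_svg() -> str:
    """Return the diagram as SVG text using only line, circle and text."""
    parts = [f'<svg xmlns="{SVG_NS}" viewBox="0 0 240 170">']
    # Lines first so the filled circles are painted on top of them.
    for src, dst in EDGES:
        _, x1, y1 = NODES[src]
        _, x2, y2 = NODES[dst]
        parts.append(
            f'<line class="edge" x1="{x1}" y1="{y1}" '
            f'x2="{x2}" y2="{y2}" stroke="black"/>'
        )
    for node_id, (label, x, y) in NODES.items():
        parts.append(
            f'<circle id="{node_id}" cx="{x}" cy="{y}" r="{R}" '
            f'fill="white" stroke="black"/>'
        )
        parts.append(
            f'<text x="{x}" y="{y + 3}" font-size="9" '
            f'text-anchor="middle">{label}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


def check_layout(svg_text: str) -> bool:
    """Check the stated node, edge and layout rules on top-level elements."""
    ns = {"s": SVG_NS}
    root = ET.fromstring(svg_text)
    circles = {
        c.get("id"): (float(c.get("cx")), float(c.get("cy")), float(c.get("r")))
        for c in root.findall("s:circle", ns)
    }
    if set(circles) != set(NODES):
        return False
    if len(root.findall("s:line[@class='edge']", ns)) != 4:
        return False
    labels = {t.text for t in root.findall("s:text", ns)}
    if labels != {"Start", "Parse", "Build", "Test"}:
        return False
    start, parse = circles["node-start"], circles["node-parse"]
    build, test = circles["node-build"], circles["node-test"]
    if not (start[0] < parse[0] < build[0]):  # left-to-right ordering
        return False
    if not (build[1] < test[1]):  # Build above Test (smaller y)
        return False
    # Two circles are disjoint when the centre distance exceeds r1 + r2.
    for a, b in itertools.combinations(circles.values(), 2):
        if math.hypot(a[0] - b[0], a[1] - b[1]) <= a[2] + b[2]:
            return False
    return True


if __name__ == "__main__":
    svg = build_svg()
    assert check_layout(svg)
    with open("graph.svg", "w", encoding="utf-8") as fh:
        fh.write(svg)
```

Running the script writes `graph.svg` to the current directory. The checker only looks at direct children of the root `<svg>` element and does not test the "vector primitives only" rule, so it is a partial sanity check rather than a full validator.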