Banja Lab Benchmarks: Test
The same task was run on 27 models, and their outputs can be compared side by side.
Top result: claude-opus-4-8 (low reasoning) at 100.0% composite. Lowest: deepseek-v4-flash, also at 100.0%, so all 27 models reached the same composite score on this task.
Draw an organisation chart in SVG using a 200x170 viewBox. Write it to `chart.svg`. There are four role boxes, each a `<rect class="box">` with the listed id and label, and three reporting edges, each a `<line class="edge">`:

Boxes (id : label):
- box-ceo : CEO
- box-eng : Eng
- box-sales : Sales
- box-lead : Lead

Edges (exactly three `<line class="edge">` elements):
- CEO -> Eng
- CEO -> Sales
- Eng -> Lead

Layout rules:
- CEO is above Eng and above Sales (CEO has the smallest y).
- Eng is to the left of Sales.
- Eng is above Lead.
- No two boxes overlap.
- Put each role label (CEO, Eng, Sales, Lead) in a `<text>` element.
- Vector primitives only: no raster images, no data: URIs, no base64, no `<foreignObject>`, no `<script>`, no external references.
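To make the task concrete, here is a minimal Python sketch that writes one possible `chart.svg` and then checks it against the rules. It is not the benchmark's grader. The coordinates, the helper names `build_svg`, `overlaps` and `check_layout`, and the reading of "above" and "left of" as comparisons of each rect's `y` and `x` attributes are all assumptions made for illustration. The raw-text scan for `data:` and `base64` is deliberately crude.

```python
import itertools
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical layout chosen for illustration: (id, label, x, y, width, height).
BOXES = [
    ("box-ceo", "CEO", 70, 10, 60, 30),
    ("box-eng", "Eng", 20, 70, 60, 30),
    ("box-sales", "Sales", 120, 70, 60, 30),
    ("box-lead", "Lead", 20, 130, 60, 30),
]
# Reporting edges as (parent id, child id).
EDGES = [("box-ceo", "box-eng"), ("box-ceo", "box-sales"), ("box-eng", "box-lead")]


def build_svg() -> str:
    """Return the chart as SVG text: four boxes, four labels, three edges."""
    geom = {bid: (x, y, w, h) for bid, _, x, y, w, h in BOXES}
    parts = [f'<svg xmlns="{SVG_NS}" viewBox="0 0 200 170">']
    for bid, label, x, y, w, h in BOXES:
        parts.append(f'<rect class="box" id="{bid}" x="{x}" y="{y}" width="{w}" '
                     f'height="{h}" fill="white" stroke="black"/>')
        # Label centred in its box.
        parts.append(f'<text x="{x + w / 2}" y="{y + h / 2}" text-anchor="middle" '
                     f'dominant-baseline="middle" font-size="12">{label}</text>')
    for parent, child in EDGES:
        px, py, pw, ph = geom[parent]
        cx, cy, cw, _ = geom[child]
        # Edge runs from the parent's bottom-centre to the child's top-centre.
        parts.append(f'<line class="edge" x1="{px + pw / 2}" y1="{py + ph}" '
                     f'x2="{cx + cw / 2}" y2="{cy}" stroke="black"/>')
    parts.append("</svg>")
    return "\n".join(parts)


def overlaps(a, b) -> bool:
    """True if two (x, y, w, h) rectangles share interior area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def check_layout(svg_text: str) -> list:
    """Return a list of rule violations; an empty list means every check passed."""
    root = ET.fromstring(svg_text)
    ns = {"svg": SVG_NS}
    problems = []
    if root.get("viewBox") != "0 0 200 170":
        problems.append("viewBox is not 0 0 200 170")
    rects = {
        r.get("id"): tuple(float(r.get(k)) for k in ("x", "y", "width", "height"))
        for r in root.findall("svg:rect[@class='box']", ns)
    }
    if set(rects) != {b[0] for b in BOXES}:
        return problems + ["unexpected set of box ids"]
    if len(root.findall("svg:line[@class='edge']", ns)) != 3:
        problems.append("need exactly three edges")
    y = {k: v[1] for k, v in rects.items()}
    if not all(y["box-ceo"] < v for k, v in y.items() if k != "box-ceo"):
        problems.append("CEO must have the smallest y")
    if not rects["box-eng"][0] < rects["box-sales"][0]:
        problems.append("Eng must be left of Sales")
    if not y["box-eng"] < y["box-lead"]:
        problems.append("Eng must be above Lead")
    for (ida, a), (idb, b) in itertools.combinations(rects.items(), 2):
        if overlaps(a, b):
            problems.append(f"{ida} overlaps {idb}")
    labels = {t.text for t in root.findall("svg:text", ns)}
    if labels != {b[1] for b in BOXES}:
        problems.append("text labels do not match box labels")
    for el in root.iter():
        local = el.tag.split("}")[-1]
        if local in {"image", "foreignObject", "script"}:
            problems.append(f"forbidden element <{local}>")
        if any(k.endswith("href") for k in el.attrib):
            problems.append(f"external reference on <{local}>")
    if "data:" in svg_text or "base64" in svg_text:
        problems.append("possible data URI or base64 content")
    return problems


if __name__ == "__main__":
    svg = build_svg()
    with open("chart.svg", "w", encoding="utf-8") as f:
        f.write(svg)
    print(check_layout(svg) or "all checks passed")
```

Keeping the layout in one `BOXES` table means the generator and the checker read the same ids and labels. Moving any box, for example shifting Sales left until it collides with Eng, shows up as an `overlaps` entry in the returned list rather than a silent bad chart.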