DIAGR-0005 · SVG and graphics · hard

Gantt chart with proportional start offsets and durations

The same task, run on 27 models.

Top result: claude-opus-4-8 (low reasoning) at 100.0% composite. Lowest: deepseek-v4-flash, also at 100.0%, so every entry on this task is tied.

How it ran

Each model was given the brief below in a fresh, isolated session with no access to our tools, and returned its answer from scratch.
The rendered output was scored 1 to 5 on brief fidelity, visual design, craft, and impact by a four-family vision panel: Anthropic (Claude Opus 4.8), OpenAI (GPT-5.5), Google (Gemini 3.1 Pro), and xAI (Grok 4.3). All four judges received one identical prompt so that their scores are comparable. The published judge score is leave-one-family-out: a model is never scored by a judge of its own family, which removes same-family self-preference.
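The leave-one-family-out rule can be sketched as follows. This is only an illustration: the page does not say how judge scores are aggregated, so the plain mean, the function name `leave_one_family_out`, the family keys, and the example scores are all assumptions made up for this sketch.

```python
from statistics import mean


def leave_one_family_out(scores_by_family, model_family):
    """Average the judge scores, skipping the judge from the model's own family.

    Assumption: aggregation is a plain mean; the benchmark page does not specify it.
    """
    kept = [score for family, score in scores_by_family.items()
            if family != model_family]
    if not kept:
        raise ValueError("no eligible judges left after exclusion")
    return mean(kept)


# Made-up scores for illustration only (not taken from the benchmark).
example_scores = {"anthropic": 5.0, "openai": 4.0, "google": 4.0, "xai": 3.0}

# A model from a judge's family loses that judge: mean of 4.0, 4.0, 3.0.
claude_score = leave_one_family_out(example_scores, "anthropic")

# A model with no judge of its own family keeps all four scores.
deepseek_score = leave_one_family_out(example_scores, "deepseek")
```

Under this reading, a model whose family has no judge on the panel is scored by all four judges, while a model from a judged family is scored by the other three.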

The brief

Draw a Gantt chart in SVG using a 260x160 viewBox. Write it to `chart.svg`. There are four task bars, each a `<rect class="task">` with the listed id, stacked in separate rows. Each bar's horizontal start (its left edge x) is proportional to the task's start week, and each bar's width is proportional to the task's duration in weeks.

Tasks (id : label : start week : duration weeks):
- t1 : Design : start 0 : duration 3
- t2 : Build : start 1 : duration 2
- t3 : Test : start 2 : duration 4
- t4 : Ship : start 4 : duration 2

So the start-week values map linearly onto the bars' left-edge x positions, and the width ratios match the duration ratios (the Test bar is the widest).

Requirements:
- Exactly four task bars (class="task"), in id order t1, t2, t3, t4 (document order).
- Put each task label (Design, Build, Test, Ship) in a `<text>` element.
- Vector primitives only: no raster images, no data: URIs, no base64, no <foreignObject>, no <script>, no external references.
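One way to satisfy the brief's linear mapping is sketched below. It is not any model's actual output. The layout constants (`X0`, `PX_PER_WEEK`, `ROW_H`, `BAR_H`, `TOP`), the fill color, and the helper names are assumptions chosen so the bars fit the 260x160 viewBox. The brief fixes only the task data, the `class="task"` rects with their ids, and the `<text>` labels.

```python
# Assumed layout constants (not given in the brief); any positive scale that
# keeps the bars inside the 260x160 viewBox preserves the required ratios.
X0 = 60           # x position of week 0
PX_PER_WEEK = 30  # horizontal pixels per week
ROW_H = 30        # vertical distance between rows
BAR_H = 18        # bar height
TOP = 20          # y of the first row

# Task data from the brief: (id, label, start week, duration in weeks).
TASKS = [
    ("t1", "Design", 0, 3),
    ("t2", "Build", 1, 2),
    ("t3", "Test", 2, 4),
    ("t4", "Ship", 4, 2),
]


def bar_geometry(start, duration):
    """Return (x, width): x is linear in start week, width in duration."""
    return X0 + start * PX_PER_WEEK, duration * PX_PER_WEEK


def build_svg():
    """Build the SVG markup with one label and one task bar per row."""
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 260 160">']
    for row, (task_id, label, start, duration) in enumerate(TASKS):
        x, width = bar_geometry(start, duration)
        y = TOP + row * ROW_H
        parts.append(
            f'  <text x="4" y="{y + BAR_H - 5}" font-size="10">{label}</text>'
        )
        parts.append(
            f'  <rect class="task" id="{task_id}" x="{x}" y="{y}" '
            f'width="{width}" height="{BAR_H}" fill="#4a7"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


def write_chart(path="chart.svg"):
    """Write the chart to disk, as the brief asks."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_svg())


if __name__ == "__main__":
    write_chart()
```

With these constants the Test bar spans x = 120 to 240 and is the widest, and the rects appear in document order t1 to t4 because `TASKS` is iterated in that order.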

claude-opus-4-8

Low reasoning

claude-opus-4-8 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

claude-opus-4-8

Medium reasoning

claude-opus-4-8 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

claude-opus-4-8

High reasoning

claude-opus-4-8 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

claude-opus-4-8

Extra-high reasoning

claude-opus-4-8 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

claude-opus-4-8

Max reasoning

claude-opus-4-8 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

claude-sonnet-4-6

High reasoning

claude-sonnet-4-6 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

claude-sonnet-5

High reasoning

claude-sonnet-5 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

claude-fable-5

High reasoning

claude-fable-5 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

claude-haiku-4-5

High reasoning

claude-haiku-4-5 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

glm-5.2

default reasoning

glm-5.2 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

kimi-k2.7-code

default reasoning

kimi-k2.7-code rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

gpt-5.5

High reasoning

gpt-5.5 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

gpt-5.4-mini

High reasoning

gpt-5.4-mini rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

gemini-3.1-pro-preview

High reasoning

gemini-3.1-pro-preview rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

gemini-3.5-flash

default reasoning

gemini-3.5-flash rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

gemini-3.1-flash-lite

default reasoning

gemini-3.1-flash-lite rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

grok-4.3

default reasoning

grok-4.3 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

grok-4.20-reasoning

default reasoning

grok-4.20-reasoning rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

grok-build-0.1

default reasoning

grok-build-0.1 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

grok-composer-2.5-fast

default reasoning

grok-composer-2.5-fast rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

claude-opus-4-8

High reasoning

claude-opus-4-8 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

claude-sonnet-4-6

High reasoning

claude-sonnet-4-6 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

claude-sonnet-5

High reasoning

claude-sonnet-5 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

claude-fable-5

High reasoning

claude-fable-5 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

claude-haiku-4-5

default reasoning

claude-haiku-4-5 rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

deepseek-v4-pro

default reasoning

deepseek-v4-pro rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%

deepseek-v4-flash

default reasoning

deepseek-v4-flash rendering of the Gantt chart with proportional start offsets and durations benchmark - composite 100.0%

Composite 100.0% · Objective 100.0%