WEB-0003 · Websites · medium

Pricing page with a working billing toggle and FAQ accordion

The same task, run on 28 models. Compare the outputs side by side, or open any one in a popup to inspect it.

Top result: claude-opus-4-8 (low reasoning) at 100.0% composite. Lowest: claude-sonnet-4-6 at 0.0%.

How it ran

Each model was given the brief below in a fresh, isolated session with no access to our tools, and returned a single self-contained index.html (inline CSS and JS, no external requests, no build step).
The rendered output was scored 1 to 5 on brief fidelity, visual design, craft, and impact by a four-family vision panel (Anthropic's Claude Opus 4.8, OpenAI's GPT-5.5, Google's Gemini 3.1 Pro, and xAI's Grok 4.3), with every judge given the same prompt so that the scores are comparable. The published judge score is leave-one-family-out: a model is never scored by a judge from its own family, which removes same-family self-preference.
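
The leave-one-family-out rule can be sketched as below. This is a minimal illustration, not the lab's scoring code: the function name, the family labels, the plain-mean aggregation, and the sample scores are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical per-judge scores (1 to 5), keyed by judge family.
# The numbers are invented for illustration only.
panel_scores = {
    "anthropic": 5,
    "openai": 4,
    "google": 4,
    "xai": 3,
}


def leave_one_family_out(scores: dict[str, float], model_family: str) -> float:
    """Average the panel scores, skipping the judge from the model's own family.

    Assumes a plain mean; the source does not state how scores are aggregated.
    """
    kept = [score for family, score in scores.items() if family != model_family]
    if not kept:
        raise ValueError("no judges left after excluding the model's family")
    return mean(kept)


# An Anthropic model is scored only by the OpenAI, Google, and xAI judges.
print(leave_one_family_out(panel_scores, "anthropic"))

# A model whose family has no judge on the panel keeps all four scores.
print(leave_one_family_out(panel_scores, "deepseek"))
```

Under this sketch, a model from one of the four panel families is averaged over three judges, while a model from any other family is averaged over all four.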

The brief

Build a single self-contained page as one HTML file (`index.html`) that renders with no build step and no network calls (inline all CSS and JS, no external fonts or scripts). The page combines two interactive parts:

- a pricing section with three tiers and a monthly/annual billing toggle implemented as an accessible switch (role="switch" with aria state, id="billing"); clicking the toggle must change every displayed price between its monthly and annual value, and
- a FAQ section (id="faq") of at least three question/answer items where each answer is collapsed by default; clicking a question must reveal (make visible) its own answer, and clicking it again may collapse it.

The first answer must have id="a1". Use plain, readable, accessible markup.
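
The structural hooks that the brief names (id="billing" with role="switch", id="faq", id="a1", and no external resources) can be checked statically, as in the sketch below. This is an illustration only and is not how the Objective score is computed, which the page does not describe. The class and function names are hypothetical, and reading "aria state" as `aria-checked` is an assumption based on the ARIA switch role. A static check cannot confirm click behaviour such as prices changing or answers expanding.

```python
from html.parser import HTMLParser


class BriefChecker(HTMLParser):
    """Collects the structural hooks that the brief names explicitly."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: set[str] = set()
        self.switch_role = False
        self.switch_state = False
        self.external: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("id"):
            self.ids.add(a["id"])
        if a.get("id") == "billing":
            self.switch_role = a.get("role") == "switch"
            # Assumption: "aria state" means aria-checked, the state
            # attribute that the ARIA switch role defines.
            self.switch_state = a.get("aria-checked") in ("true", "false")
        # Resources fetched on load would count as network calls.
        url = a.get("href") if tag == "link" else a.get("src")
        if url and url.startswith(("http://", "https://", "//")):
            self.external.append(url)


def check_brief(html: str) -> dict[str, bool]:
    """Return one pass/fail flag per structural requirement."""
    parser = BriefChecker()
    parser.feed(html)
    parser.close()
    return {
        "billing_switch": parser.switch_role and parser.switch_state,
        "faq_section": "faq" in parser.ids,
        "first_answer": "a1" in parser.ids,
        "no_external_resources": not parser.external,
    }


# A made-up fragment that satisfies the static requirements.
sample = """
<button id="billing" role="switch" aria-checked="false">Annual billing</button>
<section id="faq">
  <button aria-expanded="false" aria-controls="a1">What is included?</button>
  <div id="a1" hidden>Everything in the tier list.</div>
</section>
"""

print(check_brief(sample))
```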

claude-opus-4-8

Low reasoning

Composite 100.0% · Objective 100.0%

claude-opus-4-8

Extra-high reasoning

Composite 100.0% · Objective 100.0%

claude-sonnet-5

High reasoning

Composite 100.0% · Objective 100.0%

claude-fable-5

High reasoning

Composite 100.0% · Objective 100.0%

claude-opus-4-8

High reasoning

Composite 100.0% · Objective 100.0%

claude-fable-5

High reasoning

Composite 100.0% · Objective 100.0%

grok-composer-2.5-fast

Default reasoning

Composite 99.9% · Objective 99.9%

claude-opus-4-8

Medium reasoning

Composite 99.6% · Objective 99.6%

claude-opus-4-8

High reasoning

Composite 99.3% · Objective 99.3%

claude-opus-4-8

Max reasoning

Composite 99.3% · Objective 99.3%

gpt-5.5

High reasoning

Composite 96.2% · Objective 96.2%

gpt-5.5-pro

High reasoning

Composite 96.2% · Objective 96.2%

claude-sonnet-5

High reasoning

Composite 96.1% · Objective 96.1%

glm-5.2

Default reasoning

Composite 95.8% · Objective 95.8%

gemini-3.1-flash-lite

Default reasoning

Composite 95.8% · Objective 95.8%

grok-build-0.1

Default reasoning

Composite 95.4% · Objective 95.4%

claude-haiku-4-5

Default reasoning

Composite 95.2% · Objective 95.2%

grok-4.20-reasoning

Default reasoning

Composite 60.0% · Objective 60.0%

gpt-5.4-mini

High reasoning

Composite 58.5% · Objective 58.5%

gemini-3.1-pro-preview

High reasoning

Composite 58.3% · Objective 58.3%

claude-sonnet-4-6

High reasoning

Composite 57.8% · Objective 57.8%

kimi-k2.7-code

Default reasoning

Composite 57.6% · Objective 57.6%

claude-haiku-4-5

High reasoning

Composite 57.5% · Objective 57.5%

deepseek-v4-flash

Default reasoning

Composite 56.5% · Objective 56.5%

grok-4.3

Default reasoning

Composite 54.3% · Objective 54.3%

deepseek-v4-pro

Default reasoning

Composite 20.6% · Objective 20.6%

gemini-3.5-flash

Default reasoning

Composite 16.7% · Objective 16.7%

claude-sonnet-4-6

High reasoning

Composite 0.0% · Objective 0.0%