gpt-5.5-pro

Name: gpt-5.5-pro (high reasoning) - Banja Lab benchmark
Creator: Banja

High reasoning, API, single-shot

gpt-5.5-pro (OpenAI) scored a 94.1% composite across 37 tasks: code, UI, full websites, SVG, marketing pages, dashboards, animations, and Australian legal and accounting tasks. Outputs were graded by execution, and the visual builds by a cross-family vision panel (leave-one-family-out). Run on 2026-06-23.

Composite: 94.1% (95% CI 89.9% to 97.4%)
Est. cost: $55.6394 USD
Tokens: 317,658 (generation)
Wall clock: 6 min (end to end)
Gate pass: validated
Separation: good vs bad

By domain

Composite score per domain, weakest first. The Judge column is the vision judge's score and is shown only for the visual domains.

Domain	Tasks	Judge	Composite
svg-scene	1	58.3%	58.3%
Animation	2	64.6%	64.6%
Marketing page	1	77.1%	77.1%
Auth screen	1	83.3%	83.3%
Dashboard	1	85.4%	85.4%
Australian law	5	n/a	95.0%
UI components	5	n/a	97.6%
Australian accounting	6	n/a	98.3%
Websites	3	n/a	98.7%
Programming	6	n/a	100.0%
SVG and graphics	6	n/a	100.0%
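The headline 94.1% is consistent with a mean over all 37 tasks, that is, each domain's composite weighted by its task count. The page does not state how the composite is aggregated, so the sketch below assumes that weighting; `DOMAINS`, `task_weighted_mean` and `unweighted_mean` are illustrative names, and the scores are copied from the per-domain results above.

```python
# Per-domain composite (%) and task count, as reported on this page.
DOMAINS = {
    "svg-scene": (58.3, 1),
    "Animation": (64.6, 2),
    "Marketing page": (77.1, 1),
    "Auth screen": (83.3, 1),
    "Dashboard": (85.4, 1),
    "Australian law": (95.0, 5),
    "UI components": (97.6, 5),
    "Australian accounting": (98.3, 6),
    "Websites": (98.7, 3),
    "Programming": (100.0, 6),
    "SVG and graphics": (100.0, 6),
}


def task_weighted_mean(domains):
    """Average of domain composites, each weighted by its task count (assumed aggregation)."""
    total_tasks = sum(count for _, count in domains.values())
    weighted_sum = sum(score * count for score, count in domains.values())
    return weighted_sum / total_tasks, total_tasks


def unweighted_mean(domains):
    """Plain average of the domain composites, for comparison."""
    return sum(score for score, _ in domains.values()) / len(domains)


composite, n_tasks = task_weighted_mean(DOMAINS)
print(f"task-weighted: {composite:.1f}% over {n_tasks} tasks")  # 94.1% over 37 tasks
print(f"unweighted:    {unweighted_mean(DOMAINS):.1f}%")        # 87.1%
```

The weighting matters here: the five weakest domains hold only six of the 37 tasks, so an unweighted average of the domain scores would come out near 87.1% instead.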

Outputs

The rendered outputs, one row per task. Judged builds show the vision judge's rating out of 5; the other tasks show their score.

Task	Domain	Output	Score
ANIM-0001	Animation	(title not shown)	judge 3.1/5
ANIM-0002	Animation	Complex animation scene	judge 4.1/5
AUTH-0001	Auth screen	Register / login screen	judge 4.3/5
DASH-0001	Dashboard	SaaS analytics dashboard	judge 4.4/5
MKT-0001	Marketing page	Marketing landing page	judge 4.1/5
SCENE-0001	svg-scene	Sydney Harbour at golden hour (SVG scene)	judge 3.3/5
SVG-0001	SVG and graphics	Single-concept flat icon (coffee cup with steam)	100.0%
SVG-0002	SVG and graphics	Simple house icon (square, triangle roof, door)	100.0%
SVG-0003	SVG and graphics	Flat multi-object scene (sun, hills, hot-air balloon)	100.0%
SVG-0004	SVG and graphics	(title not shown)	100.0%
SVG-0005	SVG and graphics	Tiny bar chart for values [3, 7, 5, 9]	100.0%
SVG-0006	SVG and graphics	Traffic light with only the green light lit	100.0%
UI-0001	UI components	Accessible pricing card with a monthly/annual toggle	88.1%
UI-0002	UI components	Accessible accordion FAQ that expands on click	100.0%
UI-0003	UI components	Tabbed interface that switches panels on click	100.0%
UI-0004	UI components	(title not shown)	99.8%
UI-0005	UI components	Stat and testimonial card grid	100.0%
WEB-0001	Websites	Landing page with a working mobile nav toggle	100.0%
WEB-0002	Websites	Dashboard with a sortable data table	100.0%
WEB-0003	Websites	Pricing page with a working billing toggle and FAQ accordion	96.2%
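The visual-domain percentages line up with a linear rescaling of the 1-5 judge score onto 0-100%, i.e. (s - 1) / 4 × 100: SCENE-0001's 58.3% composite corresponds to a judge score of about 3.33, displayed as 3.3/5. The page does not document this mapping, so treat it as an inferred assumption; `judge_to_percent` and `percent_to_judge` are hypothetical helpers. The sketch inverts each single-task composite and checks it against the displayed, rounded judge score.

```python
def judge_to_percent(score: float) -> float:
    """Hypothetical: map a judge score on a 1-5 scale linearly onto 0-100%."""
    if not 1.0 <= score <= 5.0:
        raise ValueError(f"judge score out of range: {score}")
    return (score - 1.0) / 4.0 * 100.0


def percent_to_judge(percent: float) -> float:
    """Inverse of judge_to_percent."""
    return 1.0 + 4.0 * percent / 100.0


# Single-task visual domains: (judge score as displayed, domain composite %).
OBSERVED = {
    "SCENE-0001": (3.3, 58.3),
    "MKT-0001": (4.1, 77.1),
    "AUTH-0001": (4.3, 83.3),
    "DASH-0001": (4.4, 85.4),
}

for task, (shown, percent) in OBSERVED.items():
    implied = percent_to_judge(percent)
    # Displayed judge scores are rounded to one decimal place, so allow +/-0.05.
    status = "consistent" if abs(implied - shown) <= 0.05 else "mismatch"
    print(f"{task}: {percent}% -> judge {implied:.2f} (shown {shown}/5): {status}")
```

Animation, with two tasks, fits the same pattern: its 64.6% composite implies a mean judge score of about 3.58, against displayed scores of 3.1 and 4.1 (mean 3.6).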

Objective tasks

Programming, Australian law and Australian accounting tasks, graded by execution. 15 of 17 scored a perfect 100.0%; the remaining two are listed below.

Task	Domain	Difficulty	Objective	pass@1
ACC-0003	Australian accounting	easy	90.0%	100.0%
LAW-0004	Australian law	hard	75.0%	100.0%
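Since every objective task not listed scored 100.0%, the two rows above are enough to recover the Australian law and accounting composites, assuming a domain's composite is the plain mean of its tasks' Objective scores (the pass@1 column does not enter). `domain_mean` is a hypothetical helper for illustration.

```python
def domain_mean(n_tasks: int, below_perfect: dict) -> float:
    """Mean Objective score (%) for a domain in which every task not listed scored 100%."""
    perfect_tasks = n_tasks - len(below_perfect)
    return (100.0 * perfect_tasks + sum(below_perfect.values())) / n_tasks


law = domain_mean(5, {"LAW-0004": 75.0})
accounting = domain_mean(6, {"ACC-0003": 90.0})
print(f"Australian law: {law:.1f}%")                # 95.0%
print(f"Australian accounting: {accounting:.1f}%")  # 98.3%
```

Both results match the per-domain composites reported for these domains (95.0% and 98.3%).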
