Banja Lab / Benchmarks

OpenAI

gpt-5.5-pro

High reasoning, API single-shot

gpt-5.5-pro (OpenAI) scored 94.1% composite across 37 tasks: code, UI, full websites, SVG, marketing pages, dashboards, animations, and Australian legal and accounting. Objective tasks are graded by execution; visual builds are graded by a cross-family vision panel (leave-one-family-out). Run on 2026-06-23.
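The page does not spell out how the leave-one-family-out panel works. One plausible reading is that judges from the same model family as the model under test are excluded before the remaining scores are averaged. The sketch below illustrates that reading only; the function name, the family labels, and the scores are hypothetical and are not taken from the benchmark.

```python
from statistics import mean


def panel_score(candidate_family, judge_scores):
    """Average judge scores, skipping judges from the candidate's own family.

    judge_scores is a list of (judge_family, score) pairs.
    All names and values here are hypothetical illustrations.
    """
    eligible = [score for family, score in judge_scores if family != candidate_family]
    if not eligible:
        raise ValueError("no judge outside the candidate's family")
    return mean(eligible)


# Made-up panel: the "openai" judge is dropped when grading an OpenAI model.
scores = [("openai", 5.0), ("family_b", 4.0), ("family_c", 3.0)]
print(panel_score("openai", scores))
```

Excluding same-family judges is a common way to limit self-preference in model-graded evaluations, which may be the motivation here.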

Composite: 94.1% (95% CI 89.9% - 97.4%)
Est. cost: $55.6394 USD
Tokens: 317,658 (generation)
Wall clock: 6 min (end to end)
Gate pass: - (validated)
Separation: - (good vs bad)

By domain

Composite score per domain, weakest first. The judge score is the vision model's rating and is shown only for the visual domains.

Domain | Tasks | Judge | Composite
svg-scene | 1 | 58.3% | 58.3%
Animation | 2 | 64.6% | 64.6%
Marketing page | 1 | 77.1% | 77.1%
Auth screen | 1 | 83.3% | 83.3%
Dashboard | 1 | 85.4% | 85.4%
Australian law | 5 | - | 95.0%
UI components | 5 | - | 97.6%
Australian accounting | 6 | - | 98.3%
Websites | 3 | - | 98.7%
Programming | 6 | - | 100.0%
SVG and graphics | 6 | - | 100.0%
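The per-domain figures can be cross-checked against the headline number. The page does not say how the overall composite is aggregated; the sketch below assumes a mean of domain composites weighted by task count, which happens to reproduce both the 37-task total and the 94.1% composite. The function name is illustrative.

```python
# (domain, task count, composite %) as listed in the "By domain" section.
DOMAINS = [
    ("svg-scene", 1, 58.3),
    ("Animation", 2, 64.6),
    ("Marketing page", 1, 77.1),
    ("Auth screen", 1, 83.3),
    ("Dashboard", 1, 85.4),
    ("Australian law", 5, 95.0),
    ("UI components", 5, 97.6),
    ("Australian accounting", 6, 98.3),
    ("Websites", 3, 98.7),
    ("Programming", 6, 100.0),
    ("SVG and graphics", 6, 100.0),
]


def task_weighted_composite(domains):
    """Assumed aggregation: domain scores averaged with task counts as weights."""
    total_tasks = sum(count for _, count, _ in domains)
    weighted_sum = sum(count * score for _, count, score in domains)
    return total_tasks, weighted_sum / total_tasks


total_tasks, composite = task_weighted_composite(DOMAINS)
print(total_tasks, round(composite, 1))  # 37 94.1
```

Because the weights are task counts, the five single-task or two-task visual domains pull the headline down far less than an unweighted mean over domains would.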

Outputs

Rendered outputs by task. Visual tasks show the judge score out of 5; the other tasks show a percentage score.

Task | Domain | Description | Score
ANIM-0001 | Animation | Simple UI animation | judge 3.1/5
ANIM-0002 | Animation | Complex animation scene | judge 4.1/5
AUTH-0001 | Auth screen | Register / login screen | judge 4.3/5
DASH-0001 | Dashboard | SaaS analytics dashboard | judge 4.4/5
MKT-0001 | Marketing page | Marketing landing page | judge 4.1/5
SCENE-0001 | svg-scene | Sydney Harbour at golden hour (SVG scene) | judge 3.3/5
SVG-0001 | SVG and graphics | Single-concept flat icon (coffee cup with steam) | 100.0%
SVG-0002 | SVG and graphics | Simple house icon (square, triangle roof, door) | 100.0%
SVG-0003 | SVG and graphics | Flat multi-object scene (sun, hills, hot-air balloon) | 100.0%
SVG-0004 | SVG and graphics | Geometric logo mark (overlapping circles, abstract monogram) | 100.0%
SVG-0005 | SVG and graphics | Tiny bar chart for values [3, 7, 5, 9] | 100.0%
SVG-0006 | SVG and graphics | Traffic light with only the green light lit | 100.0%
UI-0001 | UI components | Accessible pricing card with a monthly/annual toggle | 88.1%
UI-0002 | UI components | Accessible accordion FAQ that expands on click | 100.0%
UI-0003 | UI components | Tabbed interface that switches panels on click | 100.0%
UI-0004 | UI components | Signup form with inline email validation | 99.8%
UI-0005 | UI components | Stat and testimonial card grid | 100.0%
WEB-0001 | Websites | Landing page with a working mobile nav toggle | 100.0%
WEB-0002 | Websites | Dashboard with a sortable data table | 100.0%
WEB-0003 | Websites | Pricing page with a working billing toggle and FAQ accordion | 96.2%
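The page reports judge scores out of 5 per task and as percentages per domain, without stating the conversion. One reading that fits the single-task visual domains is a linear rescale of a 1-5 scale to 0-100%. This is an inference from the listed numbers, not a documented formula, and since the displayed judge scores are rounded to one decimal the match is only approximate (within about 1.25 percentage points). The function name is hypothetical.

```python
def judge_to_percent(score, low=1.0, high=5.0):
    """Hypothetical linear rescale of a judge score on [low, high] to 0-100%."""
    return 100.0 * (score - low) / (high - low)


# (task, displayed judge score, reported domain judge %) for the
# visual domains that contain exactly one task.
SINGLE_TASK = [
    ("SCENE-0001", 3.3, 58.3),
    ("MKT-0001", 4.1, 77.1),
    ("AUTH-0001", 4.3, 83.3),
    ("DASH-0001", 4.4, 85.4),
]

for task, judge, reported in SINGLE_TASK:
    estimate = judge_to_percent(judge)
    print(f"{task}: estimated {estimate:.1f}%, reported {reported}%")
```

The small gaps between estimate and reported value are consistent with the per-task scores having been rounded for display before this conversion is applied.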

Objective tasks

Programming, Australian law, and Australian accounting tasks, graded by execution. 15 of 17 scored a perfect 100.0%; the rest are below.

Task | Domain | Difficulty | Objective | pass@1
ACC-0003 | Australian accounting | easy | 90.0% | 100.0%
LAW-0004 | Australian law | hard | 75.0% | 100.0%