gpt-5.5-pro (OpenAI) scored a 94.1% composite across 37 tasks: code, UI, full websites, SVG, marketing pages, dashboards, animations, and Australian legal and accounting. Tasks are graded by execution, and the visual builds are graded by a cross-family vision panel (leave-one-family-out). Run on 2026-06-23.
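As a rough illustration of what "leave-one-family-out" means for the vision panel, here is a minimal sketch. It assumes that each judge returns a score from 0 to 100, that judges from the same model family as the graded model are excluded, and that the panel score is the plain mean of the remaining judges. The function name `panel_score`, the input shape, and the judge names in the usage example are hypothetical and do not come from the benchmark itself.

```python
from statistics import mean


def panel_score(model_family: str, judge_scores: dict[str, tuple[str, float]]) -> float:
    """Leave-one-family-out panel score (illustrative sketch, not the benchmark's code).

    judge_scores maps a hypothetical judge name to (judge_family, score in 0..100).
    Judges that share the graded model's family are dropped, so no model family
    grades its own output; the remaining scores are averaged.
    """
    eligible = [score for family, score in judge_scores.values() if family != model_family]
    if not eligible:
        # Every judge belongs to the graded model's family, so no fair score exists.
        raise ValueError("no judges outside the graded model's family")
    return mean(eligible)


# Hypothetical panel: the "openai" judge is excluded when an OpenAI model is graded.
judges = {
    "judge_a": ("openai", 90.0),
    "judge_b": ("anthropic", 80.0),
    "judge_c": ("google", 70.0),
}
print(panel_score("openai", judges))  # mean of 80.0 and 70.0
```

Dropping same-family judges instead of down-weighting them keeps the rule simple and guards against a model family favouring its own style of output. The trade-off is that each graded model is scored by a slightly different subset of the panel.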
Composite score per domain, weakest first. The Judge figure is the vision panel's score and is shown only for the visual domains.
The actual rendered output for each visual task, which can be compared with the same task across every model.
Programming, Australian legal, and accounting tasks, graded by execution. 15 of 17 scored a perfect 100.0%; the remaining two are listed below. Each answer can also be compared across every model.