claude-haiku-4-5 (Anthropic) scored 58.5% composite across 87 tasks: code, UI, full websites, SVG, marketing pages, dashboards, animations, and Australian legal and accounting work. Tasks are graded by execution, with the visual builds scored by a cross-family vision panel (leave-one-family-out). Run on 2026-06-24.
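As a rough illustration of what a leave-one-family-out panel could look like, the sketch below drops any judge that shares a family with the model under test and averages the remaining judges' scores. This is an assumption about the scheme, not the benchmark's actual implementation: the judge names, the family labels other than Anthropic, the 0 to 100 score scale, and the use of a plain mean are all hypothetical.

```python
from statistics import mean

# Hypothetical judge panel: judge name -> model family.
# None of these names come from the benchmark itself.
JUDGES = {
    "judge-a": "anthropic",
    "judge-b": "family-b",
    "judge-c": "family-c",
}


def eligible_judges(candidate_family, judges=JUDGES):
    """Return the judges whose family differs from the candidate's family."""
    return sorted(
        name for name, family in judges.items() if family != candidate_family
    )


def panel_score(candidate_family, scores, judges=JUDGES):
    """Average the scores given by eligible judges only.

    `scores` maps judge name -> score on an assumed 0-100 scale.
    Averaging with a plain mean is an assumption for illustration.
    """
    panel = eligible_judges(candidate_family, judges)
    if not panel:
        raise ValueError("no judge outside the candidate's family")
    return mean(scores[name] for name in panel)


if __name__ == "__main__":
    # Made-up scores for one visual build by an Anthropic model.
    scores = {"judge-a": 100, "judge-b": 60, "judge-c": 80}
    print(eligible_judges("anthropic"))
    print(panel_score("anthropic", scores))
```

The point of excluding the same-family judge is that a model's score never depends on a judge from its own vendor, whatever that judge would have said.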
Composite score per domain, weakest first. Judge is the vision model’s read, shown for the visual domains.
Programming and Australian legal and accounting tasks, graded by execution. 13 of 34 scored a perfect 100.0%; the remaining 21 are listed below.
| Task | Domain | Difficulty | Objective | pass@1 |
|---|---|---|---|---|
| AUSFA-0001 | Australian accounting | easy | 10.0% | 0.0% |
| AUSFA-0002 | Australian accounting | medium | 15.0% | 0.0% |
| AUSFA-0003 | Australian accounting | easy | 85.0% | 100.0% |
| AUSFA-0004 | Australian accounting | medium | 0.0% | 0.0% |
| AUSFA-0005 | Australian accounting | medium | 0.0% | 0.0% |
| AUSFA-0006 | Australian accounting | medium | 15.0% | 0.0% |
| AUSFA-0008 | Australian accounting | hard | 0.0% | 0.0% |
| AUSFA-0009 | Australian accounting | hard | 15.0% | 0.0% |
| AUSFA-0011 | Australian accounting | hard | 66.7% | 0.0% |
| AUSFA-0012 | Australian accounting | hard | 0.0% | 0.0% |
| AUSFA-0013 | Australian accounting | hard | 33.3% | 0.0% |
| AUSFA-0014 | Australian accounting | hard | 0.0% | 0.0% |
| BASQU-0001 | Australian accounting | hard | 0.0% | 0.0% |
| CODE-0001 | Programming | easy | 0.0% | 0.0% |
| CODE-0003 | Programming | medium | 0.0% | 0.0% |
| CODE-0004 | Programming | easy | 0.0% | 0.0% |
| CODE-0006 | Programming | hard | 0.0% | 0.0% |
| LAW-0001 | Australian law | medium | 0.0% | 0.0% |
| LAW-0002 | Australian law | medium | 50.0% | 0.0% |
| LAW-0003 | Australian law | hard | 0.0% | 0.0% |
| LAW-0004 | Australian law | hard | 0.0% | 0.0% |