The same task, run on 27 models, with the outputs compared side by side.
Top result: claude-opus-4-8 (low reasoning) at 100.0% composite. Lowest: deepseek-v4-flash at 100.0%.
Build a single self-contained HTML file (`index.html`) that renders a determinate progress bar at 65 percent, with no build step and no network calls (inline all CSS, no external fonts or scripts). The grader measures the rendered geometry, so the fill must be exactly the right width, not roughly two-thirds.

Track `<div id="track">`:
- width 400px, height 12px, background #e2e8f0, border-radius 6px.

Fill `<div id="fill">` inside the track:
- width exactly 260px (65 percent of 400px), height 12px, background #6d28d9.

Caption `<p id="caption">`:
- font-size 13px, colour #475569, text "65% complete".

The 260px fill, the 12px track height, and the 13px caption are the contract.
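
For illustration, here is a minimal sketch of one `index.html` that would satisfy the contract above. It is not any model's graded output. The `overflow: hidden` on the track and the page title are assumptions added here, not requirements from the task. The fill uses a fixed `260px` width rather than `65%`, so the measured width does not depend on percentage resolution.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Progress: 65%</title>
<style>
  /* Track: the 400px by 12px rail from the task statement. */
  #track {
    width: 400px;
    height: 12px;
    background: #e2e8f0;
    border-radius: 6px;
    /* Assumption: clip the fill to the rounded corners.
       This does not change the fill's own box dimensions. */
    overflow: hidden;
  }

  /* Fill: fixed pixel width, 65 percent of 400px = 260px.
     No padding or border, so the rendered box is exactly 260 x 12. */
  #fill {
    width: 260px;
    height: 12px;
    background: #6d28d9;
  }

  /* Caption: only font-size and colour are specified by the task. */
  #caption {
    font-size: 13px;
    color: #475569;
  }
</style>
</head>
<body>
  <div id="track"><div id="fill"></div></div>
  <p id="caption">65% complete</p>
</body>
</html>
```

Because neither element has padding or a border, the default `content-box` sizing leaves the rendered widths equal to the declared ones. All CSS is inline in a `<style>` element and nothing is fetched from the network.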