The same task, run on 27 models.
Top result: claude-opus-4-8 (low reasoning), with a 100.0% composite score. Lowest: claude-haiku-4-5, at 0.0%.
Build a single self-contained HTML file (`index.html`) that renders an avatar with a circular notification badge pinned to its top-right. There is no build step and no network access: inline all CSS, and use no external fonts or scripts. The grader measures the rendered badge position, size, and computed z-index, so the offset and stacking must be exact.

Avatar `<div id="avatar">`:
- `position: relative`, 72px wide, 72px tall, border-radius 36px (a circle), background #cbd5e1.

Badge `<span id="badge">` inside the avatar:
- `position: absolute`, pinned so that it measures at left 52px and top 0px relative to the page (the avatar sits flush at the top-left of the page).
- 24px wide, 24px tall, border-radius 12px (a circle).
- background #dc2626, colour #ffffff, font-size 12px.
- z-index 2, so it sits above the avatar.
- text: 3

The contract is the left 52px offset, the 24px badge size, the #dc2626 fill, and z-index 2.
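A minimal sketch of an `index.html` that should meet the contract, assuming the grader reads the badge's bounding box relative to the viewport and the computed `z-index` of `#badge`. The `<title>`, the `lang` attribute, and the flex centring of the digit are illustrative choices, not part of the spec. The detail that matters most is the margin reset: browsers give `<body>` a default 8px margin, which would otherwise shift both elements to (8, 8) and break the "flush at the top-left" assumption.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Avatar badge</title>
  <style>
    /* Drop the default 8px body margin so the avatar sits flush at the page's top-left. */
    html, body { margin: 0; padding: 0; }

    #avatar {
      position: relative;   /* containing block for the absolutely positioned badge */
      width: 72px;
      height: 72px;
      border-radius: 36px;  /* half of 72px: a circle */
      background: #cbd5e1;
    }

    #badge {
      position: absolute;
      left: 52px;           /* avatar's left edge is the page's left edge, so this measures 52px on the page */
      top: 0;
      z-index: 2;           /* stacks above the avatar */
      width: 24px;
      height: 24px;
      border-radius: 12px;  /* half of 24px: a circle */
      background: #dc2626;
      color: #ffffff;
      font-size: 12px;
      /* Centre the digit inside the circle (illustrative, not required by the contract). */
      display: flex;
      align-items: center;
      justify-content: center;
    }
  </style>
</head>
<body>
  <div id="avatar"><span id="badge">3</span></div>
</body>
</html>
```

With `left: 52px` the badge spans 52px to 76px horizontally, so it overhangs the 72px avatar by 4px; `right: -4px` would place it identically. Because the avatar has no `overflow: hidden`, the overhang stays visible; adding that rule to `#avatar` would clip the badge to the avatar's rounded edge.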