The same task, run on 27 models.
Top result: claude-opus-4-8 (extra-high reasoning) at 100.0% composite. Lowest: deepseek-v4-flash at 0.0%.
Build a single self-contained HTML file (`index.html`) that renders with no build step and no network calls (inline all CSS and JS; no external fonts, scripts, or images).

Build a gallery with an <h1> heading and exactly four tiles in a grid. Give the grid container id="gallery" and give the four tiles, in order, id="tile-1", id="tile-2", id="tile-3", and id="tile-4".

The grid must widen with the viewport:
- on a phone (around 360px wide) the four tiles stack in a single column (each tile starts at the same left edge and sits below the one before it),
- on a tablet (around 768px wide) the tiles form more than one column, so at least the first two tiles sit side by side on the first row, and
- on a desktop (around 1440px wide) all four tiles sit side by side on one row.

The page must not scroll horizontally at any of 360px, 768px, or 1440px wide. Do not pin the tiles to fixed pixel widths that overflow a phone. Use plain, accessible markup.
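
For orientation, here is a minimal sketch of one page that would plausibly satisfy the brief above. It is not the output of any benchmarked model and not a reference answer. The 600px and 1024px breakpoints, the `tile` class, the list-based markup, and the placeholder text are all assumptions chosen for illustration; the brief only fixes the ids, the tile count, and the behavior at 360px, 768px, and 1440px.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Gallery</title>
<style>
  /* Border-box sizing keeps padding and borders inside each grid track. */
  *, *::before, *::after { box-sizing: border-box; }
  body { margin: 0; padding: 1rem; font-family: system-ui, sans-serif; }

  /* Phone first: a single column, so every tile starts at the same left edge. */
  #gallery {
    display: grid;
    grid-template-columns: minmax(0, 1fr);
    gap: 1rem;
    margin: 0;
    padding: 0;
    list-style: none;
  }

  /* No fixed pixel widths: tiles take whatever width their track provides. */
  .tile { min-width: 0; padding: 1rem; border: 1px solid #767676; border-radius: 0.5rem; }
  .tile h2 { margin: 0 0 0.5rem; font-size: 1.125rem; }
  .tile p { margin: 0; }

  /* Assumed tablet breakpoint: two columns from 600px up (covers 768px). */
  @media (min-width: 600px) {
    #gallery { grid-template-columns: repeat(2, minmax(0, 1fr)); }
  }

  /* Assumed desktop breakpoint: four columns from 1024px up (covers 1440px). */
  @media (min-width: 1024px) {
    #gallery { grid-template-columns: repeat(4, minmax(0, 1fr)); }
  }
</style>
</head>
<body>
<main>
  <h1>Gallery</h1>
  <!-- role="list" restores list semantics that some screen readers drop
       when list-style is set to none. -->
  <ul id="gallery" role="list">
    <li class="tile" id="tile-1"><h2>Tile 1</h2><p>Placeholder text.</p></li>
    <li class="tile" id="tile-2"><h2>Tile 2</h2><p>Placeholder text.</p></li>
    <li class="tile" id="tile-3"><h2>Tile 3</h2><p>Placeholder text.</p></li>
    <li class="tile" id="tile-4"><h2>Tile 4</h2><p>Placeholder text.</p></li>
  </ul>
</main>
</body>
</html>
```

The `minmax(0, 1fr)` tracks let columns shrink below their content's intrinsic width, which is one way to avoid horizontal scrolling at 360px. Explicit media queries are used here instead of `repeat(auto-fit, ...)` so that the column count at each of the three named widths is easy to read off the stylesheet.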