Banja Lab Benchmarks: Test
The same task was run on 27 models. Top result: grok-composer-2.5-fast (default reasoning), with a 96.4% composite score. Lowest: deepseek-v4-flash, at 0.0%.
You are given a reference screenshot of a product feature section. Reproduce it as faithfully as you can as ONE self-contained HTML file (`index.html`) that renders with no build step and no network calls (inline all CSS, no external fonts, scripts, or images). Match what the screenshot shows:

- a dark, near-black page background,
- a two-column section: a copy column on the left and a visual panel on the right,
- the left column: a green "SMART TRIAGE" eyebrow, a large headline "An inbox that triages itself", a muted sub-heading, a three-item feature list (each item with a green check tile, a title, and a description: Priority sorting, Draft replies, Quiet hours), and two buttons below (a solid green "Start free" and an outlined "See how it works"),
- the right column: a tall rounded panel filled with a green gradient, containing three floating preview cards stacked vertically (Acme contract - needs sign-off; Weekly digest ready; Draft reply suggested),
- a green accent colour (around #22c79a) used for the eyebrow, the check tiles, and the primary button.

Keep the dark palette, the two-column split, the green gradient visual, and the relative sizing close to the screenshot. On a narrow window the two columns should stack rather than overflow.