The same task was run on 27 models. Top result: claude-opus-4-8 (medium reasoning), at 100.0% composite. Lowest: deepseek-v4-pro, at 0.0% composite.
Build a single self-contained page as one HTML file (`index.html`) that renders with no build step and no network calls (inline all CSS and JS, no external fonts or scripts). Build a menu button (the WAI-ARIA "menu button" pattern). Requirements:

- A trigger button with id="menu-button", aria-haspopup="menu", and aria-expanded that is "false" while the menu is closed and "true" while it is open.
- A popup with role="menu" (id="menu") containing at least four items, each a control with role="menuitem". Give them the ids mi-profile, mi-billing, mi-settings, mi-signout.
- With the button focused, ArrowDown (or Enter) opens the menu AND moves keyboard focus to the first menu item. While the menu is open, ArrowDown moves focus to the next item and ArrowUp to the previous (real DOM focus moves between the menu items).
- Escape closes the menu and returns keyboard focus to the trigger button.

The menu must be fully operable with the keyboard alone. Use plain, accessible markup.
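The keyboard requirements above amount to a small state machine: an open/closed flag plus the id of the element that should hold focus. The sketch below is one possible approach, assuming plain JavaScript inlined in `index.html`, menu items that are natively focusable (for example `<button role="menuitem">`), and a `#menu` element that starts out with the `hidden` attribute. The helper names `nextState`, `ariaExpanded`, and `applyState` are hypothetical, not part of the task. Wrapping from the last item to the first (and back) is a design choice the task does not require; the WAI-ARIA Authoring Practices list it as optional.

```javascript
// Sketch only: the ids come from the task; nextState, ariaExpanded, and
// applyState are hypothetical helper names chosen for this example.
const ITEM_IDS = ["mi-profile", "mi-billing", "mi-settings", "mi-signout"];

// Returns the state after one key press without mutating the input.
//   state.open    -> whether the role="menu" popup is shown
//   state.focused -> id of the element that should hold DOM focus
// Keys the pattern does not handle return the same object unchanged.
function nextState(state, key) {
  if (!state.open) {
    // Closed: ArrowDown or Enter on the trigger opens the menu on the first item.
    if (state.focused === "menu-button" && (key === "ArrowDown" || key === "Enter")) {
      return { open: true, focused: ITEM_IDS[0] };
    }
    return state;
  }
  const i = ITEM_IDS.indexOf(state.focused); // -1 if focus is not on an item
  switch (key) {
    case "ArrowDown":
      // Next item; wrapping from the last item to the first is optional.
      return { open: true, focused: ITEM_IDS[(i + 1) % ITEM_IDS.length] };
    case "ArrowUp":
      // Previous item; from the first item (or from outside the list) go to the last.
      return { open: true, focused: ITEM_IDS[i <= 0 ? ITEM_IDS.length - 1 : i - 1] };
    case "Escape":
      // Close the menu and return focus to the trigger button.
      return { open: false, focused: "menu-button" };
    default:
      return state;
  }
}

// aria-expanded must be the string "true" or "false".
function ariaExpanded(state) {
  return state.open ? "true" : "false";
}

// Pushes a state onto the page: attribute, visibility, then real DOM focus.
function applyState(state, doc) {
  doc.getElementById("menu-button").setAttribute("aria-expanded", ariaExpanded(state));
  // Update visibility before focusing: an element hidden by the hidden attribute cannot take focus.
  doc.getElementById("menu").hidden = !state.open;
  doc.getElementById(state.focused).focus();
}

// Browser wiring; skipped when no DOM exists (e.g. under Node).
if (typeof document !== "undefined") {
  let open = false;
  document.addEventListener("keydown", (event) => {
    const active = document.activeElement;
    const current = { open, focused: active ? active.id : "" };
    const next = nextState(current, event.key);
    if (next === current) return; // unhandled key: leave default behavior alone
    event.preventDefault(); // e.g. stops the arrow keys from scrolling the page
    open = next.open;
    applyState(next, document);
  });
}
```

Keeping the transitions free of DOM calls means the keyboard rules can be checked in Node without a browser; only `applyState` and the listener touch the page.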