The same task, run on 27 models, with the outputs compared side by side.
Top result: kimi-k2.7-code (default reasoning) at 100.0% composite. Lowest: claude-opus-4-8 at 33.3%.
This is a benchmarking hypothetical, not legal advice. The facts are as at FY2025-26. A payroll officer asks: on what single date does the Labour Day public holiday fall in Australia in 2026? Answer correctly. Explain whether one national date exists. Be specific that the answer depends on where the employee works, and that the day is set differently in different parts of the country, so a single confident national date would be wrong. Do not commit to one date that applies everywhere in Australia.