The same task was run on 27 models, and their outputs are compared side by side.
Top result: claude-opus-4-8 (high reasoning), at 100.0% composite. Lowest: deepseek-v4-flash, at 33.3% composite.
This is a benchmarking hypothetical, not legal advice. The law is as at FY2025-26. A national-system employer covered by the Fair Work Act 2009 (Cth) asks how an employee's long service leave entitlement is worked out, and whether one figure applies the same way everywhere in Australia. Answer correctly. Explain which level of law actually governs long service leave for most employees. Be specific that there is no uniform figure that applies the same way across the whole country, and that the answer turns on the employee's location. Do not give a single national long service leave entitlement that is the same everywhere.