
LAW-0002 · Australian law · medium

NES minimum notice and redundancy pay

The same task, run on 28 models, with the outputs compared side by side.

Top result: claude-opus-4-8 (low reasoning) at 100.0% composite. Lowest: deepseek-v4-flash at 0.0%.

How it ran

Each model was given the brief below in a fresh, isolated session with no access to our tools, and returned its answer from scratch.
The rendered output was scored from 1 to 5 on brief fidelity, visual design, craft, and impact by a four-family vision panel - Anthropic (Claude Opus 4.8), OpenAI (GPT-5.5), Google (Gemini 3.1 Pro), and xAI (Grok 4.3) - using one identical prompt so that the scores are comparable. The published judge score is leave-one-family-out: a model is never scored by a judge from its own family, which removes same-family self-preference.
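
The leave-one-family-out rule can be sketched as below. This is an illustration only, not the benchmark's actual scoring code: the function name, the judge-to-family mapping as a dictionary, and the use of a plain average over the remaining judges are all assumptions, since the page does not say how the remaining scores are combined.

```python
# Judge panel as described above; the dictionary layout itself is an assumption.
JUDGE_FAMILY = {
    "Claude Opus 4.8": "Anthropic",
    "GPT-5.5": "OpenAI",
    "Gemini 3.1 Pro": "Google",
    "Grok 4.3": "xAI",
}


def leave_one_family_out(judge_scores, candidate_family):
    """Average the 1-5 judge scores, skipping any judge from the candidate's family.

    judge_scores: mapping of judge name -> score (1 to 5).
    candidate_family: family of the model being scored, e.g. "Anthropic".
    Averaging is an assumed aggregation; the source only states the exclusion rule.
    """
    kept = [
        score
        for judge, score in judge_scores.items()
        if JUDGE_FAMILY[judge] != candidate_family
    ]
    if not kept:
        raise ValueError("no eligible judges left after excluding the candidate's family")
    return sum(kept) / len(kept)


# Hypothetical scores, for illustration only.
example_scores = {"Claude Opus 4.8": 5, "GPT-5.5": 4, "Gemini 3.1 Pro": 3, "Grok 4.3": 2}
anthropic_model_score = leave_one_family_out(example_scores, "Anthropic")  # averages 4, 3, 2
outside_model_score = leave_one_family_out(example_scores, "DeepSeek")     # averages all four
```

A model whose family has no judge on the panel is scored by all four judges under this sketch, while a model from a panel family is scored by the other three.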

The brief

This is a benchmarking hypothetical, not legal advice. The law is as at FY2025-26 (Commonwealth). A full-time employee of a national-system employer that is not a small business employer has 6 years of continuous service. The employee is under 45 years of age. The position is made genuinely redundant and the employer pays out the notice period rather than having the employee work it. Using the National Employment Standards scales, state in weeks: (a) the minimum period of notice of termination owed, and (b) the amount of redundancy pay owed (in weeks of base pay). Give each figure clearly. The scales are set out in the Fair Work Act 2009 (Cth): the notice scale in s 117 and the redundancy pay scale in s 119. Name the controlling Act and both sections in your answer.
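
The lookup the brief asks for can be sketched as two small functions. This is a hedged illustration, not the benchmark's grader: the function names are hypothetical, and the scale values are transcribed from the notice table in s 117 and the redundancy pay table in s 119 of the Fair Work Act 2009 (Cth) as generally published, so they should be checked against the current text of the Act. The sketch assumes a national-system employer that is not a small business employer, as in the brief, and ignores award or agreement entitlements.

```python
def notice_weeks(years_of_service, over_45):
    """Minimum notice in weeks under the s 117 scale (assumed transcription)."""
    if years_of_service <= 1:
        weeks = 1
    elif years_of_service <= 3:
        weeks = 2
    elif years_of_service <= 5:
        weeks = 3
    else:
        weeks = 4
    # One extra week for an employee over 45 with at least 2 years of service.
    if over_45 and years_of_service >= 2:
        weeks += 1
    return weeks


# (minimum completed years, weeks of base pay) per the s 119 scale (assumed transcription).
REDUNDANCY_SCALE = [
    (1, 4), (2, 6), (3, 7), (4, 8), (5, 10),
    (6, 11), (7, 13), (8, 14), (9, 16), (10, 12),
]


def redundancy_weeks(years_of_service):
    """Redundancy pay in weeks of base pay; 0 below one year of service."""
    weeks = 0
    for min_years, scale_weeks in REDUNDANCY_SCALE:
        if years_of_service >= min_years:
            weeks = scale_weeks
    return weeks


# Facts from the brief: 6 years of continuous service, under 45.
brief_notice = notice_weeks(6, over_45=False)   # 4
brief_redundancy = redundancy_weeks(6)          # 11
```

Note that the scale is not monotonic: the entitlement drops at 10 years of service, which is why the table is walked in full rather than capped at its largest value.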

claude-opus-4-8

Low reasoning

Composite 100.0% · Objective 100.0%

claude-opus-4-8

Medium reasoning

Composite 100.0% · Objective 100.0%

claude-opus-4-8

High reasoning

Composite 100.0% · Objective 100.0%

claude-opus-4-8

Extra-high reasoning

Composite 100.0% · Objective 100.0%

claude-opus-4-8

Max reasoning

Composite 100.0% · Objective 100.0%

claude-sonnet-4-6

High reasoning

Composite 100.0% · Objective 100.0%

claude-sonnet-5

High reasoning

Composite 100.0% · Objective 100.0%

claude-fable-5

High reasoning

Composite 100.0% · Objective 100.0%

gpt-5.5

High reasoning

Composite 100.0% · Objective 100.0%

gpt-5.5-pro

High reasoning

Composite 100.0% · Objective 100.0%

gpt-5.4-mini

High reasoning

Composite 100.0% · Objective 100.0%

gemini-3.1-pro-preview

High reasoning

Composite 100.0% · Objective 100.0%

gemini-3.5-flash

default reasoning

Composite 100.0% · Objective 100.0%

gemini-3.1-flash-lite

default reasoning

Composite 100.0% · Objective 100.0%

grok-4.20-reasoning

default reasoning

Composite 100.0% · Objective 100.0%

grok-build-0.1

default reasoning

Composite 100.0% · Objective 100.0%

grok-composer-2.5-fast

default reasoning

Composite 100.0% · Objective 100.0%

claude-opus-4-8

High reasoning

Composite 100.0% · Objective 100.0%

claude-sonnet-4-6

High reasoning

Composite 100.0% · Objective 100.0%

claude-sonnet-5

High reasoning

Composite 100.0% · Objective 100.0%

claude-fable-5

High reasoning

Composite 100.0% · Objective 100.0%

claude-haiku-4-5

High reasoning

Composite 50.0% · Objective 50.0%

glm-5.2

default reasoning

Composite 50.0% · Objective 50.0%

kimi-k2.7-code

default reasoning

Composite 50.0% · Objective 50.0%

claude-haiku-4-5

default reasoning

Composite 50.0% · Objective 50.0%

grok-4.3

default reasoning

Composite 0.0% · Objective 0.0%

deepseek-v4-pro

default reasoning

Composite 0.0% · Objective 0.0%

deepseek-v4-flash

default reasoning

Composite 0.0% · Objective 0.0%