© 2026 Banja Labs. All rights reserved.

Banja Lab / Benchmarks / Test

CODE-0002 · Programming · medium

Parse a SemVer 2.0.0 version string

The same task, run on 28 models. Compare the outputs side by side, or open any one in a popup to inspect it.

Top result: claude-opus-4-8 (low reasoning) at 100.0% composite. Lowest: claude-opus-4-8 (medium and max reasoning) at 0.0% composite.

How it ran
  • Each model was given the brief below in a fresh, isolated session with no access to our tools, and returned its answer from scratch.
  • The rendered output was scored 1 to 5 on brief fidelity, visual design, craft, and impact by a four-family vision panel - Anthropic (Claude Opus 4.8), OpenAI (GPT-5.5), Google (Gemini 3.1 Pro), and xAI (Grok 4.3) - using one identical prompt so the scores compare. The published judge score is leave-one-family-out: a model is never scored by a judge of its own family, so same-family self-preference is removed.
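
The leave-one-family-out rule above can be sketched in a few lines. This is only an illustration, not the benchmark's actual scoring code: the function name `leave_one_family_out`, the dict layout, and the use of a plain mean to combine the remaining judges are all assumptions, and the scores in the usage example are made up.

```python
from statistics import mean


def leave_one_family_out(scores_by_judge_family, model_family):
    """Average the judge scores, skipping any judge from the model's own family.

    Hypothetical helper: the source only states that same-family judges are
    excluded; combining the rest with a mean is an assumption.
    """
    kept = [
        score
        for judge_family, score in scores_by_judge_family.items()
        if judge_family != model_family
    ]
    if not kept:
        return None  # no eligible judges left
    return mean(kept)


# Made-up panel scores on the 1 to 5 scale, keyed by judge family.
panel = {"Anthropic": 5, "OpenAI": 4, "Google": 3, "xAI": 2}

# An Anthropic model is scored only by the OpenAI, Google and xAI judges.
anthropic_score = leave_one_family_out(panel, "Anthropic")

# A model from a family with no judge on the panel keeps all four scores.
zhipu_score = leave_one_family_out(panel, "Zhipu")
```

Under these assumptions, models whose family sits on the panel are averaged over three judges, while models from other families are averaged over all four.
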
The brief

Implement a Python function `parse_semver(s)` that parses a Semantic Versioning 2.0.0 version string and returns a dict, or returns None if the string is not a valid SemVer version. A valid version is `MAJOR.MINOR.PATCH` followed by an optional `-prerelease` and an optional `+build` metadata section, in that order.

Rules:
- MAJOR, MINOR and PATCH are each a non-negative integer with no leading zeros (so "0" is allowed, "01" is not).
- The optional prerelease starts with "-" and is a dot-separated list of one or more identifiers. A numeric identifier must not have leading zeros. Identifiers are made of [0-9A-Za-z-].
- The optional build metadata starts with "+" and is a dot-separated list of one or more identifiers made of [0-9A-Za-z-] (leading zeros allowed here).

On success return a dict with keys:
- "major", "minor", "patch": the three integers,
- "prerelease": the prerelease string without the leading "-", or None,
- "build": the build string without the leading "+", or None.

Return None for any input that does not match (including non-strings). Use only the Python standard library. Write your solution to `solution.py`.
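
For orientation, here is a minimal sketch of one way the brief could be satisfied with a single regular expression. It is not the output of any model listed below and not a reference solution from the benchmark. The helper names (`_NUM`, `_PRE_ID`, `_BUILD_ID`, `_SEMVER`) are illustrative assumptions, as is the choice of a regex over a hand-written parser.

```python
import re

# "0", or a non-zero digit followed by more digits (no leading zeros).
_NUM = r"(?:0|[1-9][0-9]*)"
# Prerelease identifier: a numeric identifier without leading zeros,
# or any identifier that contains at least one letter or hyphen.
_PRE_ID = r"(?:0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*)"
# Build identifier: any non-empty run of the allowed characters.
_BUILD_ID = r"[0-9A-Za-z-]+"

_SEMVER = re.compile(
    rf"({_NUM})\.({_NUM})\.({_NUM})"
    rf"(?:-({_PRE_ID}(?:\.{_PRE_ID})*))?"
    rf"(?:\+({_BUILD_ID}(?:\.{_BUILD_ID})*))?"
)


def parse_semver(s):
    """Return a dict describing a SemVer 2.0.0 string, or None if invalid."""
    if not isinstance(s, str):
        return None
    # fullmatch (rather than match with "$") also rejects a trailing newline.
    m = _SEMVER.fullmatch(s)
    if m is None:
        return None
    major, minor, patch, prerelease, build = m.groups()
    return {
        "major": int(major),
        "minor": int(minor),
        "patch": int(patch),
        "prerelease": prerelease,  # None when the section is absent
        "build": build,            # None when the section is absent
    }


example = parse_semver("1.0.0-alpha.1+build.007")
```

The explicit `[0-9]` classes are used instead of `\d` so that non-ASCII digits are rejected, which matches the character set given in the brief.
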

Anthropic · claude-opus-4-8 · Low reasoning · Composite 100.0% · Objective 100.0%
Anthropic · claude-opus-4-8 · High reasoning · Composite 100.0% · Objective 100.0%
Anthropic · claude-opus-4-8 · Extra-high reasoning · Composite 100.0% · Objective 100.0%
Anthropic · claude-sonnet-4-6 · High reasoning · Composite 100.0% · Objective 100.0%
Anthropic · claude-sonnet-5 · High reasoning · Composite 100.0% · Objective 100.0%
Anthropic · claude-fable-5 · High reasoning · Composite 100.0% · Objective 100.0%
Anthropic · claude-haiku-4-5 · High reasoning · Composite 100.0% · Objective 100.0%
Zhipu · glm-5.2 · default reasoning · Composite 100.0% · Objective 100.0%
Moonshot · kimi-k2.7-code · default reasoning · Composite 100.0% · Objective 100.0%
OpenAI · gpt-5.5 · High reasoning · Composite 100.0% · Objective 100.0%
OpenAI · gpt-5.5-pro · High reasoning · Composite 100.0% · Objective 100.0%
OpenAI · gpt-5.4-mini · High reasoning · Composite 100.0% · Objective 100.0%
Google · gemini-3.1-pro-preview · High reasoning · Composite 100.0% · Objective 100.0%
Google · gemini-3.5-flash · default reasoning · Composite 100.0% · Objective 100.0%
Google · gemini-3.1-flash-lite · default reasoning · Composite 100.0% · Objective 100.0%
xAI · grok-4.3 · default reasoning · Composite 100.0% · Objective 100.0%
xAI · grok-4.20-reasoning · default reasoning · Composite 100.0% · Objective 100.0%
xAI · grok-build-0.1 · default reasoning · Composite 100.0% · Objective 100.0%
xAI · grok-composer-2.5-fast · default reasoning · Composite 100.0% · Objective 100.0%
Anthropic · claude-opus-4-8 · High reasoning · Composite 100.0% · Objective 100.0%
Anthropic · claude-sonnet-4-6 · High reasoning · Composite 100.0% · Objective 100.0%
Anthropic · claude-sonnet-5 · High reasoning · Composite 100.0% · Objective 100.0%
Anthropic · claude-fable-5 · High reasoning · Composite 100.0% · Objective 100.0%
Anthropic · claude-haiku-4-5 · default reasoning · Composite 100.0% · Objective 100.0%
DeepSeek · deepseek-v4-pro · default reasoning · Composite 100.0% · Objective 100.0%
DeepSeek · deepseek-v4-flash · default reasoning · Composite 100.0% · Objective 100.0%
Anthropic · claude-opus-4-8 · Medium reasoning · Composite 0.0% · Objective 0.0%
Anthropic · claude-opus-4-8 · Max reasoning · Composite 0.0% · Objective 0.0%