The same task, run on 28 models.
Top result: claude-opus-4-8 (high reasoning) at 100.0% composite. Lowest: claude-haiku-4-5 at 0.0%.
Implement a Python function `eval_expr(s)` that evaluates an arithmetic expression given as a string and returns its numeric value.

The expression language supports:
- the binary operators `+`, `-`, `*`, `/` with standard precedence (`*` and `/` bind tighter than `+` and `-`) and left associativity,
- parentheses `(` and `)` to override precedence,
- unary `+` and `-` (for example `-5` or `3*-2`),
- integer and decimal number literals (for example `7` or `1.5`),
- arbitrary surrounding and internal whitespace, which is ignored.

Semantics:
- `/` is true division (so `7/2` is 3.5).
- If the result is an exact integer value, return it as an `int` (so `6/2` returns `3`, not `3.0`); otherwise return a `float`.
- Raise `ValueError` for a malformed expression (empty input, a dangling operator, mismatched parentheses, two adjacent numbers, an unknown token) and for division by zero.

Do not use `eval`, `exec`, or any expression-parsing library: write the parser yourself. Use only the Python standard library. Write your solution to `solution.py`.
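For orientation, here is a minimal sketch of one way the task could be approached: a hand-written tokenizer followed by a recursive-descent parser. It is not the benchmark's reference solution and not the output of any of the compared models. The helper names (`_tokenize`, `_NUMBER`, `expr`, `term`, `unary`, `primary`) are hypothetical. It also rests on two assumptions the task text does not settle: a decimal literal needs digits on both sides of the point (so `5.` and `.5` are rejected), and repeated unary signs such as `--5` are accepted.

```python
import re

# Assumption: a literal is digits with an optional fractional part ("7", "1.5").
_NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")


def _tokenize(s):
    """Split the input into ("num", text) and ("op", char) tokens."""
    tokens = []
    pos = 0
    while pos < len(s):
        ch = s[pos]
        if ch.isspace():
            pos += 1
            continue
        m = _NUMBER.match(s, pos)
        if m:
            tokens.append(("num", m.group()))
            pos = m.end()
        elif ch in "+-*/()":
            tokens.append(("op", ch))
            pos += 1
        else:
            raise ValueError(f"unknown token {ch!r}")
    return tokens


def eval_expr(s):
    tokens = _tokenize(s)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():  # expr := term (("+" | "-") term)*
        value = term()
        while peek() in (("op", "+"), ("op", "-")):
            op = advance()[1]
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():  # term := unary (("*" | "/") unary)*
        value = unary()
        while peek() in (("op", "*"), ("op", "/")):
            op = advance()[1]
            rhs = unary()
            if op == "*":
                value = value * rhs
            else:
                if rhs == 0:
                    raise ValueError("division by zero")
                value = value / rhs  # true division
        return value

    def unary():  # unary := ("+" | "-") unary | primary
        if peek() in (("op", "+"), ("op", "-")):
            op = advance()[1]
            operand = unary()
            return -operand if op == "-" else operand
        return primary()

    def primary():  # primary := NUMBER | "(" expr ")"
        tok = advance()
        if tok is None:
            raise ValueError("unexpected end of expression")
        kind, text = tok
        if kind == "num":
            return float(text) if "." in text else int(text)
        if text == "(":
            value = expr()
            if advance() != ("op", ")"):
                raise ValueError("missing closing parenthesis")
            return value
        raise ValueError(f"unexpected token {text!r}")

    result = expr()
    if pos != len(tokens):  # leftovers: adjacent numbers, stray ")"
        raise ValueError("unexpected trailing input")
    if isinstance(result, float) and result.is_integer():
        return int(result)  # e.g. 6/2 gives 3, not 3.0
    return result
```

With this sketch, `eval_expr("7/2")` returns `3.5`, `eval_expr("6/2")` returns the `int` `3`, and `eval_expr(" 3 * -2 ")` returns `-6`. The two loops in `expr` and `term` give left associativity, and the split into two grammar levels gives `*` and `/` their higher precedence. Very deeply nested parentheses would hit Python's recursion limit, which the sketch does not handle.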