Banja Lab / Benchmarks / Test
The same task, run on 28 models. Compare the outputs side by side, or open any one in a popup to inspect it.
Top result: claude-opus-4-8 (low reasoning) at 100.0% composite. Lowest: claude-haiku-4-5 at 0.0%. 28 models compared on this task.
Implement a Python class `LRUCache` (a least-recently-used cache) with these methods, each running in O(1) average time: - `LRUCache(capacity)`: construct an empty cache that holds at most `capacity` keys. `capacity` is a non-negative integer. - `get(key)`: return the value for `key`, or -1 if the key is not present. A successful get marks the key as the most recently used. - `put(key, value)`: insert or update the value for `key`, and mark it as the most recently used. If this makes the cache exceed `capacity`, evict the least recently used key first. Edge cases that matter: - A `get` counts as a use: it must refresh the key's recency. - Updating an existing key with `put` also refreshes its recency and must not grow the cache. - `capacity == 0` stores nothing: every `put` is a no-op and `get` always returns -1. Use only the Python standard library. Write your solution to `solution.py`.