AI article
Passing the Eval Isn't Solving the Task: 3 Leaks, 60 Lines
A 60-line static probe found 3 contamination points in an agent eval harness, exit 1, without running the agent. Passing the eval is not solving the task.
Dev.to | Jun 26, 2026 | Alexey Spinov