A Better LLM Judge? The Rubric Made My Small Model Worse
Part 3 of an eval series. I tried to fix an LLM judge with 43% agreement in two ways: a bigger model (DeepSeek and Qwen via OpenRouter) and a real anchored rubric, in...
Dev.to | Jun 29, 2026 | Suman Nath