A Better LLM Judge? The Rubric Made My Small Model Worse

Part 3 of an eval series. I tried to fix a 43%-agreement LLM judge two ways: a bigger model (DeepSeek and Qwen via OpenRouter) and a real anchored rubric, in...

Dev.to | Jun 29, 2026 | Suman Nath
