LLM-as-a-Judge: I Built One From Scratch, Then Checked It Against Humans
Part 2 of an eval series. A 15-line LLM judge, scored against real Chatbot Arena human votes. It agreed with people on just 43% of pairs, tied a third of the...
Dev.to | Jun 29, 2026 | Suman Nath