Speculative decoding shifted our output distribution and evals missed it

TL;DR: We turned on speculative decoding in vLLM to cut latency on a fine-tuned 8B. Got a 1.9x...

Dev.to | Jun 18, 2026 | Marcus Chen