AI article
Speculative decoding shifted our output distribution and evals missed it
TL;DR: We turned on speculative decoding in vLLM to cut latency on a fine-tuned 8B. Got a 1.9x...
Dev.to | Jun 18, 2026 | Marcus Chen