I built an Ollama alternative with TurboQuant, model groups, and multi-GPU support
EIE is a policy-driven, multi-model GGUF inference server with a TurboQuant-native KV cache, CUDA and ROCm support, and model groups that run in parallel or sequentially. Licensed under Apache 2.0.
Dev.to | Apr 8, 2026 | deharoalexandre-cyber