为什么你该停止使用 Ollama

V2EX = way to explore

V2EX 是一个关于分享和探索的地方

已注册用户请登录

原文：
https://sleepingrobots.com/dreams/stop-using-ollama/

“Ollama wrapped that work in a nice CLI, raised VC money on the back of it, spent over a year refusing to credit it, forked it badly, shipped a closed-source app alongside it, and then pivoted the whole thing toward cloud services. At every decision point where they could have been good open-source citizens, they chose the path that made them look more self-sufficient to investors.”

总之就是开源小偷，还尝试锁死用户

原文建议用这些：

llama.cpp is the engine. It has an OpenAI-compatible API server (llama-server), a built-in web UI, full control over context windows and sampling parameters, and consistently better throughput than Ollama. In February 2026, Gerganov’s ggml.ai joined Hugging Face to ensure the long-term sustainability of the project. It’s truly community-driven, MIT-licensed, and under active development with 450+ contributors.

llama-swap handles multi-model orchestration, loading, unloading, and hot-swapping models on demand behind a single API endpoint. Pair it with LiteLLM and you get a unified OpenAI-compatible proxy that routes across multiple backends with proper model aliasing.

LM Studio gives you a GUI if that’s what you want. It uses llama.cpp under the hood, exposes all the knobs, and supports any GGUF model without lock-in. Jan is another open-source desktop app with a clean chat interface and local-first design. Msty offers a polished GUI with multi-model support and built-in RAG. koboldcpp is another option with a web UI and extensive configuration options.

Red Hat’s ramalama is worth a look too, a container-native model runner that explicitly credits its upstream dependencies front and center. Exactly what Ollama should have done from the start.

第 1 条附言 4 天前

还有 ollama 的性能更差

开源

替代

社区

14 条回复 2026-04-22 11:36:19 +08:00