The most popular way to run local LLMs is quietly hiding where its technology actually comes from.
Ollama became the default for running models on your own hardware with a clean pitch: one command, any model. What it didn't disclose was the engine underneath. Its inference ran on llama.cpp, Georgi Gerganov's C/C++ project that made running LLaMA on consumer hardware possible. For over a year, the README contained no mention of it. No credit. No license notice. The MIT license's one real condition is that copies include the copyright and permission notice. Ollama didn't.
Community issues asking for attribution went unanswered for 400 days. When credit was finally added, it was a single line at the bottom of the README.
Then, in mid-2025, Ollama moved away from llama.cpp and built a custom backend directly on ggml. Community benchmarks show llama.cpp running 1.8x faster on identical hardware. Structured output broke. Vision models failed. GPT-OSS 20B shipped without support for tensor types the model required. Gerganov himself identified regressions Ollama had introduced into ggml.
The DeepSeek-R1 naming shows the same pattern. Ollama listed the distilled models simply as "DeepSeek-R1," obscuring that an 8B Qwen-derived distillate is nothing like the full 671B model. GitHub issues asking for clearer, separate names went nowhere.
In July 2025, Ollama shipped a closed-source GUI app with no public license. Downloads pointed to an unlicensed binary sitting next to a GitHub link that implied MIT licensing. Then came the cloud pivot: proprietary models started appearing alongside the local library, routing prompts to third-party providers. And CVE-2025-51471 lets a malicious server exfiltrate auth tokens during an ordinary model pull.
The incentive structure explains the pattern. Ollama is Y Combinator-backed and dependent on venture capital, and the playbook is a familiar one: wrap open source, build a user base, raise money, pivot to proprietary. The hashed model store, which doesn't work outside Ollama, is not an accident.
llama.cpp now has 100,000 GitHub stars and joined Hugging Face in February 2026. It runs faster, supports more quantization formats, and reads the chat templates embedded in model files without translation. LM Studio, Jan, and koboldcpp wrap it in a GUI. None of them require copying a 30GB model just to change a sampling parameter like temperature.
The local LLM ecosystem doesn't need a middleman that obscures its origins, trails its performance, and pivots toward cloud services when VC pressure mounts. It needs llama.cpp. Everything else is packaging, and better packaging already exists.