llama.cpp

An efficient open-source C/C++ runtime for running LLMs on consumer hardware, on both CPUs and GPUs. It pioneered practical quantization for local inference, introducing the GGUF model format for packaging quantized weights, and is the engine behind tools like Ollama. If you're running models locally on a Mac, mini PC, or budget GPU, llama.cpp (or a tool built on it) is almost certainly what's doing the work.
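
As a concrete illustration, here is a minimal sketch of local inference through llama-cpp-python, the community Python bindings over llama.cpp. The model filename and generation parameters are assumptions for the example; any locally downloaded quantized GGUF model works the same way.

```python
# Minimal local-inference sketch using llama-cpp-python,
# the Python bindings over llama.cpp.
# Install with: pip install llama-cpp-python
from llama_cpp import Llama

# The model path is a hypothetical example; point it at any
# quantized GGUF file you have downloaded locally.
llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",
    n_ctx=2048,    # context window size in tokens
    n_threads=8,   # CPU threads; llama.cpp runs well on plain CPUs
)

# Run a single completion entirely on local hardware.
output = llm(
    "Explain quantization in one sentence.",
    max_tokens=64,
)
print(output["choices"][0]["text"])
```

Quantized GGUF weights (such as the 4-bit Q4_K_M variant above) are what let a multi-billion-parameter model fit in a few gigabytes of RAM, which is the core of llama.cpp's appeal on consumer machines.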
