Llama on M1
February 26, 2026
On an M1 Mac, install the following dependencies:
xcode-select --install
brew install cmake
Then download and build llama.cpp. The whole process takes less than one minute and uses about
700 MB of disk.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_METAL=ON
cmake --build build -j
To try it out you need a model in GGUF format. I’m using Qwen3-8B-GGUF, which is around 6 GB. Then run:
./build/bin/llama-server -m Qwen3-8B-Q5_K_M.gguf --port 8080
This starts a local server with a web interface similar to ChatGPT; open http://localhost:8080 in your browser to use it.
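Besides the web UI, llama-server also exposes an OpenAI-compatible HTTP API on the same port. A minimal sketch, assuming the server started above is still running on port 8080:

```shell
# Send a chat request to the OpenAI-compatible endpoint served by llama-server.
# No model field is needed: the server answers with the single model it loaded.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Hello"}
        ]
      }'
```

The response is a standard chat-completion JSON object; the reply text is under `choices[0].message.content`, so existing OpenAI client libraries can be pointed at this endpoint by changing their base URL.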