Llama on M1

February 26, 2026

On an M1 Mac, install the following dependencies:

xcode-select --install
brew install cmake

Then download and build llama.cpp. The whole process takes less than one minute and uses about 700 MB of disk.

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_METAL=ON
cmake --build build -j

To try it out you’ll need a model in GGUF format. I’m using Qwen3-8B-GGUF (the Q5_K_M quantization, around 6 GB). Then run:

./build/bin/llama-server -m Qwen3-8B-Q5_K_M.gguf --port 8080

This starts a local server; open http://localhost:8080 in your browser for a ChatGPT-like web interface.
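Besides the web interface, llama-server also exposes an OpenAI-compatible HTTP API, so you can query the model programmatically. Here is a minimal sketch using only the Python standard library; it assumes the server started above is still running on port 8080 (the prompt and the `ask` helper are my own, not part of llama.cpp).

```python
import json
import urllib.request

def build_chat_request(prompt, max_tokens=256):
    # OpenAI-style chat-completions payload; llama-server uses the
    # model it was launched with, so no model name is required here.
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(prompt, host="http://localhost:8080"):
    # POST to llama-server's OpenAI-compatible endpoint and return the reply text.
    payload = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Why is the sky blue?"))
```

Because the API follows the OpenAI schema, any OpenAI-compatible client library should also work by pointing its base URL at http://localhost:8080/v1.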