Single-file Python agent (zero dependencies) that uses llama.cpp for local inference. Runs on a 2013 Mac Pro with a Xeon E5-1650 v2 and dual FirePro D500 GPUs. Qwen 3B does native tool calling at 15.6 tok/s. Also includes a 3-line patch that fixes llama.cpp's Metal backend on discrete AMD GPUs (PR #20615); with the patch, prompt processing is 16% faster than CPU-only.
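
For a rough idea of what a zero-dependency tool-calling round trip can look like, here is a minimal sketch using only the standard library. It is not the agent's actual code. It assumes llama.cpp's `llama-server` is running locally and exposing its OpenAI-compatible `/v1/chat/completions` endpoint; the address `127.0.0.1:8080`, the `get_time` tool, and the helper names `build_request`, `extract_tool_calls`, and `chat` are all made up for illustration.

```python
import json
import urllib.request

# Assumption: llama-server is listening here with its OpenAI-compatible API.
SERVER_URL = "http://127.0.0.1:8080/v1/chat/completions"

# Hypothetical tool definition in the OpenAI "tools" schema (illustration only).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Return the current local time as an ISO 8601 string.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}]


def build_request(messages, tools=TOOLS, url=SERVER_URL):
    """Build a POST request carrying the chat messages and tool schemas."""
    body = json.dumps({"messages": messages, "tools": tools}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


def extract_tool_calls(response):
    """Return (name, arguments) pairs from an OpenAI-style chat response."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        args = fn.get("arguments") or "{}"
        # In the OpenAI format, arguments arrive as a JSON-encoded string.
        if isinstance(args, str):
            args = json.loads(args)
        calls.append((fn["name"], args))
    return calls


def chat(messages):
    """Send one request to the local server and return the decoded JSON reply."""
    with urllib.request.urlopen(build_request(messages), timeout=120) as resp:
        return json.load(resp)
```

An agent loop built on this would call `chat(...)`, run whatever `extract_tool_calls(...)` returns, append the results as `tool` messages, and repeat until the model answers without requesting a tool.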