millrace

Local-first LLM inference on Apple Silicon.

OpenAI- and Anthropic-compatible APIs, written from scratch in Mojo — every GPU kernel custom-written, no C++, no CUDA, no Metal shaders.

The whole stack is Mojo: the inference engine, the privacy harness, the document vault, and the small libraries underneath. Even the program a frontier model writes to answer questions about your private files is Mojo it generates — compiled and sandboxed on your Mac, so your data never leaves the machine.

Get started Deep dives View on GitHub

the server engine

From-scratch, pure-Mojo GPU inference engine for Qwen2.5 on Apple Silicon — every kernel custom-written, no C++/CUDA/Metal deps. Speaks an OpenAI-compatible API, so you can code against it locally with opencode.

headgate experimental

Experimental privacy harness: a frontier model writes code that runs on your private data locally, in a sandbox that can't phone home — the data never leaves the machine.

dacular experimental

Experimental personal data vault built on headgate. Index your CSV/PDF/Markdown files and ask open-ended questions — answered locally, with the data never reaching the frontier model. See the walkthrough.

app macOS app

The Millrace macOS app — a menu-bar companion plus a millrace CLI, installable with Homebrew (brew install millrace/tap/millrace). One-click bootstrap of the whole stack: fetch, build, download weights, serve, launch opencode.

tools libraries

The small, single-purpose Mojo libraries we built along the way — chat templating, compression, PDF extraction, an on-device vector store — plus the flare networking stack underneath.