Built EdgeRunner in pure Swift and Metal from scratch over a weekend. On Qwen3-0.6B Q8_0, it hits 212 tok/s on an M3 Max, 16% faster than llama.cpp. Here's what worked.
Mar 31, 2026 · deep dive · 5 min read
2 posts tagged with Apple Silicon.
We fully documented EdgeRunner, a Metal LLM inference engine for Apple Silicon. Here are the findings that turned out to be genuinely interesting.