mlx-optiq
Engineering · Research · Releases

Blog

Release notes, methodology write-ups, and benchmarking deep dives. New posts land alongside major releases and research findings.

2026·04·25
Gemma-4 lands on mlx-optiq — four sizes, +32 pp on the small one
Adding Google's full Gemma-4 instruct lineup — e2b, e4b, 26B-A4B sparse MoE, and 31B dense. The +32-point GSM8K recovery on gemma-4-e4b is the cleanest mixed-precision win we have. Plus the shared-KV caveat that means you'll want Qwen for quantized-KV serving.
engineering
2026·04·17
TurboQuant — rotated-space KV attention preserves inner products
Affine quantization preserves magnitudes but distorts the inner products that attention actually uses. Rotation-based vector quantization, plus a Metal kernel that attends in rotated space, closes the gap: 100% needle retrieval at 4-bit vs 73% for affine, at the same speed. (A toy version of the rotation trick is sketched below.)
research
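The rotation trick is easy to demo: any orthogonal matrix Q leaves inner products unchanged (Qq·Qk = q·k), so attention scores computed in rotated space are exact, while the rotation smears outlier channels across coordinates and shrinks the dynamic range a low-bit quantizer has to cover. A minimal NumPy sketch of that idea, with scalar quantization standing in for the post's vector quantizer and a random QR rotation standing in for whatever TurboQuant actually uses:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 128

# Random orthogonal rotation via QR (illustrative; the post's actual
# rotation choice may differ).
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

q = rng.standard_normal(d)
k = rng.standard_normal(d)
k[:4] *= 20  # a few outlier channels, as real KV activations tend to have

# Orthogonality preserves the inner product attention uses.
assert np.allclose(q @ k, (Q @ q) @ (Q @ k))

def quant4(x):
    """Naive symmetric 4-bit quantization with a single per-vector scale."""
    scale = np.abs(x).max() / 7
    return np.round(x / scale).clip(-8, 7) * scale

exact = q @ k
affine = q @ quant4(k)             # quantize keys in the original space
rotated = (Q @ q) @ quant4(Q @ k)  # quantize keys and attend in rotated space
print(f"affine error  {abs(affine - exact):.3f}")
print(f"rotated error {abs(rotated - exact):.3f}")
```

Because the outliers no longer dominate the quantizer's scale after rotation, the rotated-space score lands much closer to the exact one.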
2026·04·08
Sensitivity-aware LoRA — fine-tuning that respects the bit budget
The same per-layer signal that drives mixed-precision quantization also drives adapter rank. 8-bit-quantized layers get 2× the adapter rank of 4-bit-quantized ones at the same parameter budget, and validation loss drops 12% in head-to-head A/Bs. Plus the empirical training-ceiling map for a 36 GB Mac across all 10 supported models. (The allocation rule is sketched below.)
engineering
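The allocation rule from the teaser fits in a few lines: rank scales with the bit width the sensitivity pass assigned, so an 8-bit layer gets twice the rank of a 4-bit one, and ranks are then rescaled to hit a fixed adapter-parameter budget. A hedged sketch of that bookkeeping; the names and the Layer shape here are hypothetical, not the mlx-optiq API:

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    bits: int      # bit width assigned by the sensitivity pass (4 or 8)
    in_dim: int
    out_dim: int

def allocate_ranks(layers: list[Layer], param_budget: int,
                   base_bits: int = 4) -> dict[str, int]:
    """Scale LoRA rank with quantization bit width under a parameter budget.

    A rank-r adapter on a (d_in, d_out) layer adds r * (d_in + d_out)
    parameters; an 8-bit layer gets bits/base_bits = 2x the rank of a
    4-bit layer before everything is rescaled to fit the budget.
    """
    weight = {l.name: l.bits / base_bits for l in layers}
    per_rank = {l.name: l.in_dim + l.out_dim for l in layers}
    unit_cost = sum(weight[n] * per_rank[n] for n in weight)
    scale = param_budget / unit_cost
    return {n: max(1, round(weight[n] * scale)) for n in weight}

layers = [
    Layer("attn.q_proj", bits=8, in_dim=4096, out_dim=4096),   # sensitive layer
    Layer("mlp.down",    bits=4, in_dim=11008, out_dim=4096),  # robust layer
]
print(allocate_ranks(layers, param_budget=2_000_000))
# -> {'attn.q_proj': 127, 'mlp.down': 64}: the 2x ratio at a fixed budget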
2026·03·20
Not All Layers Are Equal — mixed-precision quantization for weights and KV cache on Apple Silicon
The research foundation behind mlx-optiq. Some layers are 56× more sensitive to quantization than others, and the KV cache becomes the dominant memory cost at long contexts. Mixed-precision weights recover what uniform 4-bit drops; mixed-precision KV fixes the perplexity collapse that uniform 4-bit caching causes. (The KV-memory arithmetic is sketched below.)
research
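The long-context claim is plain arithmetic: weight memory is fixed while KV memory grows linearly with sequence length, so past some context the cache dominates. A back-of-envelope sketch, with illustrative 8B-class shapes (32 layers, 8 KV heads, head_dim 128) and a ~5-bit average for the mixed cache, none of which are numbers from the post:

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bits):
    # Keys and values: 2 tensors per layer, each seq_len x n_kv_heads x head_dim.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bits // 8

weights_gb = 8e9 * 4 / 8 / 1e9  # 8B params at 4-bit is ~4 GB, fixed
for ctx in (8_192, 32_768, 131_072):
    fp16 = kv_cache_bytes(ctx, 32, 8, 128, 16) / 1e9
    mixed = kv_cache_bytes(ctx, 32, 8, 128, 5) / 1e9  # assumed ~5-bit average mix
    print(f"ctx {ctx:>6}: weights {weights_gb:.1f} GB | "
          f"KV fp16 {fp16:.2f} GB | KV mixed {mixed:.2f} GB")
```

At these shapes the fp16 cache overtakes the 4-bit weights by 32K context, which is exactly the regime the post's mixed-precision KV targets.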