mlx-optiq
Documentation

Installation

mlx-optiq is a pure-Python package on PyPI. It runs on macOS with Apple Silicon (M1 / M2 / M3 / M4) and Python 3.11+. Linux and Windows are not supported because MLX itself is Apple-only.
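As a quick preflight before installing, a stdlib-only snippet (nothing here is part of mlx-optiq) can confirm you are on Apple Silicon with a new-enough Python:

```python
import platform
import sys

# mlx-optiq needs an arm64 Mac and Python 3.11+
is_mac_arm = sys.platform == "darwin" and platform.machine() == "arm64"
py_ok = sys.version_info >= (3, 11)

print(f"Apple Silicon macOS: {is_mac_arm}")
print(f"Python {platform.python_version()} (>= 3.11: {py_ok})")
```

If either line prints False, fix the environment first; pip will otherwise fail with the "No matching distribution found" error covered under Troubleshooting.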

One-line install

$ pip install mlx-optiq

That's it. The base install pulls in mlx, mlx-lm, huggingface-hub, click, and a handful of small utilities; the whole thing is roughly 80 MB on disk, dependencies included.

Optional extras

Some workflows need more dependencies. Install them on demand:

# Quantization workflows (psutil for RAM precheck on convert)
$ pip install 'mlx-optiq[convert]'

# Evaluation harnesses (datasets for GSM8K)
$ pip install 'mlx-optiq[eval]'

# Serving (uvicorn, fastapi for the OpenAI-compatible API)
$ pip install 'mlx-optiq[serve]'

# Everything
$ pip install 'mlx-optiq[all]'

Verify the install

$ optiq --version
# mlx-optiq, version 0.1.0

$ python -c "import optiq; print(optiq.__version__)"
# 0.1.0
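Beyond the version check, you can confirm MLX itself can reach the GPU. mx.metal.is_available() is MLX's own API, not part of mlx-optiq, and the import is guarded so the snippet is safe to run even where mlx isn't installed:

```python
import importlib.util

has_mlx = importlib.util.find_spec("mlx") is not None
if has_mlx:
    import mlx.core as mx
    # True when MLX's Metal backend can see the GPU
    print("Metal available:", mx.metal.is_available())
else:
    print("mlx is not installed")
```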

System requirements

  • OS: macOS 14 (Sonoma) or newer.
  • Hardware: Apple Silicon (M1, M2, M3, M4 — any tier).
  • Python: 3.11 or newer. Use pyenv, uv or conda to manage versions; system Python on macOS is usually too old.
  • RAM: 16 GB minimum to run small quants (0.8 B–4 B). 24 GB for 9 B comfortably. 36 GB+ for 27 B / 31 B / 35 B-A3B and for fine-tuning.
  • Disk: Each pre-built quant is 0.5–20 GB. Plan accordingly.
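To see where your machine lands against those RAM and disk numbers, the stdlib can report both. A rough sketch, using decimal GB and the POSIX sysconf interface (available on macOS and Linux):

```python
import os
import shutil

# total physical RAM = page size x number of physical pages
ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
free_gb = shutil.disk_usage(os.path.expanduser("~")).free / 1e9

print(f"RAM: {ram_gb:.0f} GB, free disk: {free_gb:.0f} GB")
print("OK for 27 B+ quants" if ram_gb >= 36 else "Stick to smaller quants")
```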

Working in a virtualenv

Strongly recommended. uv is the fastest path:

$ uv venv .venv
$ source .venv/bin/activate
$ uv pip install mlx-optiq

Or stock venv:

$ python3.11 -m venv .venv
$ source .venv/bin/activate
$ pip install mlx-optiq

Upgrade

$ pip install --upgrade mlx-optiq

Already have a quant downloaded? Pre-built quants live in your local Hugging Face cache (~/.cache/huggingface/hub). They're independent of the mlx-optiq version; upgrading the package doesn't re-download anything.
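To check how much that cache is holding before or after an upgrade, a few lines of stdlib Python will total it up (path as above; note the HF_HOME environment variable can relocate the Hugging Face cache):

```python
from pathlib import Path

cache = Path.home() / ".cache" / "huggingface" / "hub"
if cache.exists():
    # sum the sizes of every file under the cache directory
    total = sum(f.stat().st_size for f in cache.rglob("*") if f.is_file())
    print(f"{cache}: {total / 1e9:.1f} GB")
else:
    print(f"{cache}: not present yet")
```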

Troubleshooting

"No matching distribution found"

You're probably on Linux, Windows, or an Intel Mac. mlx-optiq requires Apple Silicon, so pip has no wheel to offer on those platforms. Nothing in mlx-optiq itself is fundamentally macOS-specific, but it depends on MLX, which is Apple-only.

Slow first model download

Hugging Face downloads can be slow from some regions. Set HF_HUB_ENABLE_HF_TRANSFER=1 and install hf_transfer for ~5× speedups on large models:

$ pip install hf_transfer
$ export HF_HUB_ENABLE_HF_TRANSFER=1

"Metal command-buffer timeout" while quantizing 27 B+

Long Metal kernels can time out on the macOS GPU watchdog. mlx-optiq patches around this internally for the convert path; if you hit it during fine-tuning, lower --max-seq-length. See the fine-tuning guide's training-ceiling map.

Next: pick a model family — Qwen3.5, Qwen3.6, Gemma-4 — or jump to Using mlx-optiq quants.