CLI reference

The optiq CLI is built with Click. Six top-level commands cover the entire mlx-optiq workflow:

optiq convert

Quantize a Hugging Face model with mixed-precision sensitivity-driven bit allocation.

$ optiq convert MODEL [OPTIONS]

Required

  • MODEL — HF repo ID (e.g. Qwen/Qwen3.5-9B) or local path to a bf16 checkpoint.

Options

  • --target-bpw FLOAT — Average bits per weight. Default: 4.5.
  • --candidate-bits TEXT — Comma-separated bit-widths to choose from. Default: 4,8.
  • --reference [auto|bf16|uniform_4bit] — Reference precision for sensitivity probes. Default: auto (bf16 if it fits in RAM, else uniform_4bit).
  • --group-size INTEGER — Quantization group size. Default: 64.
  • --n-calibration INTEGER — Calibration sequences for sensitivity. Default: 8.
  • --skip-baselines — Skip building the uniform-4-bit baseline artifact for comparison.
  • -o, --output TEXT — Output directory. Default: optiq_output/<model_basename>.

Examples

# Standard 4.5 BPW mix on a 9B (auto-routes to bf16 reference)
$ optiq convert Qwen/Qwen3.5-9B --target-bpw 4.5

# 27B+ on a 36 GB Mac (auto-routes to uniform_4bit reference)
$ optiq convert Qwen/Qwen3.5-27B --reference uniform_4bit

# Custom bit set (3-bit / 6-bit mix at 4 BPW average)
$ optiq convert Qwen/Qwen3.5-4B \
    --target-bpw 4.0 --candidate-bits 3,4,6,8
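The relationship between --target-bpw and the candidate bit-widths is simple arithmetic in the two-candidate case. The sketch below is illustrative only, not optiq internals: the real allocator decides which layers get the high-bit budget based on measured sensitivity.

```python
# Back-of-envelope check of how a target average BPW splits weights
# across two candidate bit-widths. Illustrative arithmetic only; the
# actual per-layer assignment is sensitivity-driven.
def high_bit_fraction(target_bpw, low=4, high=8):
    """Fraction of weights at `high` bits so the average hits target_bpw."""
    return (target_bpw - low) / (high - low)

print(high_bit_fraction(4.5))        # 0.125 -> one eighth of weights at 8-bit
print(high_bit_fraction(4.0, 3, 6))  # one third of weights at 6-bit
```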

optiq kv-cache

Measure per-layer KV-cache sensitivity and write a per-layer KV bit-width config that optiq serve --kv-config consumes.

$ optiq kv-cache MODEL [OPTIONS]

Options

  • --target-bits FLOAT — Average KV bits across full-attention layers. Default: 4.5.
  • --candidate-bits TEXT — Default: 4,8.
  • --n-calibration INTEGER — Default: 16.
  • -o, --output TEXT — Where to write kv_config.json.
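To see why the average KV bit-width matters on unified memory, here is rough KV-cache size arithmetic. The shapes are illustrative, not taken from any particular model or optiq output.

```python
# Rough KV-cache size arithmetic: memory to hold K and V for one
# sequence at a uniform bit-width. Shapes below are illustrative.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bits):
    """Bytes for K and V across all layers at the given bit-width."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bits / 8

GIB = 1024 ** 3
print(kv_cache_bytes(48, 8, 128, 32768, 16) / GIB)   # 6.0 GiB at fp16
print(kv_cache_bytes(48, 8, 128, 32768, 4.5) / GIB)  # 1.6875 GiB at 4.5-bit avg
```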

optiq lora

Train and inspect sensitivity-aware LoRA adapters.

optiq lora train

$ optiq lora train MODEL [OPTIONS]

Options

  • --data PATH — Directory containing train.jsonl (and optional valid.jsonl).
  • --rank INTEGER — Base LoRA rank. Default: 8.
  • --rank-scaling [constant|by_bits|by_kl|by_quantile] — How to scale rank across layers. Default: by_bits.
  • --num-layers INTEGER — How many of the model's last layers to train. Default: 16.
  • --max-seq-length INTEGER — Tokens per training sample. Default: 1024.
  • --iters INTEGER — Training iterations. Default: 1000.
  • --learning-rate FLOAT — Default: 1e-4.
  • --batch-size INTEGER — Default: 1 (kept small for Mac unified-memory limits).
  • --target-modules TEXT — Comma-separated. Default: q_proj,v_proj.
  • -o, --output PATH — Output adapter directory.
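The idea behind --rank-scaling by_bits is to give layers quantized more aggressively (fewer bits) a larger LoRA rank, so the adapter has more capacity where quantization error is larger. The scaling rule below is a hypothetical sketch of that idea, not optiq's actual formula.

```python
# Hypothetical by_bits rank scaling: rank grows as a layer's bit-width
# shrinks relative to a reference width. Assumed rule, not optiq's.
def scaled_ranks(layer_bits, base_rank=8, reference_bits=8):
    """Per-layer ranks, inversely proportional to each layer's bits."""
    return [max(1, round(base_rank * reference_bits / b)) for b in layer_bits]

print(scaled_ranks([4, 4, 8, 8]))  # 4-bit layers get double the base rank
```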

optiq lora info

$ optiq lora info ADAPTER_DIR

Prints the per-layer rank distribution and trainable parameter count.

optiq serve

OpenAI-compatible chat completions server.

$ optiq serve [OPTIONS]

Options

  • --model TEXT — HF repo or local model path. Required.
  • --kv-config PATH — JSON config from optiq kv-cache. Optional; omit for fp16 KV.
  • --adapter PATH-OR-REPO — Mount a LoRA adapter. Repeatable for multiple adapters.
  • --host TEXT — Default: 127.0.0.1.
  • --port INTEGER — Default: 8080.
  • --max-tokens INTEGER — Default: 4096.
  • --temp FLOAT — Default: 0.7.
  • --top-p FLOAT — Default: 1.0.
  • --top-k INTEGER — Default: 0 (off).
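Since the server is OpenAI-compatible, any standard chat-completions client works. A minimal stdlib-only request might look like the following; the endpoint path follows the OpenAI convention, and the model name in the body is illustrative.

```python
# Build a chat-completions request against a locally running
# `optiq serve` instance. Stdlib only; model name is illustrative.
import json
import urllib.request

body = {
    "model": "local",
    "messages": [{"role": "user", "content": "What is 17 * 24?"}],
    "max_tokens": 128,
    "temperature": 0.7,
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment with the server running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```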

optiq eval

GSM8K math-reasoning evaluation harness.

$ optiq eval MODEL_PATH \
    --task gsm8k \
    --baseline UNIFORM_PATH \
    --n-samples 200
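GSM8K scoring conventionally extracts the final number from a completion and compares it to the reference answer (which ends in "#### <number>"). Whether optiq eval parses completions exactly this way is an assumption; the heuristic itself looks like:

```python
# Common GSM8K scoring heuristic: strip commas, take the last number.
# Whether optiq eval uses exactly this rule is an assumption.
import re

def last_number(text):
    """Return the last number in the text as a string, or None."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return nums[-1] if nums else None

print(last_number("18 students, each pays $3, so 18 * 3 = #### 54"))  # 54
```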

optiq latency

Apple Silicon roofline latency model: calibrate against one measured model, then predict decode latency for others.

$ optiq latency MODEL_PATH --calibrate
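The roofline idea is that per-token decode time is bounded by whichever is slower: streaming the weights from unified memory, or doing the math. A minimal sketch, with illustrative (not calibrated) peak numbers:

```python
# Roofline decode-latency sketch: time per token is the max of the
# memory-bound and compute-bound times. Peaks below are illustrative
# placeholders, not measured calibration values.
def decode_ms_per_token(param_bytes, flops_per_token,
                        mem_bw_gbs=400.0, peak_tflops=14.0):
    t_mem = param_bytes / (mem_bw_gbs * 1e9)            # stream the weights
    t_compute = flops_per_token / (peak_tflops * 1e12)  # do the matmuls
    return max(t_mem, t_compute) * 1e3                  # milliseconds

# 9B params at 4.5 bits/weight ~ 5.06 GB streamed per decoded token,
# ~2 FLOPs per parameter per token; decode is memory-bound here.
print(round(decode_ms_per_token(9e9 * 4.5 / 8, 2 * 9e9), 2))  # 12.66
```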

Top-level options

  • optiq --version — Print the installed version.
  • optiq --help — Show top-level help.
  • optiq COMMAND --help — Per-command help.

Environment variables

  • HF_HOME — Hugging Face cache root. Default: ~/.cache/huggingface.
  • HF_HUB_ENABLE_HF_TRANSFER=1 — Enable accelerated HF downloads (requires pip install hf_transfer).
  • OPTIQ_CALIBRATION_SAMPLES — Override calibration sample count globally.