# CLI reference

The optiq CLI is built with Click. Six top-level commands cover the entire mlx-optiq workflow:

- `optiq convert` — quantize an HF model with mixed-precision sensitivity
- `optiq kv-cache` — measure per-layer KV sensitivity, write a serving config
- `optiq lora train` / `info` — sensitivity-aware LoRA training
- `optiq serve` — OpenAI-compatible inference server
- `optiq eval` — GSM8K evaluation harness
- `optiq latency` — Apple Silicon latency model
## optiq convert
Quantize a Hugging Face model with mixed-precision sensitivity-driven bit allocation.
```bash
$ optiq convert MODEL [OPTIONS]
```
### Required
- `MODEL` — HF repo ID (e.g. `Qwen/Qwen3.5-9B`) or local path to a bf16 checkpoint.
### Options
- `--target-bpw FLOAT` — Average bits per weight. Default: `4.5`.
- `--candidate-bits TEXT` — Comma-separated bit-widths to choose from. Default: `4,8`.
- `--reference [auto|bf16|uniform_4bit]` — Reference precision for sensitivity probes. Default: `auto` (bf16 if it fits in RAM, else `uniform_4bit`).
- `--group-size INTEGER` — Quantization group size. Default: `64`.
- `--n-calibration INTEGER` — Calibration sequences for sensitivity. Default: `8`.
- `--skip-baselines` — Don't also build the uniform-4-bit comparison artifact.
- `-o, --output TEXT` — Output directory. Default: `optiq_output/<model_basename>`.
### Examples
```bash
# Standard 4.5 BPW mix on a 9B (auto-routes to bf16 reference)
$ optiq convert Qwen/Qwen3.5-9B --target-bpw 4.5

# 27B+ on a 36 GB Mac (auto-routes to uniform_4bit reference)
$ optiq convert Qwen/Qwen3.5-27B --reference uniform_4bit

# Custom bit set (3-bit / 6-bit mix at 4 BPW average)
$ optiq convert Qwen/Qwen3.5-4B \
    --target-bpw 4.0 --candidate-bits 3,4,6,8
```
## optiq kv-cache
Measure per-layer KV-cache sensitivity and write a per-layer KV bit-width config that `optiq serve --kv-config` consumes.
```bash
$ optiq kv-cache MODEL [OPTIONS]
```
### Options
- `--target-bits FLOAT` — Average KV bits across full-attention layers. Default: `4.5`.
- `--candidate-bits TEXT` — Default: `4,8`.
- `--n-calibration INTEGER` — Default: `16`.
- `-o, --output TEXT` — Where to write `kv_config.json`.
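The exact schema of `kv_config.json` is not documented here; as a rough illustration only, a per-layer bit-width map for a 4.5-bit average target might take a shape like this (all field names are hypothetical — inspect a generated file for the real schema):

```json
{
  "target_bits": 4.5,
  "candidate_bits": [4, 8],
  "layer_bits": [8, 8, 4, 4, 8, 4, 4, 4]
}
```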
## optiq lora
Train and inspect sensitivity-aware LoRA adapters.
### optiq lora train
```bash
$ optiq lora train MODEL [OPTIONS]
```
- `--data PATH` — Directory containing `train.jsonl` (and optional `valid.jsonl`).
- `--rank INTEGER` — Base LoRA rank. Default: `8`.
- `--rank-scaling [constant|by_bits|by_kl|by_quantile]` — How to scale rank across layers. Default: `by_bits`.
- `--num-layers INTEGER` — How many of the model's last layers to train. Default: `16`.
- `--max-seq-length INTEGER` — Tokens per training sample. Default: `1024`.
- `--iters INTEGER` — Training iterations. Default: `1000`.
- `--learning-rate FLOAT` — Default: `1e-4`.
- `--batch-size INTEGER` — Default: `1` (Mac UMA constraint).
- `--target-modules TEXT` — Comma-separated. Default: `q_proj,v_proj`.
- `-o, --output PATH` — Output adapter directory.
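`train.jsonl` holds one JSON object per line. The record shape isn't specified here; mlx-lm-style trainers commonly accept either a chat-format `messages` list or a plain `text` field, so confirm which this build expects. A hypothetical chat-format line:

```json
{"messages": [{"role": "user", "content": "What is 12 * 7?"}, {"role": "assistant", "content": "12 * 7 = 84."}]}
```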
### optiq lora info
```bash
$ optiq lora info ADAPTER_DIR
```
Prints the per-layer rank distribution and trainable parameter count.
## optiq serve
OpenAI-compatible chat completions server.
```bash
$ optiq serve [OPTIONS]
```
- `--model TEXT` — HF repo or local model path. Required.
- `--kv-config PATH` — JSON config from `optiq kv-cache`. Optional; omit for fp16 KV.
- `--adapter PATH-OR-REPO` — Mount a LoRA adapter. Repeatable for multiple adapters.
- `--host TEXT` — Default: `127.0.0.1`.
- `--port INTEGER` — Default: `8080`.
- `--max-tokens INTEGER` — Default: `4096`.
- `--temp FLOAT` — Default: `0.7`.
- `--top-p FLOAT` — Default: `1.0`.
- `--top-k INTEGER` — Default: `0` (off).
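Since the server is OpenAI-compatible, any OpenAI chat-completions client should work against it. A minimal curl sketch, assuming the default host/port and the conventional `/v1/chat/completions` route that OpenAI-compatible servers expose (the `model` value is a placeholder — check what name this server expects):

```bash
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "default",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 128
      }'
```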
## optiq eval
GSM8K math-reasoning evaluation harness.
```bash
$ optiq eval MODEL_PATH [OPTIONS]

# e.g. score on GSM8K against the uniform-4-bit baseline
$ optiq eval MODEL_PATH --task gsm8k --baseline UNIFORM_PATH --n-samples 200
```
## optiq latency
Apple Silicon roofline latency model. Calibrate against a measured model and predict for others.
```bash
$ optiq latency MODEL_PATH --calibrate
```
## Top-level options
- `optiq --version` — Print the installed version.
- `optiq --help` — Show top-level help.
- `optiq COMMAND --help` — Per-command help.
## Environment variables
- `HF_HOME` — Hugging Face cache root. Default: `~/.cache/huggingface`.
- `HF_HUB_ENABLE_HF_TRANSFER=1` — Enable accelerated HF downloads (requires `pip install hf_transfer`).
- `OPTIQ_CALIBRATION_SAMPLES` — Override calibration sample count globally.