CLI reference

The optiq CLI is built with Click. Six top-level commands cover the entire mlx-optiq workflow:

optiq convert

Quantize a Hugging Face model with mixed-precision sensitivity-driven bit allocation.

$ optiq convert MODEL [OPTIONS]

Required

  • MODEL — HF repo ID (e.g. Qwen/Qwen3.5-9B) or local path to a bf16 checkpoint.

Options

  • --target-bpw FLOAT — Average bits per weight. Default: 4.5.
  • --candidate-bits TEXT — Comma-separated bit-widths to choose from. Default: 4,8.
  • --reference [auto|bf16|uniform_4bit] — Reference precision for sensitivity probes. Default: auto (bf16 if it fits in RAM, else uniform_4bit).
  • --group-size INTEGER — Quantization group size. Default: 64.
  • --n-calibration INTEGER — Calibration sequences for sensitivity. Default: 8.
  • --skip-baselines — Skip building the uniform-4-bit baseline artifact for comparison.
  • -o, --output TEXT — Output directory. Default: optiq_output/<model_basename>.

Examples

# Standard 4.5 BPW mix on a 9B (auto-routes to bf16 reference)
$ optiq convert Qwen/Qwen3.5-9B --target-bpw 4.5

# 27B+ on a 36 GB Mac (auto-routes to uniform_4bit reference)
$ optiq convert Qwen/Qwen3.5-27B --reference uniform_4bit

# Custom bit set (3-bit / 6-bit mix at 4 BPW average)
$ optiq convert Qwen/Qwen3.5-4B \
    --target-bpw 4.0 --candidate-bits 3,4,6,8
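The relationship between --target-bpw and the candidate bit-widths is simple arithmetic in the two-candidate case. The sketch below is illustrative only, not optiq internals: the real allocator decides which layers get the high-bit budget based on measured sensitivity.

```python
# Back-of-envelope check of how a target average BPW splits weights
# across two candidate bit-widths. Illustrative arithmetic only; the
# actual per-layer assignment is sensitivity-driven.
def high_bit_fraction(target_bpw, low=4, high=8):
    """Fraction of weights at `high` bits so the average hits target_bpw."""
    return (target_bpw - low) / (high - low)

print(high_bit_fraction(4.5))        # 0.125 -> one eighth of weights at 8-bit
print(high_bit_fraction(4.0, 3, 6))  # one third of weights at 6-bit
```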

optiq kv-cache

Measure per-layer KV-cache sensitivity and write a per-layer KV bit-width config that optiq serve --kv-config consumes.

$ optiq kv-cache MODEL [OPTIONS]

Options

  • --target-bits FLOAT — Average KV bits across full-attention layers. Default: 4.5.
  • --candidate-bits TEXT — Default: 4,8.
  • --n-calibration INTEGER — Default: 16.
  • -o, --output TEXT — Where to write kv_config.json.
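To see why the average KV bit-width matters on unified memory, here is rough KV-cache size arithmetic. The shapes are illustrative, not taken from any particular model or optiq output.

```python
# Rough KV-cache size arithmetic: memory to hold K and V for one
# sequence at a uniform bit-width. Shapes below are illustrative.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bits):
    """Bytes for K and V across all layers at the given bit-width."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bits / 8

GIB = 1024 ** 3
print(kv_cache_bytes(48, 8, 128, 32768, 16) / GIB)   # 6.0 GiB at fp16
print(kv_cache_bytes(48, 8, 128, 32768, 4.5) / GIB)  # 1.6875 GiB at 4.5-bit avg
```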

optiq lora

Train and inspect sensitivity-aware LoRA adapters.

optiq lora train

$ optiq lora train MODEL [OPTIONS]

Options

  • --data PATH — Directory containing train.jsonl (and optional valid.jsonl).
  • --rank INTEGER — Base LoRA rank. Default: 8.
  • --rank-scaling [constant|by_bits|by_kl|by_quantile] — How to scale rank across layers. Default: by_bits.
  • --num-layers INTEGER — How many of the model's last layers to train. Default: 16.
  • --max-seq-length INTEGER — Tokens per training sample. Default: 1024.
  • --iters INTEGER — Training iterations. Default: 1000.
  • --learning-rate FLOAT — Default: 1e-4.
  • --batch-size INTEGER — Default: 1 (kept small for Mac unified-memory limits).
  • --target-modules TEXT — Comma-separated. Default: q_proj,v_proj.
  • -o, --output PATH — Output adapter directory.
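The idea behind --rank-scaling by_bits is to give layers quantized more aggressively (fewer bits) a larger LoRA rank, so the adapter has more capacity where quantization error is larger. The scaling rule below is a hypothetical sketch of that idea, not optiq's actual formula.

```python
# Hypothetical by_bits rank scaling: rank grows as a layer's bit-width
# shrinks relative to a reference width. Assumed rule, not optiq's.
def scaled_ranks(layer_bits, base_rank=8, reference_bits=8):
    """Per-layer ranks, inversely proportional to each layer's bits."""
    return [max(1, round(base_rank * reference_bits / b)) for b in layer_bits]

print(scaled_ranks([4, 4, 8, 8]))  # 4-bit layers get double the base rank
```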

optiq lora info

$ optiq lora info ADAPTER_DIR

Prints the per-layer rank distribution and trainable parameter count.

optiq serve

OpenAI-compatible chat completions server.

$ optiq serve [OPTIONS]

Options

  • --model TEXT — HF repo or local model path. Required.
  • --kv-config PATH — JSON config from optiq kv-cache. Optional; omit for fp16 KV.
  • --adapter PATH-OR-REPO — Mount a LoRA adapter. Repeatable for multiple adapters.
  • --host TEXT — Default: 127.0.0.1.
  • --port INTEGER — Default: 8080.
  • --max-tokens INTEGER — Default: 4096.
  • --temp FLOAT — Default: 0.7.
  • --top-p FLOAT — Default: 1.0.
  • --top-k INTEGER — Default: 0 (off).
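Since the server is OpenAI-compatible, any standard chat-completions client works. A minimal stdlib-only request might look like the following; the endpoint path follows the OpenAI convention, and the model name in the body is illustrative.

```python
# Build a chat-completions request against a locally running
# `optiq serve` instance. Stdlib only; model name is illustrative.
import json
import urllib.request

body = {
    "model": "local",
    "messages": [{"role": "user", "content": "What is 17 * 24?"}],
    "max_tokens": 128,
    "temperature": 0.7,
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment with the server running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```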

optiq eval

GSM8K math-reasoning evaluation harness.

$ optiq eval MODEL_PATH \
    --task gsm8k \
    --baseline UNIFORM_PATH \
    --n-samples 200
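GSM8K scoring conventionally extracts the final number from a completion and compares it to the reference answer (which ends in "#### <number>"). Whether optiq eval parses completions exactly this way is an assumption; the heuristic itself looks like:

```python
# Common GSM8K scoring heuristic: strip commas, take the last number.
# Whether optiq eval uses exactly this rule is an assumption.
import re

def last_number(text):
    """Return the last number in the text as a string, or None."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return nums[-1] if nums else None

print(last_number("18 students, each pays $3, so 18 * 3 = #### 54"))  # 54
```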

optiq latency

Apple Silicon roofline latency model: calibrate against one measured model, then predict decode latency for others.

$ optiq latency MODEL_PATH --calibrate
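The roofline idea is that per-token decode time is bounded by whichever is slower: streaming the weights from unified memory, or doing the math. A minimal sketch, with illustrative (not calibrated) peak numbers:

```python
# Roofline decode-latency sketch: time per token is the max of the
# memory-bound and compute-bound times. Peaks below are illustrative
# placeholders, not measured calibration values.
def decode_ms_per_token(param_bytes, flops_per_token,
                        mem_bw_gbs=400.0, peak_tflops=14.0):
    t_mem = param_bytes / (mem_bw_gbs * 1e9)            # stream the weights
    t_compute = flops_per_token / (peak_tflops * 1e12)  # do the matmuls
    return max(t_mem, t_compute) * 1e3                  # milliseconds

# 9B params at 4.5 bits/weight ~ 5.06 GB streamed per decoded token,
# ~2 FLOPs per parameter per token; decode is memory-bound here.
print(round(decode_ms_per_token(9e9 * 4.5 / 8, 2 * 9e9), 2))  # 12.66
```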

Top-level options

  • optiq --version — Print the installed version.
  • optiq --help — Show top-level help.
  • optiq COMMAND --help — Per-command help.

Environment variables

  • HF_HOME — Hugging Face cache root. Default: ~/.cache/huggingface.
  • HF_HUB_ENABLE_HF_TRANSFER=1 — Enable accelerated HF downloads (requires pip install hf_transfer).
  • OPTIQ_CALIBRATION_SAMPLES — Override calibration sample count globally.