Codex
OpenAI's Codex CLI uses the OpenAI Responses API exclusively (Chat Completions was deprecated for Codex in 2026). optiq serve exposes /v1/responses by default so Codex talks to your local OptIQ-quantized model with a one-block config addition.
1. Install Codex
$ npm install -g @openai/codex
2. Start optiq serve
$ optiq serve \
    --model mlx-community/Qwen3.5-9B-OptiQ-4bit \
    --mtp --mtp-depth 2 \
    --port 8080
The --responses flag is on by default and serves /v1/responses. --mtp adds in-checkpoint MTP speculation, which gives a ~1.4-1.8× decode speedup on the Qwen3.5 / 3.6 family.
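Before configuring Codex, it can help to confirm that the endpoint answers. The following is a minimal sketch, assuming the server from step 2 is listening on port 8080 and accepts the standard Responses API request body (`model` plus `input`). The helper names `responses_payload` and `optiq_smoke_test` are hypothetical and not part of optiq; the bearer token is the placeholder value used in step 3, and whether the server actually checks it is an assumption.

```bash
# Hypothetical helpers; not part of optiq or Codex.
OPTIQ_BASE_URL="${OPTIQ_BASE_URL:-http://localhost:8080/v1}"

# Build a minimal Responses API request body.
# Arguments are inserted verbatim, so avoid quotes or backslashes in them.
responses_payload() {
  printf '{"model":"%s","input":"%s"}' "$1" "$2"
}

# POST the body to /v1/responses and print the raw JSON reply.
optiq_smoke_test() {
  curl -sS --max-time 60 "$OPTIQ_BASE_URL/responses" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${OPTIQ_AUTH_TOKEN:-sk-optiq-local}" \
    -d "$(responses_payload "$1" "$2")"
}
```

With the server running, `optiq_smoke_test mlx-community/Qwen3.5-9B-OptiQ-4bit "Say hello"` should print a JSON response object; a connection error usually means the port or base URL does not match the serve command.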
3. Configure Codex
Edit ~/.codex/config.toml and add:
[model_providers.optiq]
name = "OptIQ Local"
base_url = "http://localhost:8080/v1"
env_key = "OPTIQ_AUTH_TOKEN"
wire_api = "responses"
requires_openai_auth = false

[profiles.optiq]
model_provider = "optiq"
model = "mlx-community/Qwen3.5-9B-OptiQ-4bit"
Then export the auth token and launch Codex with this profile:
$ export OPTIQ_AUTH_TOKEN=sk-optiq-local
$ codex -p optiq
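The config edit in step 3 can also be scripted. The sketch below appends the same provider and profile tables to a Codex config file, and skips the write if a `[model_providers.optiq]` table is already there. The function name `add_optiq_profile` is hypothetical, and the check is a plain text match rather than a TOML parse, so treat it as a convenience rather than a robust installer.

```bash
# Hypothetical helper: append the OptIQ provider/profile to a Codex config
# file unless a [model_providers.optiq] table is already present.
# It does not detect a pre-existing [profiles.optiq] table on its own.
add_optiq_profile() {
  local cfg="${1:-$HOME/.codex/config.toml}"
  if grep -q '^\[model_providers\.optiq\]' "$cfg" 2>/dev/null; then
    echo "optiq provider already present in $cfg"
    return 0
  fi
  mkdir -p "$(dirname "$cfg")"
  cat >> "$cfg" <<'EOF'

[model_providers.optiq]
name = "OptIQ Local"
base_url = "http://localhost:8080/v1"
env_key = "OPTIQ_AUTH_TOKEN"
wire_api = "responses"
requires_openai_auth = false

[profiles.optiq]
model_provider = "optiq"
model = "mlx-community/Qwen3.5-9B-OptiQ-4bit"
EOF
  echo "added optiq provider to $cfg"
}
```

Calling `add_optiq_profile` with no argument targets ~/.codex/config.toml; passing a path lets you try it on a scratch file first.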
Notes
- wire_api = "responses" is required. Codex no longer accepts wire_api = "chat".
- Tool calls: Codex relies on function-calling for its edit / run / search loop. Models without robust function-calling training won't drive the full agent. Qwen3.5-9B-OptiQ and up handle this well; smaller models work for plain chat only.
- Built-in tools (web_search, file_search, computer_use): silently dropped by our shim. Codex's local-tool stack (apply_patch, shell, etc.) runs in the CLI itself and works fine.
- Streaming: works. Codex relies on response.output_text.delta and response.completed events; both are emitted in spec-compliant order.
- Verified: tested against Codex v0.130.0 on macOS (Apple Silicon).
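To see the streaming behaviour described in the notes for yourself, you can list the event types as they arrive. This is a rough sketch, assuming the server frames its stream as standard Server-Sent Events with `event:` lines, as the OpenAI Responses API does, and that a successful stream ends with response.completed. The helper names `sse_event_names` and `check_stream_order` are hypothetical.

```bash
# Hypothetical helper: read a Server-Sent Events stream on stdin and print
# each event type on its own line, in arrival order.
sse_event_names() {
  tr -d '\r' | sed -n 's/^event: *//p'
}

# Succeeds only if the stream contains at least one
# response.output_text.delta event and ends with response.completed.
check_stream_order() {
  local names
  names="$(sse_event_names)"
  printf '%s\n' "$names" | grep -qxF 'response.output_text.delta' || return 1
  [ "$(printf '%s\n' "$names" | tail -n 1)" = "response.completed" ]
}

# Example (requires a running server; the request fields are assumptions):
#   curl -sS -N http://localhost:8080/v1/responses \
#     -H "Content-Type: application/json" \
#     -d '{"model":"mlx-community/Qwen3.5-9B-OptiQ-4bit","input":"hi","stream":true}' \
#     | check_stream_order && echo "stream order ok"
```

Piping the same curl command into `sse_event_names` instead prints the full event sequence, which is handy when debugging a client that stalls mid-stream.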
Codex + MTP on Apple Silicon
Codex's edit-run-review loop is decode-heavy. The MTP speculation enabled by --mtp holds ~70% acceptance on the Qwen3.5/3.6 family for typical code-edit prompts. It pairs well with longer max_tokens settings, since the MTP head amortizes the per-token cost.
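As a back-of-the-envelope illustration of what that acceptance rate buys, the sketch below computes the expected number of tokens emitted per verification step. It assumes the ~70% figure is a per-token acceptance probability, that each drafted token is accepted independently, and that drafting stops at the first rejection; none of these are confirmed details of the OptIQ implementation, and the function name `mtp_tokens_per_step` is hypothetical.

```bash
# Expected tokens emitted per verification step under the assumptions above:
# 1 + p + p^2 + ... + p^depth, where p is the per-token acceptance
# probability and depth is the number of drafted tokens (--mtp-depth).
mtp_tokens_per_step() {
  awk -v p="$1" -v k="$2" 'BEGIN {
    total = 0; term = 1
    for (i = 0; i <= k; i++) { total += term; term *= p }
    printf "%.2f\n", total
  }'
}

mtp_tokens_per_step 0.7 2   # prints 2.19
```

This is an idealized ceiling that ignores the cost of drafting and verification, so it is not expected to match measured throughput; a measured decode speedup below this number is consistent with that overhead.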