mlx-optiq
Workflow · integrations

Integrations

The same optiq serve process exposes three API protocols on one port. Most coding agents and IDE plugins speak one of them, so you can run any of these tools against your local OptIQ-quantized model with a tiny config change.

Endpoints

optiq serve --model <path> --port 8080 exposes:

  • /v1/chat/completions  — OpenAI Chat Completions (default, used by most tools)
  • /v1/messages  — Anthropic Messages (set ANTHROPIC_BASE_URL)
  • /v1/responses  — OpenAI Responses (required by Codex; used by Cursor, Continue, Cline)

All three endpoints accept Bearer tokens that start with sk-optiq-. The suffix is anything you want; the prefix is checked. Authorization header may be omitted for local-dev curl calls.

Coverage matrix

ToolAPI protocolOptIQ endpointVerified version
Claude CodeAnthropic Messages/v1/messages2.1.143
CodexOpenAI Responses/v1/responses0.130.0
OpenCodeOpenAI Chat Completions/v1/chat/completions1.15.4
OpenClawAnthropic Messages/v1/messages2026.5.12
Hermes AgentOpenAI Chat Completions/v1/chat/completions0.14.0
CursorOpenAI Responses/v1/responsessame config as Codex

"Verified version" is the build of each agent we ran the full wire test against on macOS (Apple Silicon). Newer versions should keep working; if you hit a regression please file an issue.

Quickstart (any tool)

terminalbash
# 1. Start the server (any OptIQ-quantized model)
$ optiq serve --model mlx-community/Qwen3.5-9B-OptiQ-4bit --port 8080

# 2. Hand the tool these settings:
#    Base URL:  http://localhost:8080/v1
#    API key:   sk-optiq-local  (any string prefixed sk-optiq-)
#    Protocol:  see "Coverage matrix" above for which endpoint the tool uses

MTP-aware serving

Add --mtp to enable in-checkpoint MTP speculative decoding for ~1.4-1.8× decode tok/s on Qwen3.5 / 3.6 family. Works transparently for all three endpoints; tools don't need to know about it.

terminalbash
$ optiq serve --model mlx-community/Qwen3.5-9B-OptiQ-4bit \
    --mtp --mtp-depth 2 --port 8080
Why one server, three protocols Tools have fragmented across three competing API standards. Rather than make you pick one (or run three servers), optiq serve speaks all three from the same process. Internally everything funnels into the same generation loop — the Responses and Anthropic endpoints translate to/from OpenAI Chat Completions and reuse the existing handler. MTP and KV-quant apply to all three transparently.

What if my tool isn't listed?

If the tool can be pointed at a custom OpenAI-compatible base_url, it works with mlx-optiq out of the box. The matrix above just covers the tools we've tested end-to-end. Common candidates that work but aren't documented here yet:

  • aider — set --openai-api-base http://localhost:8080/v1
  • Open WebUI — add as an OpenAI-compatible connection in settings
  • LangChain / LlamaIndex / DSPy — set the OpenAI client's base_url
  • Anything using the openai Python SDK — instantiate with OpenAI(base_url=..., api_key="sk-optiq-...")