Workflow · integrations

Integrations

The same optiq serve process exposes three API protocols on one port. Most coding agents and IDE plugins speak one of them, so you can run any of these tools against your local OptiQ-quantized model with a tiny config change.

Endpoints

optiq serve --model <path> --port 8080 exposes:

/v1/chat/completions — OpenAI Chat Completions (default, used by most tools)
/v1/messages — Anthropic Messages (set ANTHROPIC_BASE_URL)
/v1/responses — OpenAI Responses (required by Codex; used by Cursor, Continue, Cline)

All three endpoints accept Bearer tokens that start with sk-optiq-. The suffix is anything you want; the prefix is checked. Authorization header may be omitted for local-dev curl calls.

Coverage matrix

Tool	API protocol	OptiQ endpoint	Verified version
Claude Code	Anthropic Messages	`/v1/messages`	2.1.143
Codex	OpenAI Responses	`/v1/responses`	0.130.0
OpenCode	OpenAI Chat Completions	`/v1/chat/completions`	1.15.4
OpenClaw	Anthropic Messages	`/v1/messages`	2026.5.12
Hermes Agent	OpenAI Chat Completions	`/v1/chat/completions`	0.14.0
Cursor	OpenAI Responses	`/v1/responses`	same config as Codex

"Verified version" is the build of each agent we ran the full wire test against on macOS (Apple Silicon). Newer versions should keep working; if you hit a regression please file an issue.

Quickstart (any tool)

terminalbash

# 1. Start the server (any OptiQ-quantized model)
$ optiq serve --model mlx-community/Qwen3.5-9B-OptiQ-4bit --port 8080

# 2. Hand the tool these settings:
#    Base URL:  http://localhost:8080/v1
#    API key:   sk-optiq-local  (any string prefixed sk-optiq-)
#    Protocol:  see "Coverage matrix" above for which endpoint the tool uses

MTP-aware serving

Add --mtp to enable in-checkpoint MTP speculative decoding for ~1.4-1.8× decode tok/s on Qwen3.5 / 3.6 family. Works transparently for all three endpoints; tools don't need to know about it.

terminalbash

$ optiq serve --model mlx-community/Qwen3.5-9B-OptiQ-4bit \
    --mtp --mtp-depth 2 --port 8080

Why one server, three protocols Tools have fragmented across three competing API standards. Rather than make you pick one (or run three servers), optiq serve speaks all three from the same process. Internally everything funnels into the same generation loop — the Responses and Anthropic endpoints translate to/from OpenAI Chat Completions and reuse the existing handler. MTP and KV-quant apply to all three transparently.

What if my tool isn't listed?

If the tool can be pointed at a custom OpenAI-compatible base_url, it works with mlx-optiq out of the box. The matrix above just covers the tools we've tested end-to-end. Common candidates that work but aren't documented here yet:

aider — set --openai-api-base http://localhost:8080/v1
Open WebUI — add as an OpenAI-compatible connection in settings
LangChain / LlamaIndex / DSPy — set the OpenAI client's base_url
Anything using the openai Python SDK — instantiate with OpenAI(base_url=..., api_key="sk-optiq-...")