mlx-optiq
Integration · Claude Code

Claude Code

Anthropic's Claude Code is a terminal-based coding agent that uses the Anthropic Messages API. Point it at optiq serve via ANTHROPIC_BASE_URL and it'll talk to your local OptIQ model instead of Anthropic's hosted Claude.

1. Install Claude Code

terminalbash
$ npm install -g @anthropic-ai/claude-code

2. Start optiq serve

terminalbash
$ optiq serve \
    --model mlx-community/Qwen3.5-9B-OptiQ-4bit \
    --mtp --mtp-depth 2 \
    --port 8080

The default --anthropic flag installs the /v1/messages endpoint. The --mtp flag enables ~1.4-1.8× decode speedup on Qwen3.5 / 3.6 family.

3. Point Claude Code at OptIQ

terminalbash
# In your shell rc, or per-session:
export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_API_KEY=sk-optiq-local
export ANTHROPIC_MODEL=mlx-community/Qwen3.5-9B-OptiQ-4bit

$ claude

That's it. Claude Code will route every message through your local optiq serve. To go back to hosted Claude, unset the three env vars.

Notes

  • Tool use: Qwen and Llama models that emit <tool_call>...</tool_call> blocks are translated into Anthropic tool_use content blocks transparently. Models without native tool-call training (most small ones) won't drive Claude Code's full agentic loop; pick 9B+ for serious coding work.
  • Streaming: works out of the box. Claude Code shows tokens as they arrive.
  • Auth token: any string starting with sk-optiq- works. Mirrors Unsloth's sk-unsloth-* convention.
  • ?beta=true query string: Claude Code appends ?beta=true to /v1/messages for the prompt-caching beta. Our endpoint strips the query string and routes the request normally, so the beta header is a no-op on the server side without breaking the wire.
  • Verified: tested against Claude Code 2.1.143 on macOS (Apple Silicon).
Why a local Claude Code matters Claude Code is one of the most polished coding agents shipping today. Running it against a local model gives you the same UX with no API spend, no network round-trips, and complete data sovereignty. The OptIQ + Qwen3.5-9B-MTP combination is fast enough for a fluid edit-and-run loop on M3 Pro and up.