mlx-optiq

Chat

A streaming playground against whichever model the API is serving, with local tools, retrieval over your files, and live HTML rendering.

Tools

Three built-in tools the model can call locally:

  • web_search: DuckDuckGo search plus a url mode to fetch a page as compact markdown. No API key.
  • python: runs Python in a three-tier sandbox (apple/container if installed, else macOS sandbox-exec, else subprocess with rlimit). AST checks block os.system, signal tampering, and direct network calls. Matplotlib charts the model saves render inline.
  • terminal: a bash one-liner in the same sandbox. Token-aware parsing rejects dangerous commands (sudo, curl, rm) wherever they appear in command position.
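
The AST checks mentioned for the python tool can be illustrated with a minimal sketch. The deny-lists and function names below (BLOCKED_CALLS, BLOCKED_IMPORTS, find_violations) are hypothetical and are not the Lab's actual rules; the sketch only shows the general idea of inspecting code statically before running it.

```python
import ast

# Hypothetical deny-lists for illustration; the Lab's real rules are not documented here.
BLOCKED_CALLS = {"os.system", "signal.signal"}
BLOCKED_IMPORTS = {"socket", "urllib", "http", "requests"}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_violations(source):
    """Parse source and list blocked calls and imports without executing it."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in BLOCKED_CALLS:
                violations.append(f"call to {name}")
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_IMPORTS:
                    violations.append(f"import of {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in BLOCKED_IMPORTS:
                violations.append(f"import from {node.module}")
    return violations
```

A static pass like this is easy to sidestep (for example with `from os import system`), which is presumably why the tool also runs inside a sandbox; the check is a first filter, not the isolation boundary.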

Local models often emit malformed tool calls. Before dispatching, the Lab heals six common shapes (Hermes <tool_call> tags, fenced JSON, bare objects, trailing commas, fancy quotes, function-call style) and flags each repaired call with a healed chip.

Self-healing tool calls
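
As a rough illustration of what healing can look like, the sketch below repairs four of the six shapes named above: Hermes tags, fenced JSON, trailing commas, and fancy quotes. The function name heal_tool_call and its return shape are assumptions made for this example, not the Lab's API, and the regex-based repairs are deliberately naive (they could alter commas or quotes inside string values).

```python
import json
import re

FENCE_MARK = "`" * 3  # built at runtime so this example can sit inside a fence
FANCY_QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})
HERMES_TAG = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
FENCED = re.compile(FENCE_MARK + r"(?:json)?\s*(.*?)\s*" + FENCE_MARK, re.DOTALL)
TRAILING_COMMA = re.compile(r",\s*([}\]])")


def _parse(text):
    """Return the parsed JSON object, or None if text is not a JSON object."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def heal_tool_call(raw):
    """Return (call, healed): healed is True only if a repair was needed."""
    call = _parse(raw)
    if call is not None:
        return call, False
    text = raw.translate(FANCY_QUOTES)           # curly quotes -> straight quotes
    for pattern in (HERMES_TAG, FENCED):         # unwrap tags, then code fences
        match = pattern.search(text)
        if match:
            text = match.group(1)
    text = TRAILING_COMMA.sub(r"\1", text)       # drop commas before } or ]
    call = _parse(text)
    return (call, True) if call is not None else (None, False)
```

The boolean in the return value is what a UI could use to decide whether to show a healed chip.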

The tool loop runs server-side and is capped at 25 turns. Consecutive identical successful calls de-duplicate to a nudge instead of re-running the sandbox. On top of that, a per-call retry budget stops the model from re-running the same failing call forever: after three attempts with the same arguments, the model is told to change course (different tool, different arguments, or answer with what it has), and the tool card shows a retry limit badge. If the model still hasn't finished after 25 turns, the orchestrator forces one final tools-disabled re-prompt so you get a text answer, not a hard error.
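
The loop described above can be sketched roughly as follows. The callables model_step and run_tool, the message dictionaries, and the nudge wording are all hypothetical stand-ins; only the 25-turn cap and the three-attempt budget come from the text.

```python
import json

MAX_TURNS = 25     # turn cap stated in the text
RETRY_BUDGET = 3   # attempts allowed for one failing call with the same arguments


def run_tool_loop(model_step, run_tool, messages):
    """Drive a model/tool loop with de-duplication and a per-call retry budget.

    model_step(messages, tools_enabled) returns {"text": ...} or
    {"tool": name, "args": {...}}; run_tool(name, args) returns (ok, output).
    Both are assumed interfaces for this sketch.
    """
    failures = {}         # call key -> failed attempts so far
    last_success = None   # key of the most recent successful call
    for _ in range(MAX_TURNS):
        reply = model_step(messages, tools_enabled=True)
        if "text" in reply:
            return reply["text"]
        key = json.dumps([reply["tool"], reply["args"]], sort_keys=True)
        if key == last_success:
            # Identical to the previous successful call: nudge, do not re-run.
            messages.append({"role": "tool", "content": "Same call as before; use the earlier result."})
            continue
        if failures.get(key, 0) >= RETRY_BUDGET:
            # Budget spent: tell the model to change course instead of retrying.
            messages.append({"role": "tool", "content": "Retry limit reached; change tool or arguments, or answer now."})
            continue
        ok, output = run_tool(reply["tool"], reply["args"])
        if ok:
            last_success = key
        else:
            failures[key] = failures.get(key, 0) + 1
            last_success = None
        messages.append({"role": "tool", "content": output})
    # Turn cap reached: one final re-prompt with tools disabled.
    return model_step(messages, tools_enabled=False)["text"]
```

Keying on the tool name plus its arguments means a changed argument counts as a fresh call with its own budget, which matches the "different arguments" escape route the text describes.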

Chat with files (RAG with citations)

Attaching a non-image file indexes it instead of dumping the whole document into the prompt. A dependency-free BM25 retriever pulls the chunks most relevant to each question, prepends them with [n] citation markers, and the Lab renders a sources panel under the answer. Only the retrieved chunks enter the context, so a long PDF costs a few hundred tokens per turn instead of blowing the window.

Chat with files: cited answer with a sources panel
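
For readers unfamiliar with BM25, here is a minimal dependency-free sketch of the retrieval step. It uses the common defaults k1 = 1.5 and b = 0.75 and a simple lowercase tokenizer; the Lab's actual parameters, chunking, and function names are not documented here, so bm25_top_k and build_context are illustrative only.

```python
import math
import re
from collections import Counter

K1, B = 1.5, 0.75  # common BM25 defaults, not necessarily the Lab's values


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_top_k(chunks, query, k=3):
    """Return (chunk_index, score) pairs for the k best-matching chunks."""
    if not chunks:
        return []
    docs = [Counter(tokenize(c)) for c in chunks]
    lengths = [sum(d.values()) for d in docs]
    avg_len = sum(lengths) / len(docs)
    scores = [0.0] * len(docs)
    for term in set(tokenize(query)):
        df = sum(1 for d in docs if term in d)   # chunks containing the term
        if df == 0:
            continue
        idf = math.log(1 + (len(docs) - df + 0.5) / (df + 0.5))
        for i, d in enumerate(docs):
            tf = d[term]
            norm = tf + K1 * (1 - B + B * lengths[i] / avg_len)
            scores[i] += idf * tf * (K1 + 1) / norm
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k] if scores[i] > 0]


def build_context(chunks, query, k=3):
    """Prefix each retrieved chunk with a [n] citation marker."""
    hits = bm25_top_k(chunks, query, k)
    return "\n".join(f"[{n}] {chunks[i]}" for n, (i, _) in enumerate(hits, start=1))
```

Only the chunks returned by build_context would be prepended to the prompt, which is what keeps the per-turn token cost small regardless of document length.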

Canvas

Assistant messages that contain a fenced html block render live in a sandboxed allow-scripts iframe next to the answer, with a view-source toggle. Useful for charts, tables, and small interactive widgets the model writes. The iframe has no same-origin access, so the page stays isolated.

Canvas: model HTML rendered in a sandboxed iframe
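
The isolation described above relies on the standard iframe sandbox attribute: with allow-scripts but without allow-same-origin, the framed document runs scripts under an opaque origin. The sketch below shows one way to turn a fenced html block into such an iframe. The function name canvas_iframe and the use of srcdoc are assumptions for illustration, not the Lab's implementation.

```python
import html
import re

FENCE = "`" * 3  # built at runtime so this example can sit inside a fence
HTML_BLOCK = re.compile(FENCE + r"html\s*\n(.*?)" + FENCE, re.DOTALL)


def canvas_iframe(message):
    """Return an <iframe> tag for the first fenced html block, or None."""
    match = HTML_BLOCK.search(message)
    if match is None:
        return None
    # srcdoc carries the page inline; escaping keeps it inside the attribute.
    srcdoc = html.escape(match.group(1), quote=True)
    # allow-scripts without allow-same-origin: scripts run, but the frame
    # cannot reach the parent page's origin.
    return f'<iframe sandbox="allow-scripts" srcdoc="{srcdoc}"></iframe>'
```

Adding allow-same-origin alongside allow-scripts would let framed scripts remove the sandbox themselves, so leaving it out is the important part of this design.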

Input + history

Per-message controls for temperature, max tokens, and enable_thinking. Images attach as vision input on the multimodal quants. Multi-step tool runs collapse into a single "N tool calls" accordion. A Stop button cancels the run and SIGKILLs any sandbox subprocess. Saved chats persist to ~/.optiq/lab/chats/.
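
The Stop behaviour can be sketched on POSIX systems as follows. Starting the child in its own session and killing the whole process group is an assumption made here so that grandchildren also die; the text only says the sandbox subprocess is SIGKILLed. The names start_sandboxed and stop are hypothetical.

```python
import os
import signal
import subprocess
import sys


def start_sandboxed(argv):
    """Start a child in its own session so its process group can be killed as a unit."""
    return subprocess.Popen(argv, start_new_session=True)


def stop(proc):
    """SIGKILL the child's process group, reap it, and return the exit status."""
    try:
        os.killpg(proc.pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # the process had already exited and been reaped
    return proc.wait()


if __name__ == "__main__":
    child = start_sandboxed([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop(child))
```

SIGKILL cannot be caught or ignored by the child, which fits a Stop button that has to work even when the model's code is stuck.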