Model Arena

Compare two models on the same prompt, side by side, with tokens/sec for each.

Model A runs on the Lab's main API server; model B runs on a second server, started on demand, that listens on the main server's port plus one. A typical comparison is an OptiQ mixed-precision quant against a uniform 4-bit one, or a quant against its bf16 base, with both answering live in the same window so you can see the quality and speed difference at once.
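The port arrangement can be sketched as follows. This is only an illustration of the "main port plus one" rule: the function name `arena_endpoints`, the default host, and the default port `8080` are assumptions made for the example, not values taken from the Lab.

```python
def arena_endpoints(host: str = "127.0.0.1", port: int = 8080) -> dict[str, str]:
    """Return hypothetical base URLs for the two Arena servers.

    Model A uses the main server's port; model B uses the next port up.
    The default host and port are placeholders, not the Lab's real defaults.
    """
    # Port + 1 must still be a valid TCP port, so the main port stops at 65534.
    if not 1 <= port <= 65534:
        raise ValueError(f"main port {port} leaves no room for port + 1")
    return {
        "A": f"http://{host}:{port}",      # main API server
        "B": f"http://{host}:{port + 1}",  # second server, started on demand
    }
```

If something else is already listening on the port above the main server, the second server presumably cannot start, so that is worth checking when model B fails to load.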

Model Arena: two models compared side by side

Pick a model in each column (from your published OptiQ quants or local converts), load them, type one prompt, and hit Compare. Both panes stream their answer and report tokens/sec and the token count. Reasoning models are asked to answer directly (thinking disabled) so the panes show a real answer rather than burning the budget on hidden chain-of-thought.
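For intuition about the figures each pane reports, here is a minimal sketch of how a tokens/sec number can be derived from a streamed answer. It assumes the stream is any iterable of token strings; `measure_stream` is a hypothetical helper, and the Lab may time things differently (for example, by excluding prompt processing).

```python
import time
from typing import Callable, Iterable


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> tuple[str, int, float]:
    """Consume a token stream and return (text, token_count, tokens_per_sec).

    Timing covers the whole stream, from just before the first token is
    requested until the stream is exhausted.
    """
    start = clock()
    pieces = []
    for tok in tokens:  # each item is one decoded token as it arrives
        pieces.append(tok)
    elapsed = clock() - start
    count = len(pieces)
    # Guard against a zero-length interval on an empty or instant stream.
    rate = count / elapsed if elapsed > 0 else 0.0
    return "".join(pieces), count, rate
```

Running the same helper over both streams gives directly comparable numbers, since both models answer the same prompt.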

Resident at once: Both models stay loaded while the Arena is open, so it's best with small or fast models. On a tight machine, compare two quants of the same small base.
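To judge whether a pair will fit, a rough back-of-the-envelope estimate of weight memory can help. The sketch below only multiplies parameter count by bits per weight; it ignores quantization scales, the KV cache, and runtime overhead, so treat the result as a lower bound. The helper names and the example sizes are illustrative assumptions.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a model of the given size."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def pair_gib(params_billion: float, bits_a: float, bits_b: float) -> float:
    """Combined weight footprint when both variants are resident at once."""
    return weight_gib(params_billion, bits_a) + weight_gib(params_billion, bits_b)


# Example: a hypothetical 8B base, 4-bit quant next to its bf16 (16-bit) base.
print(f"{pair_gib(8, 4, 16):.1f} GiB of weights, before cache and overhead")
```

By this estimate, comparing two quants of the same small base costs far less memory than pairing a quant with its bf16 base, because the bf16 copy alone is four times the size of a 4-bit one.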