Tokmux API Reference
Unified OpenAI-compatible gateway. One Virtual Key, one schema, every supported provider. Authenticate, send an OpenAI-shaped request to /api/v1/chat/completions, get an OpenAI-shaped response regardless of which provider serves the completion.
Quick start
Three steps to your first call. Everything below assumes the default base URL https://api.tokmux.com/api/v1.
- Create a Virtual Key. Sign in to the dashboard, open the Keys page, and create a key. Keys are prefixed sk- and are shown once at creation, so store the full value.
- Make a call. POST to /api/v1/chat/completions with an OpenAI-shaped body. The model field uses {provider}/{model_name} form.
curl https://api.tokmux.com/api/v1/chat/completions \
-H "Authorization: Bearer sk-..." \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-sonnet-4-6",
"messages": [
{ "role": "system", "content": "You are a concise technical assistant." },
{ "role": "user", "content": "Explain HTTP status 429 in one sentence." }
],
"max_tokens": 128
}'
- Read the response. The body is an OpenAI chat.completion object regardless of upstream provider. The id is a tokmux-issued ULID; the upstream id (if any) lives in provider_request_id.
200 OK
{
"id": "tokmux-req-01krdp1pq24renxak657ax566a",
"object": "chat.completion",
"created": 1778575794,
"model": "anthropic/claude-sonnet-4-6",
"provider_request_id": "msg_014wLXrkm3wAgGijVj4fdQXe",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "HTTP 429 means the client has sent too many requests in a given time window and should retry after the period specified in the Retry-After header."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 24,
"completion_tokens": 38,
"total_tokens": 62,
"cache_read_tokens": 0,
"cache_creation_tokens": 0
}
}
That's the full loop. The rest of this page covers streaming, extended thinking, billing, and error codes.
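For readers calling the gateway from Python rather than curl, the quick-start request might be assembled as below. This is a minimal sketch using only the standard library; `build_chat_request` and `send` are illustrative helper names, not part of any tokmux SDK, and the key is assumed to be supplied by the caller.

```python
import json
import urllib.request

BASE_URL = "https://api.tokmux.com/api/v1"  # default base URL from the quick start


def build_chat_request(api_key, model, messages, **params):
    """Build (but do not send) a POST request for /chat/completions.

    Extra keyword arguments such as max_tokens are copied into the JSON body.
    """
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and return the decoded JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example usage (performs a real network call, so it is left commented out):
# req = build_chat_request(
#     "sk-...",
#     "anthropic/claude-sonnet-4-6",
#     [{"role": "user", "content": "Explain HTTP status 429 in one sentence."}],
#     max_tokens=128,
# )
# print(send(req)["choices"][0]["message"]["content"])
```

Keeping request construction separate from sending makes the request easy to inspect or log before it leaves the process.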
Authentication
Every request authenticates with a Virtual Key (prefix sk-). Pass it as a Bearer token:
Authorization: Bearer sk-your-virtual-key
Create Virtual Keys from the dashboard under your project's Keys page. Each key is scoped to an organization and project, with an optional model whitelist and per-key spending cap (USD).
A missing or revoked key returns 401 with "type": "authentication_error". A model not whitelisted for the key returns 403; a model not in the catalog returns 404. See Error codes.
Models
OpenAI-compatible model listing. Compatible with the OpenAI SDK, OpenWebUI, Cursor, Continue.dev, and any client that calls /v1/models on init.
List models
GET /api/v1/models
Returns every model the Virtual Key has access to. The list is filtered by the organization's catalog and the key's whitelist. Keys with no whitelist see the full catalog.
Each model's id is the value you pass as model in /chat/completions.
curl https://api.tokmux.com/api/v1/models \
-H "Authorization: Bearer sk-..."
{
"object": "list",
"data": [
{
"id": "anthropic/claude-sonnet-4-6",
"object": "model",
"created": 1776848230,
"owned_by": "anthropic"
},
{
"id": "fireworks-ai/deepseek-v4-pro",
"object": "model",
"created": 1777367000,
"owned_by": "fireworks-ai"
},
{
"id": "openai/gpt-5.4",
"object": "model",
"created": 1777026818,
"owned_by": "openai"
}
]
}
Retrieve a model
GET /api/v1/models/{model}
Returns a single model by ID. The {model} path parameter uses the same {provider}/{model_name} format. Returns 404 if the model is not in the catalog or not whitelisted for the key.
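The two model endpoints can be consumed from client code roughly as follows. This is a sketch under stated assumptions: `model_ids` and `model_url` are hypothetical helper names, and the slash inside the model id is left unescaped in the path because that is how the curl example in this section writes it.

```python
from urllib.parse import quote

BASE_URL = "https://api.tokmux.com/api/v1"  # default base URL from the quick start


def model_ids(listing):
    """Return the id of every entry in a GET /models response body."""
    return [entry["id"] for entry in listing.get("data", [])]


def model_url(model_id):
    """Build the URL for GET /models/{model}.

    The provider/model slash is kept as-is; other unsafe characters are escaped.
    """
    return f"{BASE_URL}/models/{quote(model_id, safe='/')}"
```

Each id returned by `model_ids` is the value to pass as model in a chat completion request.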
curl https://api.tokmux.com/api/v1/models/anthropic/claude-sonnet-4-6 \
-H "Authorization: Bearer sk-..."
{
"id": "anthropic/claude-sonnet-4-6",
"object": "model",
"created": 1776848230,
"owned_by": "anthropic"
}
Chat Completions
POST /api/v1/chat/completions
The unified chat endpoint. Send an OpenAI-shaped request and get an OpenAI-shaped response, regardless of which provider serves the completion. Tokmux translates request and response formats for providers whose native API is not OpenAI-shaped (Anthropic, Google) and passes the rest through unchanged.
Model naming
The model field takes {provider_slug}/{model_name}. Use the id from GET /api/v1/models.
| Provider | model value |
|---|---|
| Anthropic | anthropic/claude-sonnet-4-6 |
| Anthropic | anthropic/claude-opus-4-6 |
| OpenAI | openai/gpt-5.4 |
| Google AI | google/gemini-3.1-flash-image-preview |
| Fireworks AI | fireworks-ai/deepseek-v4-pro |
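A client can check the {provider_slug}/{model_name} shape before sending a request. The sketch below splits on the first slash only; `split_model_id` and `is_valid_model_id` are illustrative names, and the check is purely syntactic, so it says nothing about whether the model exists in the catalog.

```python
def split_model_id(model):
    """Split 'provider/model_name' into (provider, model_name).

    Raises ValueError when either half is missing.
    """
    provider, separator, name = model.partition("/")
    if not separator or not provider or not name:
        raise ValueError(f"expected provider/model_name, got {model!r}")
    return provider, name


def is_valid_model_id(model):
    """Return True when the string has a non-empty provider and model name."""
    try:
        split_model_id(model)
    except ValueError:
        return False
    return True
```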
Parameters
Body is JSON. The shape mirrors OpenAI's Chat Completions API. Provider-specific fields are forwarded verbatim — anything not listed below is passed through to the upstream as-is.
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | required | {provider}/{model_name} — e.g. anthropic/claude-sonnet-4-6. Use the id from GET /api/v1/models. |
| messages | array<object> | required | OpenAI chat messages. Each item has role (system, user, assistant, tool) and content (string or array). |
| stream | boolean | optional | When true, response is text/event-stream. Defaults to false. |
| max_tokens | integer | optional | Max output tokens — Anthropic / Fireworks naming. |
| max_completion_tokens | integer | optional | Max output tokens — OpenAI chat completions naming. |
| temperature | number | optional | Sampling temperature. |
| top_p | number | optional | Nucleus sampling cutoff. |
| stop | string \| array<string> | optional | Stop sequences. Forwarded to the provider unchanged. |
| tools | array<object> | optional | OpenAI tool definitions. Translated for Anthropic / Google. |
| tool_choice | string \| object | optional | OpenAI tool-choice control. |
| thinking | object | optional | Anthropic extended thinking. { "type": "enabled", "budget_tokens": N } enables reasoning deltas. Ignored for non-Anthropic providers. |
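The table lists two names for the output-token limit. One way a client might pick between them is sketched below. The mapping is an assumption drawn only from the naming notes in the table (OpenAI uses max_completion_tokens, the others use max_tokens); this reference does not state whether either field is also accepted by the other providers. `token_limit_field` and `chat_body` are hypothetical helpers.

```python
def token_limit_field(model):
    """Pick the output-token field name from the provider prefix.

    Assumption: only the 'openai' provider uses max_completion_tokens.
    """
    provider = model.partition("/")[0]
    return "max_completion_tokens" if provider == "openai" else "max_tokens"


def chat_body(model, messages, max_output_tokens=None, **options):
    """Assemble a request body, leaving out any option whose value is None."""
    body = {"model": model, "messages": messages}
    if max_output_tokens is not None:
        body[token_limit_field(model)] = max_output_tokens
    body.update({key: value for key, value in options.items() if value is not None})
    return body
```

Dropping unset options keeps the body minimal, so provider defaults apply to anything the caller did not choose explicitly.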
curl https://api.tokmux.com/api/v1/chat/completions \
-H "Authorization: Bearer sk-..." \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-sonnet-4-6",
"messages": [
{ "role": "system", "content": "You are a concise technical assistant." },
{ "role": "user", "content": "Explain HTTP status 429 in one sentence." }
],
"max_tokens": 128
}'
{
"id": "tokmux-req-01krdp1pq24renxak657ax566a",
"object": "chat.completion",
"created": 1778575794,
"model": "anthropic/claude-sonnet-4-6",
"provider_request_id": "msg_014wLXrkm3wAgGijVj4fdQXe",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "HTTP 429 means the client has sent too many requests in a given time window and should retry after the period specified in the Retry-After header."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 24,
"completion_tokens": 38,
"total_tokens": 62,
"cache_read_tokens": 0,
"cache_creation_tokens": 0
}
}
curl https://api.tokmux.com/api/v1/chat/completions \
-H "Authorization: Bearer sk-..." \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-5.4",
"messages": [
{ "role": "user", "content": "Convert 72°F to Celsius." }
],
"max_completion_tokens": 64
}'
Provider translation
Anthropic models go through full schema translation in both directions; OpenAI and Fireworks AI pass through unchanged. The response always lands as OpenAI chat.completion — anything provider-specific is either folded into a standard field or namespaced (e.g. Anthropic cache counters in usage, Fireworks reasoning in message.reasoning_content).
| Provider | Request | Response |
|---|---|---|
| Anthropic | OpenAI → Anthropic Messages | Anthropic → OpenAI `chat.completion` (incl. SSE) |
| OpenAI | pass-through | pass-through |
| Fireworks AI | pass-through | pass-through |
The response id is always a tokmux-issued ULID prefixed tokmux-req-. The upstream id (when one exists) is preserved under provider_request_id for correlation against provider dashboards.
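When logging calls for later correlation, both identifiers can be pulled from the response body. The sketch below is illustrative only; `correlation_ids` is a hypothetical helper, and it treats provider_request_id as optional because the upstream id is present only when one exists.

```python
TOKMUX_ID_PREFIX = "tokmux-req-"


def correlation_ids(response):
    """Return the tokmux id and the upstream id (None when absent)."""
    return {
        "tokmux_id": response["id"],
        "provider_request_id": response.get("provider_request_id"),
    }


def is_tokmux_id(value):
    """Check that an id carries the tokmux-req- prefix."""
    return value.startswith(TOKMUX_ID_PREFIX)
```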
Streaming
Set stream: true for text/event-stream. Each chunk is one data: line terminated by a blank line. The first chunk carries delta.role, subsequent chunks carry delta.content, the final chunk sets finish_reason and includes usage. The stream terminates with a literal data: [DONE] sentinel — there is no further payload after it.
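The chunk sequence described above can be consumed with a small parser. This is a minimal sketch, assuming the stream has already been decoded into text lines; `iter_chunks` and `collect` are illustrative names. It stops at the [DONE] sentinel and accumulates delta.content, the finish reason, and the usage object.

```python
import json


def iter_chunks(lines):
    """Yield one parsed JSON chunk per 'data:' line, stopping at [DONE].

    `lines` is any iterable of str; decode bytes before passing them in.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect(chunks):
    """Accumulate streamed text; return (text, finish_reason, usage)."""
    parts, finish_reason, usage = [], None, None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), finish_reason, usage
```

Because `iter_chunks` is a generator, a UI can render each delta as it arrives instead of waiting for `collect` to finish.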
curl https://api.tokmux.com/api/v1/chat/completions \
-H "Authorization: Bearer sk-..." \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-sonnet-4-6",
"messages": [
{ "role": "user", "content": "Summarize the CAP theorem." }
],
"max_tokens": 256,
"stream": true
}'
data: {"id":"tokmux-req-01krdp1rr8y1vzvbzda4pyhjee","object":"chat.completion.chunk","created":1778575796,"model":"anthropic/claude-sonnet-4-6","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"tokmux-req-01krdp1rr8y1vzvbzda4pyhjee","object":"chat.completion.chunk","created":1778575796,"model":"anthropic/claude-sonnet-4-6","choices":[{"index":0,"delta":{"content":"The CAP theorem"},"finish_reason":null}]}
data: {"id":"tokmux-req-01krdp1rr8y1vzvbzda4pyhjee","object":"chat.completion.chunk","created":1778575796,"model":"anthropic/claude-sonnet-4-6","choices":[{"index":0,"delta":{"content":" states that a distributed"},"finish_reason":null}]}
data: {"id":"tokmux-req-01krdp1rr8y1vzvbzda4pyhjee","object":"chat.completion.chunk","created":1778575796,"model":"anthropic/claude-sonnet-4-6","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":14,"completion_tokens":12,"total_tokens":26}}
data: [DONE]
Extended thinking
Anthropic reasoning models accept a thinking object on the request:
"thinking": { "type": "enabled", "budget_tokens": 4096 }
On non-streaming responses, the reasoning trace lands on choices[0].message.thinking_content alongside the regular content. On streaming responses, delta.thinking_content chunks arrive before delta.content chunks, so keep them in a separate buffer if you render reasoning and output in different surfaces. The field is ignored for non-Anthropic providers.
curl https://api.tokmux.com/api/v1/chat/completions \
-H "Authorization: Bearer sk-..." \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-sonnet-4-6",
"messages": [
{ "role": "user", "content": "What is the time complexity of Dijkstra with a Fibonacci heap?" }
],
"stream": true,
"thinking": { "type": "enabled", "budget_tokens": 4096 }
}'
Billing & whitelisting
Successful responses include an X-Usage-Event-Id header, the ULID of the usage event tokmux recorded for this call. Use it to reconcile against your usage reports. Error responses do not carry this header.
Settlement is FIFO across the Virtual Key's funded balance. Whitelisting is enforced at invoke time: a model in the catalog returns 403 model_not_whitelisted if the key isn't whitelisted for it. A model not in the catalog at all returns 404. Treat the catalog as discovery, not entitlement.
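For reconciliation, the usage event id can be read from the response headers. A minimal sketch follows; `usage_event_id` is an illustrative name. It accepts any mapping with an items() method, such as a plain dict or the headers object of a standard-library HTTP response, and matches the header name case-insensitively.

```python
USAGE_HEADER = "x-usage-event-id"


def usage_event_id(headers):
    """Return the X-Usage-Event-Id value, or None when the header is absent.

    Error responses do not carry the header, so None is an expected result.
    """
    for name, value in headers.items():
        if name.lower() == USAGE_HEADER:
            return value
    return None
```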
Error codes
Errors follow the OpenAI error object format:
{
"error": {
"message": "Human-readable description.",
"type": "error_type",
"param": null,
"code": "machine_readable_code"
}
}
| Status | Type | Codes | Description |
|---|---|---|---|
| 400 | invalid_request_error | missing_model, invalid_model_format, invalid_json, unsupported_provider | Malformed request. Codes distinguish the cause: missing `model` field, wrong `provider/model` format, unparseable JSON, or provider not in catalog. |
| 401 | authentication_error | invalid_api_key | Missing, malformed, or revoked Virtual Key. |
| 402 | billing_error | insufficient_credits | Insufficient balance in the organization's account. |
| 403 | permission_error | model_not_whitelisted | Model exists in the catalog but is not whitelisted for this Virtual Key. |
| 404 | invalid_request_error | model_not_found, route_not_found | Model not in the catalog, or route does not exist. |
| 429 | rate_limit_error | rate_limit_exceeded, spending_limit_exceeded | Request-rate or per-key spending cap exceeded. Retry with backoff. |
| 500 | server_error | internal_error | Unexpected server error. Retry; if persistent, contact support. |
| 502 | server_error | upstream_error | Upstream provider returned an error or is unreachable. |
Successful responses (2xx) include an X-Usage-Event-Id header, the ULID of the recorded usage event. Error responses do not carry this header.
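A client-side retry policy based on the table might be sketched as follows. Several choices here are assumptions rather than documented behavior: only 429 and 500 are retried because those are the rows that mention retrying, 502 is left out because the table does not say whether a retry helps, and spending_limit_exceeded is treated as non-retryable on the guess that a spending cap does not clear by waiting. The helper names and backoff constants are illustrative.

```python
RETRYABLE_STATUSES = {429, 500}  # rows in the table that mention retrying


def parse_error(body):
    """Extract (type, code, message) from an OpenAI-style error body."""
    error = body.get("error", {})
    return error.get("type"), error.get("code"), error.get("message")


def should_retry(status, code=None):
    """Decide whether a failed call is worth repeating."""
    if code == "spending_limit_exceeded":
        return False  # assumption: waiting does not lift a spending cap
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Exponential backoff in seconds for a zero-based attempt number."""
    return min(cap, base * (2 ** attempt))
```

In production code, adding random jitter to the delay helps avoid many clients retrying in lockstep.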