Developer API v1.0.0

Tokmux API Reference

Unified OpenAI-compatible gateway. One Virtual Key, one schema, every supported provider. Authenticate, send an OpenAI-shaped request to /api/v1/chat/completions, get an OpenAI-shaped response regardless of which provider serves the completion.

Quick start

Three steps to your first call. Everything below assumes the default base URL https://api.tokmux.com/api/v1.

Create a Virtual Key. Sign in to the dashboard, open the Keys page, and create a key. Keys are prefixed sk- and are shown once at creation — store the full value.

Make a call. POST to /api/v1/chat/completions with an OpenAI-shaped body. The model field uses {provider}/{model_name} form.

curl

curl https://api.tokmux.com/api/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      { "role": "system", "content": "You are a concise technical assistant." },
      { "role": "user", "content": "Explain HTTP status 429 in one sentence." }
    ],
    "max_tokens": 128
  }'

Read the response. The body is an OpenAI chat.completion object regardless of upstream provider. The id is a tokmux-issued ULID prefixed tokmux-req-; the upstream id (if any) lives in provider_request_id.

200 OK

{
  "id": "tokmux-req-01krdp1pq24renxak657ax566a",
  "object": "chat.completion",
  "created": 1778575794,
  "model": "anthropic/claude-sonnet-4-6",
  "provider_request_id": "msg_014wLXrkm3wAgGijVj4fdQXe",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "HTTP 429 means the client has sent too many requests in a given time window and should retry after the period specified in the Retry-After header."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 38,
    "total_tokens": 62,
    "cache_read_tokens": 0,
    "cache_creation_tokens": 0
  }
}

That's the full loop. The rest of this page covers streaming, extended thinking, billing, and error codes.
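The same request can be assembled from Python's standard library. This is a minimal sketch rather than an official client: `build_chat_request` is a hypothetical helper name, and the block only builds the request so it can be inspected before anything is sent.

```python
import json
import urllib.request

BASE_URL = "https://api.tokmux.com/api/v1"  # default base URL from this page


def build_chat_request(api_key, model, messages, **params):
    """Build (but do not send) a POST request for /chat/completions."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "sk-...",  # placeholder: substitute your Virtual Key
    "anthropic/claude-sonnet-4-6",
    [{"role": "user", "content": "Explain HTTP status 429 in one sentence."}],
    max_tokens=128,
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Keeping construction separate from sending makes the body and headers easy to log or unit-test.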

Authentication

Every request authenticates with a Virtual Key (prefix sk-). Pass it as a Bearer token:

http

Authorization: Bearer sk-your-virtual-key

Create Virtual Keys from the dashboard under your project's Keys page. Each key is scoped to an organization and project, with an optional model whitelist and per-key spending cap (USD).

A missing or revoked key returns 401 with "type": "authentication_error". A model not whitelisted for the key returns 403; a model not in the catalog returns 404. See Error codes.

Models

OpenAI-compatible model listing. Compatible with the OpenAI SDK, OpenWebUI, Cursor, Continue.dev, and any client that calls /v1/models on init.

List models

GET /api/v1/models

Returns every model the Virtual Key has access to. The list is filtered by the organization's catalog and the key's whitelist. Keys with no whitelist see the full catalog.

Each model's id is the value you pass as model in /chat/completions.

curl

curl https://api.tokmux.com/api/v1/models \
  -H "Authorization: Bearer sk-..."

200 OK

{
  "object": "list",
  "data": [
    {
      "id": "anthropic/claude-sonnet-4-6",
      "object": "model",
      "created": 1776848230,
      "owned_by": "anthropic"
    },
    {
      "id": "fireworks-ai/deepseek-v4-pro",
      "object": "model",
      "created": 1777367000,
      "owned_by": "fireworks-ai"
    },
    {
      "id": "openai/gpt-5.4",
      "object": "model",
      "created": 1777026818,
      "owned_by": "openai"
    }
  ]
}
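A client that presents a model picker usually wants the ids grouped by provider. The sketch below works on an abridged copy of the sample listing above; `model_ids_by_owner` is a hypothetical helper, not part of any SDK.

```python
from collections import defaultdict

# Abridged copy of the sample listing above (the "created" field is omitted).
listing = {
    "object": "list",
    "data": [
        {"id": "anthropic/claude-sonnet-4-6", "object": "model", "owned_by": "anthropic"},
        {"id": "fireworks-ai/deepseek-v4-pro", "object": "model", "owned_by": "fireworks-ai"},
        {"id": "openai/gpt-5.4", "object": "model", "owned_by": "openai"},
    ],
}


def model_ids_by_owner(listing):
    """Group model ids by their owned_by value, preserving response order."""
    grouped = defaultdict(list)
    for entry in listing["data"]:
        grouped[entry["owned_by"]].append(entry["id"])
    return dict(grouped)


available = model_ids_by_owner(listing)
```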

Retrieve a model

GET /api/v1/models/{model}

Returns a single model by ID. The {model} path parameter uses the same {provider}/{model_name} format. Returns 404 if the model is not in the catalog or not whitelisted for the key.

curl

curl https://api.tokmux.com/api/v1/models/anthropic/claude-sonnet-4-6 \
  -H "Authorization: Bearer sk-..."

200 OK

{
  "id": "anthropic/claude-sonnet-4-6",
  "object": "model",
  "created": 1776848230,
  "owned_by": "anthropic"
}
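Because the model id itself contains a slash, building the retrieve URL needs a little care. The sketch below assumes the slash is sent literally, as in the curl example above, and percent-encodes anything else that is not URL-safe. `model_url` is a hypothetical helper.

```python
from urllib.parse import quote

BASE_URL = "https://api.tokmux.com/api/v1"  # default base URL from this page


def model_url(model_id):
    """Return the retrieve-model URL for a {provider}/{model_name} id."""
    # safe="/" keeps the provider/model separator unescaped.
    # Assumption: this matches what the gateway expects, per the curl example.
    return f"{BASE_URL}/models/{quote(model_id, safe='/')}"


url = model_url("anthropic/claude-sonnet-4-6")
```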

Chat Completions

POST /api/v1/chat/completions

The unified chat endpoint. Send an OpenAI-shaped request, get an OpenAI-shaped response, regardless of which provider serves the completion. Tokmux translates request and response formats for providers whose native API is not OpenAI-shaped (Anthropic, Google) and passes the rest through unchanged.

Model naming

The model field takes {provider_slug}/{model_name}. Use the id from GET /api/v1/models.

Provider	model value
Anthropic	anthropic/claude-sonnet-4-6
Anthropic	anthropic/claude-opus-4-6
OpenAI	openai/gpt-5.4
Google AI	google/gemini-3.1-flash-image-preview
Fireworks AI	fireworks-ai/deepseek-v4-pro
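A client-side check of the model value can catch a malformed id before the request is sent. The sketch below splits on the first slash only. That is an assumption: the server's exact validation rules are not documented here. `split_model_id` is a hypothetical helper.

```python
def split_model_id(model_id):
    """Split "{provider_slug}/{model_name}" into (provider_slug, model_name)."""
    provider, sep, name = model_id.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected provider/model_name, got {model_id!r}")
    return provider, name


provider, name = split_model_id("fireworks-ai/deepseek-v4-pro")
```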

Parameters

Body is JSON. The shape mirrors OpenAI's Chat Completions API. Provider-specific fields are forwarded verbatim — anything not listed below is passed through to the upstream as-is.

Field	Type	Required	Description
model	string	required	`{provider}/{model_name}` — e.g. `anthropic/claude-sonnet-4-6`. Use the `id` from `GET /api/v1/models`.
messages	array<object>	required	OpenAI chat messages. Each item has `role` (`system`, `user`, `assistant`, `tool`) and `content` (string or array).
stream	boolean	optional	When `true`, response is `text/event-stream`. Defaults to `false`.
max_tokens	integer	optional	Max output tokens — Anthropic / Fireworks naming.
max_completion_tokens	integer	optional	Max output tokens — OpenAI chat completions naming.
temperature	number	optional	Sampling temperature.
top_p	number	optional	Nucleus sampling cutoff.
stop	string | array<string>	optional	Stop sequences. Forwarded to the provider unchanged.
tools	array<object>	optional	OpenAI tool definitions. Translated for Anthropic / Google.
tool_choice	string | object	optional	OpenAI tool-choice control.
thinking	object	optional	Anthropic extended thinking. `{ "type": "enabled", "budget_tokens": N }` enables reasoning deltas. Ignored for non-Anthropic providers.
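The table lists two names for the output-token limit. One way to hide that difference from calling code is to pick the field from the provider slug, as sketched below. The mapping is an assumption drawn from the table's "naming" notes: this page does not say whether the gateway accepts either name for every provider, and providers not mentioned there default to `max_tokens` here. `build_body` is a hypothetical helper.

```python
def build_body(model, messages, max_output_tokens=None, **extra):
    """Build a chat-completions body, choosing the token-limit field by provider."""
    body = {"model": model, "messages": messages, **extra}
    if max_output_tokens is not None:
        provider = model.split("/", 1)[0]
        # Assumption: OpenAI models take max_completion_tokens, all others max_tokens.
        field = "max_completion_tokens" if provider == "openai" else "max_tokens"
        body[field] = max_output_tokens
    return body


openai_body = build_body("openai/gpt-5.4", [{"role": "user", "content": "Hi"}], 64)
anthropic_body = build_body(
    "anthropic/claude-sonnet-4-6", [{"role": "user", "content": "Hi"}], 128, temperature=0.2
)
```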

curl — Anthropic

curl https://api.tokmux.com/api/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      { "role": "system", "content": "You are a concise technical assistant." },
      { "role": "user", "content": "Explain HTTP status 429 in one sentence." }
    ],
    "max_tokens": 128
  }'

200 OK

{
  "id": "tokmux-req-01krdp1pq24renxak657ax566a",
  "object": "chat.completion",
  "created": 1778575794,
  "model": "anthropic/claude-sonnet-4-6",
  "provider_request_id": "msg_014wLXrkm3wAgGijVj4fdQXe",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "HTTP 429 means the client has sent too many requests in a given time window and should retry after the period specified in the Retry-After header."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 38,
    "total_tokens": 62,
    "cache_read_tokens": 0,
    "cache_creation_tokens": 0
  }
}

curl — OpenAI

curl https://api.tokmux.com/api/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [
      { "role": "user", "content": "Convert 72°F to Celsius." }
    ],
    "max_completion_tokens": 64
  }'

Provider translation

Anthropic models go through full schema translation in both directions; OpenAI and Fireworks AI pass through unchanged. The response always lands as OpenAI chat.completion — anything provider-specific is either folded into a standard field or namespaced (e.g. Anthropic cache counters in usage, Fireworks reasoning in message.reasoning_content).

Provider	Request	Response
Anthropic	OpenAI → Anthropic Messages	Anthropic → OpenAI `chat.completion` (incl. SSE)
OpenAI	pass-through	pass-through
Fireworks AI	pass-through	pass-through

The response id is always a tokmux-issued ULID prefixed tokmux-req-. The upstream id (when one exists) is preserved under provider_request_id for correlation against provider dashboards.
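For log correlation it can help to pull both ids out of a response in one place. A minimal sketch, assuming only the field names and the tokmux-req- prefix described above; `correlation_ids` is a hypothetical helper.

```python
PREFIX = "tokmux-req-"


def correlation_ids(completion):
    """Return (tokmux_id, provider_request_id); the second is None when absent."""
    tokmux_id = completion["id"]
    if not tokmux_id.startswith(PREFIX):
        raise ValueError(f"unexpected id format: {tokmux_id!r}")
    return tokmux_id, completion.get("provider_request_id")


# Ids taken from the sample response on this page.
ids = correlation_ids({
    "id": "tokmux-req-01krdp1pq24renxak657ax566a",
    "provider_request_id": "msg_014wLXrkm3wAgGijVj4fdQXe",
})
```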

Streaming

Set stream: true for text/event-stream. Each chunk is one data: line terminated by a blank line. The first chunk carries delta.role; subsequent chunks carry delta.content; the final chunk sets finish_reason and includes usage. The stream terminates with a literal data: [DONE] sentinel, and no further payload follows it.

curl

curl https://api.tokmux.com/api/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      { "role": "user", "content": "Summarize the CAP theorem." }
    ],
    "max_tokens": 256,
    "stream": true
  }'

SSE

data: {"id":"tokmux-req-01krdp1rr8y1vzvbzda4pyhjee","object":"chat.completion.chunk","created":1778575796,"model":"anthropic/claude-sonnet-4-6","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"tokmux-req-01krdp1rr8y1vzvbzda4pyhjee","object":"chat.completion.chunk","created":1778575796,"model":"anthropic/claude-sonnet-4-6","choices":[{"index":0,"delta":{"content":"The CAP theorem"},"finish_reason":null}]}

data: {"id":"tokmux-req-01krdp1rr8y1vzvbzda4pyhjee","object":"chat.completion.chunk","created":1778575796,"model":"anthropic/claude-sonnet-4-6","choices":[{"index":0,"delta":{"content":" states that a distributed"},"finish_reason":null}]}

data: {"id":"tokmux-req-01krdp1rr8y1vzvbzda4pyhjee","object":"chat.completion.chunk","created":1778575796,"model":"anthropic/claude-sonnet-4-6","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":14,"completion_tokens":12,"total_tokens":26}}

data: [DONE]
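The chunk sequence above can be folded into a final message with a small loop. The sketch below takes already-decoded text lines (decode bytes as UTF-8 first if reading from a socket) and stops at the [DONE] sentinel. `collect_stream` is a hypothetical helper, and the sample chunks are abridged from the SSE example above.

```python
import json


def collect_stream(lines):
    """Consume SSE lines; return (text, finish_reason, usage)."""
    parts, finish_reason, usage = [], None, None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # sentinel: nothing follows it
        chunk = json.loads(payload)
        usage = chunk.get("usage") or usage
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason, usage


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"The CAP theorem"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" states that a distributed"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":14,"completion_tokens":12,"total_tokens":26}}',
    "",
    "data: [DONE]",
]
text, finish_reason, usage = collect_stream(sample)
```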

Extended thinking

Anthropic reasoning models accept a thinking object on the request:

json

"thinking": { "type": "enabled", "budget_tokens": 4096 }

On non-streaming responses, the reasoning trace lands on choices[0].message.thinking_content alongside the regular content. On streaming responses, delta.thinking_content chunks arrive before delta.content chunks — keep them in a separate buffer if you render reasoning and output in different surfaces. The field is ignored for non-Anthropic providers.

curl

curl https://api.tokmux.com/api/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      { "role": "user", "content": "What is the time complexity of Dijkstra with a Fibonacci heap?" }
    ],
    "stream": true,
    "thinking": { "type": "enabled", "budget_tokens": 4096 }
  }'
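Keeping reasoning and output in separate buffers might look like the sketch below. It operates on the delta objects of a stream that has already been parsed. The delta values are made up for illustration, and `split_deltas` is a hypothetical helper.

```python
def split_deltas(deltas):
    """Accumulate thinking_content and content from stream deltas separately."""
    thinking, answer = [], []
    for delta in deltas:
        if delta.get("thinking_content"):
            thinking.append(delta["thinking_content"])
        if delta.get("content"):
            answer.append(delta["content"])
    return "".join(thinking), "".join(answer)


# Illustrative deltas only; real traces will differ.
deltas = [
    {"role": "assistant"},
    {"thinking_content": "Dijkstra relaxes each edge once. "},
    {"thinking_content": "Decrease-key is O(1) amortized."},
    {"content": "O(E + V log V)"},
]
thinking, answer = split_deltas(deltas)
```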

Billing & whitelisting

Successful responses include an X-Usage-Event-Id header, the ULID of the usage event tokmux recorded for this call. Use it to reconcile against your usage reports. Error responses do not carry this header.

Settlement is FIFO across the Virtual Key's funded balance. Whitelisting is enforced at invoke time: a model in the catalog returns 403 model_not_whitelisted if the key isn't whitelisted for it. A model not in the catalog at all returns 404. Treat the catalog as discovery, not entitlement.
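When reconciling, the usage event id has to be read from response headers rather than the body. HTTP header names are case-insensitive, so the sketch below does not rely on exact casing. `usage_event_id` is a hypothetical helper and the header value shown is a placeholder.

```python
def usage_event_id(headers):
    """Return the X-Usage-Event-Id value from a header mapping, or None."""
    for name, value in headers.items():
        if name.lower() == "x-usage-event-id":
            return value
    return None  # error responses do not carry the header


event_id = usage_event_id({
    "content-type": "application/json",
    "x-usage-event-id": "01placeholderulid",  # placeholder value
})
```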

Error codes

Errors follow the OpenAI error object format:

json

{
  "error": {
    "message": "Human-readable description.",
    "type": "error_type",
    "param": null,
    "code": "machine_readable_code"
  }
}

Status	Type	Codes	Description
400	invalid_request_error	`missing_model`, `invalid_model_format`, `invalid_json`, `unsupported_provider`	Malformed request. Codes distinguish the cause: missing `model` field, wrong `provider/model` format, unparseable JSON, or provider not in catalog.
401	authentication_error	`invalid_api_key`	Missing, malformed, or revoked Virtual Key.
402	billing_error	`insufficient_credits`	Insufficient balance in the organization's account.
403	permission_error	`model_not_whitelisted`	Model exists in the catalog but is not whitelisted for this Virtual Key.
404	invalid_request_error	`model_not_found`, `route_not_found`	Model not in the catalog, or route does not exist.
429	rate_limit_error	`rate_limit_exceeded`, `spending_limit_exceeded`	Request-rate or per-key spending cap exceeded. Retry with backoff.
500	server_error	`internal_error`	Unexpected server error. Retry; if persistent, contact support.
502	server_error	`upstream_error`	Upstream provider returned an error or is unreachable.
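A client can branch on the machine-readable code rather than the message text. The sketch below parses the error object and applies a simple retry policy with capped exponential backoff. The policy is an assumption, not documented behavior: it retries `rate_limit_exceeded` and `internal_error`, treats `spending_limit_exceeded` as non-retryable on the guess that a cap does not clear by waiting, and leaves out `upstream_error` because the table gives no retry guidance for it. `parse_error` and `backoff_delay` are hypothetical helpers.

```python
import json

# Assumed policy; adjust to your own needs.
RETRYABLE_CODES = {"rate_limit_exceeded", "internal_error"}


def parse_error(status, body_text):
    """Extract the OpenAI-style error object and flag whether to retry."""
    err = json.loads(body_text).get("error", {})
    code = err.get("code")
    return {
        "status": status,
        "type": err.get("type"),
        "code": code,
        "message": err.get("message"),
        "retryable": code in RETRYABLE_CODES,
    }


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based), capped."""
    return min(cap, base * 2 ** attempt)


info = parse_error(
    429,
    '{"error": {"message": "Too many requests.", "type": "rate_limit_error",'
    ' "param": null, "code": "rate_limit_exceeded"}}',
)
```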
