Developer API v1.0.0

Tokmux API Reference

Unified OpenAI-compatible gateway. One Virtual Key, one schema, every supported provider. Authenticate, send an OpenAI-shaped request to /api/v1/chat/completions, get an OpenAI-shaped response regardless of which provider serves the completion.

Quick start

Three steps to your first call. Everything below assumes the default base URL https://api.tokmux.com/api/v1.

  1. Create a Virtual Key. Sign in to the dashboard, open the Keys page, and create a key. Keys are prefixed sk- and are shown once at creation — store the full value.
  2. Make a call. POST to /api/v1/chat/completions with an OpenAI-shaped body. The model field uses {provider}/{model_name} form.
    curl
    curl https://api.tokmux.com/api/v1/chat/completions \
      -H "Authorization: Bearer sk-..." \
      -H "Content-Type: application/json" \
      -d '{
        "model": "anthropic/claude-sonnet-4-6",
        "messages": [
          { "role": "system", "content": "You are a concise technical assistant." },
          { "role": "user", "content": "Explain HTTP status 429 in one sentence." }
        ],
        "max_tokens": 128
      }'
  3. Read the response. The body is OpenAI chat.completion regardless of upstream provider. The id is a tokmux-issued ULID prefixed tokmux-req-; the upstream id (if any) lives in provider_request_id.
    200 OK
    {
      "id": "tokmux-req-01krdp1pq24renxak657ax566a",
      "object": "chat.completion",
      "created": 1778575794,
      "model": "anthropic/claude-sonnet-4-6",
      "provider_request_id": "msg_014wLXrkm3wAgGijVj4fdQXe",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "HTTP 429 means the client has sent too many requests in a given time window and should retry after the period specified in the Retry-After header."
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 24,
        "completion_tokens": 38,
        "total_tokens": 62,
        "cache_read_tokens": 0,
        "cache_creation_tokens": 0
      }
    }
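The same three steps can be sketched in Python using only the standard library. This is a minimal illustration rather than an official client: the helper names `build_chat_request` and `first_message_text` are hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.tokmux.com/api/v1"  # default base URL from this page


def build_chat_request(api_key, model, messages, **params):
    """Build (but do not send) a POST request for /chat/completions."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def first_message_text(response_body):
    """Return the assistant text from a parsed chat.completion response."""
    return response_body["choices"][0]["message"]["content"]
```

To send the request, pass the result to `urllib.request.urlopen`, parse the body with `json.load`, and hand the resulting dict to `first_message_text`.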

That's the full loop. The rest of this page covers streaming, extended thinking, billing, and error codes.

Authentication

Every request authenticates with a Virtual Key (prefix sk-). Pass it as a Bearer token:

bash
Authorization: Bearer sk-your-virtual-key

Create Virtual Keys from the dashboard under your project's Keys page. Each key is scoped to an organization and project, with an optional model whitelist and per-key spending cap (USD).

A missing, malformed, or revoked key returns 401 with "type": "authentication_error". A model not whitelisted for the key returns 403; a model not in the catalog returns 404. See Error codes.
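Because the full key is shown only once, one common approach is to keep it in an environment variable and check its prefix before use. The sketch below assumes a variable named `TOKMUX_API_KEY`; that name is hypothetical and not something tokmux defines.

```python
import os


def load_virtual_key(env=os.environ, var="TOKMUX_API_KEY"):
    """Read a Virtual Key from the environment and sanity-check its prefix.

    TOKMUX_API_KEY is a hypothetical variable name chosen for this example.
    """
    key = env.get(var, "").strip()
    if not key.startswith("sk-"):
        raise ValueError(f"{var} is missing or does not look like a Virtual Key")
    return key


def auth_headers(key):
    """Headers for an authenticated JSON request."""
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

The prefix check only catches obvious mistakes such as an empty variable; whether a key is valid or revoked is decided by the gateway, which answers 401 in that case.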

Models

The model listing is OpenAI-compatible: it works with the OpenAI SDK, OpenWebUI, Cursor, Continue.dev, and any other client that calls /v1/models on startup.

List models

GET /api/v1/models

Returns every model the Virtual Key has access to. The list is filtered by the organization's catalog and the key's whitelist. Keys with no whitelist see the full catalog.

Each model's id is the value you pass as model in /chat/completions.

curl
curl https://api.tokmux.com/api/v1/models \
  -H "Authorization: Bearer sk-..."
200 OK
{
  "object": "list",
  "data": [
    {
      "id": "anthropic/claude-sonnet-4-6",
      "object": "model",
      "created": 1776848230,
      "owned_by": "anthropic"
    },
    {
      "id": "fireworks-ai/deepseek-v4-pro",
      "object": "model",
      "created": 1777367000,
      "owned_by": "fireworks-ai"
    },
    {
      "id": "openai/gpt-5.4",
      "object": "model",
      "created": 1777026818,
      "owned_by": "openai"
    }
  ]
}
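A client that lists models on startup often wants the ids grouped by provider. The following is a small sketch of that step over an already-parsed response; `model_ids_by_owner` is a hypothetical helper name.

```python
from collections import defaultdict


def model_ids_by_owner(models_response):
    """Group model ids from a GET /models response by their owned_by value."""
    grouped = defaultdict(list)
    for entry in models_response.get("data", []):
        grouped[entry["owned_by"]].append(entry["id"])
    return dict(grouped)


# Trimmed copy of the sample response shown above.
sample = {
    "object": "list",
    "data": [
        {"id": "anthropic/claude-sonnet-4-6", "object": "model", "owned_by": "anthropic"},
        {"id": "openai/gpt-5.4", "object": "model", "owned_by": "openai"},
    ],
}
grouped = model_ids_by_owner(sample)
```

Each id in the result can be passed unchanged as the model field of a chat completion request.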

Retrieve a model

GET /api/v1/models/{model}

Returns a single model by ID. The {model} path parameter uses the same {provider}/{model_name} format. Returns 404 if the model is not in the catalog or not whitelisted for the key.

curl
curl https://api.tokmux.com/api/v1/models/anthropic/claude-sonnet-4-6 \
  -H "Authorization: Bearer sk-..."
200 OK
{
  "id": "anthropic/claude-sonnet-4-6",
  "object": "model",
  "created": 1776848230,
  "owned_by": "anthropic"
}
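Since the model id itself contains a slash, it helps to be explicit about how the path is built. This sketch assumes the slash is sent unescaped, as in the curl example; `model_url` is a hypothetical helper.

```python
from urllib.parse import quote

BASE_URL = "https://api.tokmux.com/api/v1"  # default base URL from this page


def model_url(model_id):
    """URL for GET /models/{model}; the provider/model slash stays in the path."""
    # safe="/" keeps the separator unescaped, matching the curl example,
    # while still percent-encoding characters such as spaces.
    return f"{BASE_URL}/models/{quote(model_id, safe='/')}"
```

A 404 from this URL does not tell you whether the model is missing from the catalog or merely outside the key's whitelist, so treat both cases the same way on the client.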

Chat Completions

POST /api/v1/chat/completions

The unified chat endpoint. Send an OpenAI-shaped request, get an OpenAI-shaped response, regardless of which provider serves the completion. Tokmux translates request and response formats for providers whose native schema differs from OpenAI's (Anthropic, Google) and passes the rest through unchanged.

Model naming

The model field takes {provider_slug}/{model_name}. Use the id from GET /api/v1/models.

| Provider | model value |
| --- | --- |
| Anthropic | anthropic/claude-sonnet-4-6 |
| Anthropic | anthropic/claude-opus-4-6 |
| OpenAI | openai/gpt-5.4 |
| Google AI | google/gemini-3.1-flash-image-preview |
| Fireworks AI | fireworks-ai/deepseek-v4-pro |
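Checking the format on the client can catch a malformed value before it reaches the gateway. A minimal sketch, assuming the provider slug is everything before the first slash; `ModelRef` and `parse_model` are hypothetical names.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    provider: str
    name: str


def parse_model(value):
    """Split '{provider_slug}/{model_name}' at the first slash."""
    provider, sep, name = value.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected provider/model_name, got {value!r}")
    return ModelRef(provider, name)
```

For example, `parse_model("fireworks-ai/deepseek-v4-pro")` yields provider `fireworks-ai` and name `deepseek-v4-pro`, while a bare `gpt-5.4` raises `ValueError`.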

Parameters

Body is JSON. The shape mirrors OpenAI's Chat Completions API. Provider-specific fields are forwarded verbatim — anything not listed below is passed through to the upstream as-is.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | required | {provider}/{model_name}, e.g. anthropic/claude-sonnet-4-6. Use the id from GET /api/v1/models. |
| messages | array<object> | required | OpenAI chat messages. Each item has role (system, user, assistant, tool) and content (string or array). |
| stream | boolean | optional | When true, response is text/event-stream. Defaults to false. |
| max_tokens | integer | optional | Max output tokens (Anthropic / Fireworks naming). |
| max_completion_tokens | integer | optional | Max output tokens (OpenAI chat completions naming). |
| temperature | number | optional | Sampling temperature. |
| top_p | number | optional | Nucleus sampling cutoff. |
| stop | string or array<string> | optional | Stop sequences. Forwarded to the provider unchanged. |
| tools | array<object> | optional | OpenAI tool definitions. Translated for Anthropic / Google. |
| tool_choice | string or object | optional | OpenAI tool-choice control. |
| thinking | object | optional | Anthropic extended thinking. { "type": "enabled", "budget_tokens": N } enables reasoning deltas. Ignored for non-Anthropic providers. |

curl — Anthropic
curl https://api.tokmux.com/api/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      { "role": "system", "content": "You are a concise technical assistant." },
      { "role": "user", "content": "Explain HTTP status 429 in one sentence." }
    ],
    "max_tokens": 128
  }'
200 OK
{
  "id": "tokmux-req-01krdp1pq24renxak657ax566a",
  "object": "chat.completion",
  "created": 1778575794,
  "model": "anthropic/claude-sonnet-4-6",
  "provider_request_id": "msg_014wLXrkm3wAgGijVj4fdQXe",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "HTTP 429 means the client has sent too many requests in a given time window and should retry after the period specified in the Retry-After header."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 38,
    "total_tokens": 62,
    "cache_read_tokens": 0,
    "cache_creation_tokens": 0
  }
}
curl — OpenAI
curl https://api.tokmux.com/api/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [
      { "role": "user", "content": "Convert 72°F to Celsius." }
    ],
    "max_completion_tokens": 64
  }'
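The two examples differ in which field caps the output length. A client that targets several providers might pick the field from the provider slug, as in this sketch. The field names come from the parameter table; the slug-to-field mapping and the `with_output_limit` helper are assumptions for illustration, and providers not named in the table are deliberately left out.

```python
# Field names come from the parameter table; the slug mapping is an
# assumption made for this example.
LIMIT_FIELD = {
    "anthropic": "max_tokens",
    "fireworks-ai": "max_tokens",
    "openai": "max_completion_tokens",
}


def with_output_limit(body, limit):
    """Return a copy of body with the output cap under the provider's field name."""
    provider = body["model"].split("/", 1)[0]
    field = LIMIT_FIELD.get(provider)
    if field is None:
        raise KeyError(f"no known output-limit field for provider {provider!r}")
    return {**body, field: limit}
```

Returning a copy keeps the original request body reusable when the same messages are sent to more than one provider.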

Provider translation

Anthropic models go through full schema translation in both directions; OpenAI and Fireworks AI pass through unchanged. The response always lands as OpenAI chat.completion — anything provider-specific is either folded into a standard field or namespaced (e.g. Anthropic cache counters in usage, Fireworks reasoning in message.reasoning_content).

| Provider | Request | Response |
| --- | --- | --- |
| Anthropic | OpenAI → Anthropic Messages | Anthropic → OpenAI `chat.completion` (incl. SSE) |
| OpenAI | pass-through | pass-through |
| Fireworks AI | pass-through | pass-through |

The response id is always a tokmux-issued ULID prefixed tokmux-req-. The upstream id (when one exists) is preserved under provider_request_id for correlation against provider dashboards.
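For logging and correlation, it can be convenient to pull both ids and the token counters into one flat record. A sketch over a parsed response dict; `summarize_response` is a hypothetical helper, and defaulting absent cache counters to 0 is an assumption.

```python
def summarize_response(resp):
    """Pull correlation ids and token counts out of a chat.completion dict."""
    usage = resp.get("usage", {})
    return {
        "tokmux_id": resp["id"],
        # Present only when the upstream provider issued an id.
        "provider_request_id": resp.get("provider_request_id"),
        "total_tokens": usage.get("total_tokens"),
        # Assumption: treat missing cache counters as 0.
        "cache_read_tokens": usage.get("cache_read_tokens", 0),
        "cache_creation_tokens": usage.get("cache_creation_tokens", 0),
    }
```

Logging `tokmux_id` next to `provider_request_id` lets you look up the same call both in tokmux usage reports and in the provider's own dashboard.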

Streaming

Set stream: true to receive the response as text/event-stream. Each chunk is one data: line terminated by a blank line. The first chunk carries delta.role; subsequent chunks carry delta.content; the final chunk sets finish_reason and includes usage. The stream ends with a literal data: [DONE] sentinel, and no payload follows it.

curl
curl https://api.tokmux.com/api/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      { "role": "user", "content": "Summarize the CAP theorem." }
    ],
    "max_tokens": 256,
    "stream": true
  }'
SSE
data: {"id":"tokmux-req-01krdp1rr8y1vzvbzda4pyhjee","object":"chat.completion.chunk","created":1778575796,"model":"anthropic/claude-sonnet-4-6","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"tokmux-req-01krdp1rr8y1vzvbzda4pyhjee","object":"chat.completion.chunk","created":1778575796,"model":"anthropic/claude-sonnet-4-6","choices":[{"index":0,"delta":{"content":"The CAP theorem"},"finish_reason":null}]}

data: {"id":"tokmux-req-01krdp1rr8y1vzvbzda4pyhjee","object":"chat.completion.chunk","created":1778575796,"model":"anthropic/claude-sonnet-4-6","choices":[{"index":0,"delta":{"content":" states that a distributed"},"finish_reason":null}]}

data: {"id":"tokmux-req-01krdp1rr8y1vzvbzda4pyhjee","object":"chat.completion.chunk","created":1778575796,"model":"anthropic/claude-sonnet-4-6","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":14,"completion_tokens":12,"total_tokens":26}}

data: [DONE]
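The chunk sequence above can be folded back into a single message on the client. This is a minimal sketch that works on an iterable of already-decoded text lines, such as the lines of a streamed HTTP response; `collect_stream` is a hypothetical helper.

```python
import json


def collect_stream(lines):
    """Fold SSE lines into (text, finish_reason, usage)."""
    parts, finish_reason, usage = [], None, None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # sentinel: nothing after it is read
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            parts.append(delta.get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
        usage = chunk.get("usage") or usage
    return "".join(parts), finish_reason, usage
```

Run against the sample stream above, this returns the concatenated text "The CAP theorem states that a distributed", the finish reason "stop", and the usage object from the final chunk.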

Extended thinking

Anthropic reasoning models accept a thinking object on the request:

json
"thinking": { "type": "enabled", "budget_tokens": 4096 }

On non-streaming responses, the reasoning trace lands on choices[0].message.thinking_content alongside the regular content. On streaming responses, delta.thinking_content chunks arrive before delta.content chunks — keep them in a separate buffer if you render reasoning and output in different surfaces. The field is ignored for non-Anthropic providers.

curl
curl https://api.tokmux.com/api/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
      { "role": "user", "content": "What is the time complexity of Dijkstra with a Fibonacci heap?" }
    ],
    "stream": true,
    "thinking": { "type": "enabled", "budget_tokens": 4096 }
  }'
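Keeping reasoning and answer text apart on the client might look like the sketch below. It takes stream chunks that have already been parsed into dicts; `enable_thinking` and `split_deltas` are hypothetical helper names.

```python
def enable_thinking(body, budget_tokens):
    """Return a copy of the request body with extended thinking switched on."""
    return {**body, "thinking": {"type": "enabled", "budget_tokens": budget_tokens}}


def split_deltas(chunks):
    """Accumulate reasoning and answer text from parsed stream chunks separately."""
    thinking, answer = [], []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("thinking_content"):
                thinking.append(delta["thinking_content"])
            if delta.get("content"):
                answer.append(delta["content"])
    return "".join(thinking), "".join(answer)
```

Two separate buffers mean the reasoning trace can be rendered in a collapsible panel, or dropped entirely, without touching the answer text.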

Billing & whitelisting

Successful responses include an X-Usage-Event-Id header, the ULID of the usage event tokmux recorded for this call. Use it to reconcile against your usage reports. Error responses do not carry this header.

Settlement is FIFO across the Virtual Key's funded balance. Whitelisting is enforced at invoke time: a model in the catalog returns 403 model_not_whitelisted if the key isn't whitelisted for it. A model not in the catalog at all returns 404. Treat the catalog as discovery, not entitlement.
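To reconcile against usage reports, a client can record the usage event id next to each request id. A minimal sketch, assuming response headers are available as a plain mapping; `usage_event_id` and `record_call` are hypothetical helpers.

```python
def usage_event_id(headers):
    """Return the X-Usage-Event-Id value, or None when the header is absent."""
    for name, value in headers.items():
        if name.lower() == "x-usage-event-id":  # header names are case-insensitive
            return value
    return None


def record_call(ledger, request_id, headers):
    """Store request id -> usage event id; calls without the header are skipped."""
    event_id = usage_event_id(headers)
    if event_id is not None:
        ledger[request_id] = event_id
    return ledger
```

Error responses never carry the header, so they simply do not appear in the ledger.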

Error codes

Errors follow the OpenAI error object format:

json
{
  "error": {
    "message": "Human-readable description.",
    "type": "error_type",
    "param": null,
    "code": "machine_readable_code"
  }
}

| Status | Type | Codes | Description |
| --- | --- | --- | --- |
| 400 | invalid_request_error | missing_model, invalid_model_format, invalid_json, unsupported_provider | Malformed request. Codes distinguish the cause: missing `model` field, wrong `provider/model` format, unparseable JSON, or provider not in catalog. |
| 401 | authentication_error | invalid_api_key | Missing, malformed, or revoked Virtual Key. |
| 402 | billing_error | insufficient_credits | Insufficient balance in the organization's account. |
| 403 | permission_error | model_not_whitelisted | Model exists in the catalog but is not whitelisted for this Virtual Key. |
| 404 | invalid_request_error | model_not_found, route_not_found | Model not in the catalog, or route does not exist. |
| 429 | rate_limit_error | rate_limit_exceeded, spending_limit_exceeded | Request-rate or per-key spending cap exceeded. Retry with backoff. |
| 500 | server_error | internal_error | Unexpected server error. Retry; if persistent, contact support. |
| 502 | server_error | upstream_error | Upstream provider returned an error or is unreachable. |
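A client can turn the error object into a typed value and decide whether to retry. The sketch below follows the retry guidance for 429 and 500; treating 502 as retryable and the specific backoff constants are assumptions, and the names `ApiError`, `parse_error`, `should_retry`, and `backoff_seconds` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    status: int
    type: str
    code: Optional[str]
    message: str


def parse_error(status, body_text):
    """Parse an OpenAI-style error body into an ApiError."""
    err = json.loads(body_text).get("error", {})
    return ApiError(status, err.get("type", ""), err.get("code"), err.get("message", ""))


def should_retry(err):
    """429 and 500 are documented as retryable; retrying 502 is an assumption."""
    return err.status in (429, 500, 502)


def backoff_seconds(attempt, base=0.5, cap=30.0):
    """Exponential backoff without jitter: base * 2**attempt, capped at cap."""
    return min(cap, base * (2 ** attempt))
```

In practice you would likely add random jitter to the delay and stop after a fixed number of attempts; both are left out to keep the example short.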