Rate limits

How it works

Authenticated traffic is rate limited per organization using a fixed window. Requests without an organization fall back to a per-key limit, then to a per-IP limit. Internal service traffic is not rate limited.
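
To make the fixed-window behavior concrete, here is a minimal in-memory sketch. It is not the gateway's implementation: the names bucket_key and FixedWindowLimiter are hypothetical, the 60-second window length is an assumption (this page does not state the window size), and the default of 600 is simply the value shown in the example headers on this page.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def bucket_key(org_id: Optional[str], api_key: Optional[str], ip: str) -> str:
    """Pick the rate-limit scope: organization first, then key, then IP."""
    if org_id:
        return f"org:{org_id}"
    if api_key:
        return f"key:{api_key}"
    return f"ip:{ip}"


class FixedWindowLimiter:
    """In-memory fixed-window counter (illustrative only)."""

    def __init__(self, limit: int = 600, window_seconds: int = 60,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_seconds = window_seconds  # assumed value, not documented
        self.clock = clock
        # bucket -> (window start, requests counted in that window)
        self.counters: Dict[str, Tuple[int, int]] = {}

    def allow(self, bucket: str) -> bool:
        now = int(self.clock())
        # Windows are aligned to multiples of window_seconds.
        window_start = now - now % self.window_seconds
        start, count = self.counters.get(bucket, (window_start, 0))
        if start != window_start:
            # A new window has begun, so the count starts over.
            count = 0
        if count >= self.limit:
            return False
        self.counters[bucket] = (window_start, count + 1)
        return True
```

A fixed window resets the whole budget at once, so a client can send a full budget just before a reset and another full budget just after it.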

Headers

Every /v1 response reports the current budget:

x-ratelimit-limit: 600        # requests allowed per window
x-ratelimit-remaining: 412    # requests left in the current window
x-ratelimit-reset: 1718380800 # unix seconds when the window resets
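
A client can read these three headers into a small structure before deciding whether to send more requests. The sketch below assumes the response headers are available as a plain string mapping; RateLimitBudget and parse_budget are hypothetical names, not part of any SDK.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class RateLimitBudget:
    limit: int      # requests allowed per window
    remaining: int  # requests left in the current window
    reset_at: int   # unix seconds when the window resets

    def seconds_until_reset(self, now: float) -> float:
        # Clamp at zero in case the reset time has already passed.
        return max(0.0, self.reset_at - now)


def parse_budget(headers: Mapping[str, str]) -> RateLimitBudget:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitBudget(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=int(lowered["x-ratelimit-reset"]),
    )
```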

Exceeding the limit

When the window is exhausted, the gateway returns status 429 with this body:

{
  "error": {
    "message": "rate limit exceeded",
    "type": "rate_limit_error"
  }
}

Back off until the x-ratelimit-reset time, then retry.
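
One way to follow this advice in a client is sketched below. It assumes the 429 response carries the x-ratelimit-reset header with a lowercase name, and that the request is wrapped in a caller-supplied send function returning a (status, headers, body) tuple. call_with_backoff and its parameters are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, body)
Response = Tuple[int, Mapping[str, str], bytes]


def call_with_backoff(send: Callable[[], Response],
                      max_attempts: int = 3,
                      sleep: Callable[[float], None] = time.sleep,
                      now: Callable[[], float] = time.time) -> Response:
    """Call send(), waiting until the window resets after each 429."""
    response = send()
    for _ in range(max_attempts - 1):
        status, headers, _body = response
        if status != 429:
            break
        # Assumption: the 429 includes x-ratelimit-reset (unix seconds).
        reset_at = int(headers["x-ratelimit-reset"])
        # Never sleep a negative duration if the reset has already passed.
        sleep(max(0.0, reset_at - now()))
        response = send()
    return response
```

The last response is returned even if it is still a 429, so the caller decides what to do once the attempts run out.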

Fail-open

Rate limiting is fail-open: if the limiter's backing store is briefly unavailable, requests are allowed through rather than rejected, so availability is not gated on the limiter. (Authentication and billing, by contrast, are fail-closed.)
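
The fail-open policy can be pictured as a thin wrapper around the limiter check. This is a sketch only: allow_request is a hypothetical name, and treating ConnectionError and TimeoutError as the signs of an unavailable store is an assumption, since this page does not describe the store or its failure modes.

```python
from typing import Callable


def allow_request(check: Callable[[str], bool], bucket: str) -> bool:
    """Return the limiter's decision, or allow the request if the check fails."""
    try:
        return check(bucket)
    except (ConnectionError, TimeoutError):
        # Fail-open: a store outage lets the request through.
        return True
```

A fail-closed check, such as the authentication and billing checks mentioned above, would return False in the except branch instead.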
