Rate limits
Per-organization rate limiting and the headers it sets.
How it works
Authenticated traffic is rate limited per organization using a fixed window. Requests that carry no organization are limited per key instead, or per IP address when there is no key either. Internal service traffic is not rate limited.
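The fixed-window counting and the organization, key, IP fallback can be illustrated with a minimal in-memory sketch in Python. It is not the gateway's implementation. `RequestIdentity`, `limit_key`, and `FixedWindowLimiter` are hypothetical names. The default limit of 600 mirrors the example headers on this page, and the 60-second window length is an assumption, since the page does not state one.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestIdentity:
    """Hypothetical view of who sent a request."""
    org_id: Optional[str] = None
    api_key: Optional[str] = None
    client_ip: Optional[str] = None
    internal: bool = False


def limit_key(ident: RequestIdentity) -> Optional[str]:
    """Pick the bucket: organization first, then key, then IP.

    Returns None for internal service traffic, which is not limited.
    """
    if ident.internal:
        return None
    if ident.org_id:
        return f"org:{ident.org_id}"
    if ident.api_key:
        return f"key:{ident.api_key}"
    return f"ip:{ident.client_ip}"


class FixedWindowLimiter:
    """In-memory fixed-window counter (assumed limit and window length).

    Old windows are never evicted in this sketch; a shared store with
    key expiry would be needed in a real multi-instance deployment.
    """

    def __init__(self, limit: int = 600, window_seconds: int = 60):
        self.limit = limit
        self.window = window_seconds
        self.counts: dict[tuple[str, int], int] = {}

    def check(self, key: str, now: Optional[float] = None) -> tuple[bool, int, int]:
        """Count one request; return (allowed, remaining, reset_unix_seconds)."""
        now = time.time() if now is None else now
        # Align the window to a multiple of its length, e.g. 120..179 for 60 s.
        window_start = int(now) // self.window * self.window
        reset = window_start + self.window
        bucket = (key, window_start)
        used = self.counts.get(bucket, 0)
        if used >= self.limit:
            return False, 0, reset
        self.counts[bucket] = used + 1
        return True, self.limit - used - 1, reset
```

Because every request in a window shares one counter that resets at a fixed boundary, a single reset timestamp describes the whole budget. The known trade-off of fixed windows is that a client can send up to twice the limit in a short span straddling a boundary.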
Headers
Every /v1 response reports the current budget:
x-ratelimit-limit: 600 # requests allowed per window
x-ratelimit-remaining: 412 # requests left in the current window
x-ratelimit-reset: 1718380800 # Unix time (seconds) when the window resets
Exceeding the limit
When the window is exhausted, the gateway responds with status 429 and this body:
{
  "error": {
    "message": "rate limit exceeded",
    "type": "rate_limit_error"
  }
}
Back off until the time given in x-ratelimit-reset, then retry.
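A minimal client-side sketch in Python of reading the x-ratelimit-* headers and backing off on a 429 until the reset time. `RateLimitBudget`, `parse_budget`, `call_with_backoff`, and the `send` callable are hypothetical names, not part of any published SDK. The default of three attempts and the one-second wait when the headers are missing are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Tuple


@dataclass
class RateLimitBudget:
    limit: int
    remaining: int
    reset: int  # Unix time, in seconds, when the window resets


def parse_budget(headers: Mapping[str, str]) -> Optional[RateLimitBudget]:
    """Read the x-ratelimit-* headers; return None if any is missing or malformed."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lower = {k.lower(): v for k, v in headers.items()}
    try:
        return RateLimitBudget(
            limit=int(lower["x-ratelimit-limit"]),
            remaining=int(lower["x-ratelimit-remaining"]),
            reset=int(lower["x-ratelimit-reset"]),
        )
    except (KeyError, ValueError):
        return None


# Hypothetical stand-in for your HTTP client: performs one request and
# returns (status_code, headers).
Send = Callable[[], Tuple[int, Mapping[str, str]]]


def call_with_backoff(
    send: Send,
    max_attempts: int = 3,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Mapping[str, str]]:
    """Retry on 429, sleeping until x-ratelimit-reset before each retry."""
    status, headers = send()
    for _ in range(max_attempts - 1):
        if status != 429:
            break
        budget = parse_budget(headers)
        # Assumed fallback when the headers are absent: wait one second.
        wait = budget.reset - clock() if budget else 1.0
        sleep(max(0.0, wait))
        status, headers = send()
    return status, headers
```

Injecting `clock` and `sleep` keeps the retry logic testable without real waiting. A client could also watch `remaining` on successful responses and slow down before it ever receives a 429.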
Fail-open
Rate limiting is fail-open: if the limiter's backing store is briefly unavailable, requests are allowed through rather than rejected, so availability is not gated on the limiter. (Authentication and billing, by contrast, are fail-closed.)
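The difference between fail-open and fail-closed comes down to what a check returns when its backend errors. The two small Python wrappers below sketch that difference. `BackendUnavailable`, `allow_request`, and `authenticate` are hypothetical names. How the real gateway detects an unavailable store is not described on this page.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


class BackendUnavailable(Exception):
    """Hypothetical error raised when a backing store cannot be reached."""


def allow_request(check_limit: Callable[[str], bool], key: str) -> bool:
    """Rate limiting fails open: a store outage admits the request."""
    try:
        return check_limit(key)
    except BackendUnavailable:
        # Availability is not gated on the limiter, so let the request through.
        logger.warning("rate limiter unavailable; failing open for %s", key)
        return True


def authenticate(verify: Callable[[str], bool], token: str) -> bool:
    """Authentication fails closed: a backend outage rejects the request."""
    try:
        return verify(token)
    except BackendUnavailable:
        return False
```

Only the exception branch differs. A limiter outage briefly removes the limit, while an authentication outage blocks traffic.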