Rate limiting and quotas
Use this page to configure request and usage guardrails for provider traffic in OngoingAI Gateway. It covers enforcement scope, limit codes, and verification steps.
Enforcement scope
- Enforces per-key and per-workspace request rate limits.
- Enforces per-key and per-workspace daily token quotas.
- Enforces per-key and per-workspace daily cost quotas.
- Returns 429 with structured limit codes when a limit is exceeded.
- Sets Retry-After for request-rate limit responses.
Operational fit
- You need spend control for shared model usage.
- You need burst protection against abusive request patterns.
- You need fair-share protection across workspaces and keys.
Evaluation order
- Limits run on proxied provider routes (/openai/* and /anthropic/*).
- Limits use authenticated gateway identity (org_id, workspace_id, key ID) for scope.
- Daily quotas are checked first from persisted trace analytics for the current UTC day.
- Request-rate limits are then checked with in-memory sliding one-minute windows.
- If a limit is exceeded, middleware returns 429 with an error code and message.
- For request-rate limits, responses include retry_after_seconds and Retry-After.
Limit codes:
- KEY_RATE_LIMIT_EXCEEDED
- WORKSPACE_RATE_LIMIT_EXCEEDED
- KEY_DAILY_TOKENS_EXCEEDED
- WORKSPACE_DAILY_TOKENS_EXCEEDED
- KEY_DAILY_COST_EXCEEDED
- WORKSPACE_DAILY_COST_EXCEEDED
Limits are enforced only when gateway auth is enabled because limiter scope depends on authenticated identity.
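The evaluation order above can be sketched in Python. This is an illustrative model only, not the gateway's implementation: SlidingWindow, check_limits, and the shape of the usage and limits dictionaries are hypothetical names chosen for the example. Only the key scope is shown; the workspace scope would repeat the same checks with the WORKSPACE_* codes. Checking tokens before cost, and tripping a quota at (rather than strictly above) its threshold, are assumptions the source does not confirm.

```python
import math
from collections import deque


class SlidingWindow:
    """In-memory sliding one-minute request counter (hypothetical helper)."""

    def __init__(self, limit, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        self.hits = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now):
        """Return (allowed, retry_after_seconds) for a request at time `now`."""
        # Drop timestamps that have slid out of the window.
        while self.hits and now - self.hits[0] >= self.window:
            self.hits.popleft()
        if len(self.hits) < self.limit:
            self.hits.append(now)
            return True, 0
        # The oldest accepted request leaves the window at hits[0] + window.
        retry_after = math.ceil(self.hits[0] + self.window - now)
        return False, max(retry_after, 1)


def check_limits(usage_today, limits, window, now):
    """Return None if the request may proceed, else a dict modelling the 429 body.

    usage_today: persisted totals for the current UTC day,
                 e.g. {"tokens": 0, "cost_usd": 0.0} (assumed shape).
    limits:      the per_key section of the config; 0 disables a threshold.
    """
    # 1. Daily quotas are evaluated first (token-before-cost order is assumed).
    max_tokens = limits.get("max_tokens_per_day", 0)
    if max_tokens > 0 and usage_today["tokens"] >= max_tokens:
        return {"code": "KEY_DAILY_TOKENS_EXCEEDED"}
    max_cost = limits.get("max_cost_usd_per_day", 0)
    if max_cost > 0 and usage_today["cost_usd"] >= max_cost:
        return {"code": "KEY_DAILY_COST_EXCEEDED"}

    # 2. Request-rate limits are evaluated second, against the in-memory window.
    if limits.get("requests_per_minute", 0) > 0:
        allowed, retry_after = window.allow(now)
        if not allowed:
            return {
                "code": "KEY_RATE_LIMIT_EXCEEDED",
                "retry_after_seconds": retry_after,
            }
    return None
```

Because the quota checks return before the window is touched, a request rejected for a daily quota does not consume a slot in the one-minute window in this model.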
Starter limits config
YAML
auth:
  enabled: true
  header: X-OngoingAI-Gateway-Key
limits:
  per_key:
    requests_per_minute: 120
    max_tokens_per_day: 1000000
    max_cost_usd_per_day: 50
  per_workspace:
    requests_per_minute: 500
    max_tokens_per_day: 5000000
    max_cost_usd_per_day: 200

Set limit values greater than 0 to enable each threshold.
A value of 0 disables that threshold.
Policy patterns
- Start with request-rate limits only, then add daily quotas after usage baselining.
- Use stricter per-key limits with a higher per-workspace envelope.
- Use workspace daily cost limits as a team budget guardrail.
- Keep per-key IDs stable so key-scoped daily quotas can attribute usage correctly.
Example limit profiles
Strict per-key burst control with workspace headroom
YAML
auth:
  enabled: true
  header: X-OngoingAI-Gateway-Key
limits:
  per_key:
    requests_per_minute: 30
    max_tokens_per_day: 0
    max_cost_usd_per_day: 0
  per_workspace:
    requests_per_minute: 300
    max_tokens_per_day: 0
    max_cost_usd_per_day: 0

Daily budget ceiling for shared workspace usage
YAML
auth:
  enabled: true
  header: X-OngoingAI-Gateway-Key
limits:
  per_key:
    requests_per_minute: 120
    max_tokens_per_day: 1000000
    max_cost_usd_per_day: 25
  per_workspace:
    requests_per_minute: 600
    max_tokens_per_day: 8000000
    max_cost_usd_per_day: 200

Verification steps
- Configure a low request-rate threshold for test traffic.
  YAML
  auth:
    enabled: true
    header: X-OngoingAI-Gateway-Key
  limits:
    per_key:
      requests_per_minute: 2
      max_tokens_per_day: 0
      max_cost_usd_per_day: 0
- Start the gateway in Terminal A.
  Bash
  ongoingai config validate
  ongoingai serve
- Send three provider requests quickly from Terminal B.
  Bash
  for i in 1 2 3; do
    curl -i "http://localhost:8080/openai/v1/models" \
      -H "X-OngoingAI-Gateway-Key: GATEWAY_KEY" \
      -H "Authorization: Bearer $OPENAI_API_KEY"
  done
Placeholders:
- GATEWAY_KEY: Gateway key token with proxy:write.
- OPENAI_API_KEY: Upstream provider API key.
You should see:
- The first requests pass through to the provider and return its normal response.
- A 429 response once the per-key rate threshold is exceeded.
- Response body fields error, code, and retry_after_seconds.
- A Retry-After response header for request-rate limits.
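A client can use the same signals to decide how long to back off. The helper below is a minimal sketch under stated assumptions: retry_delay is a hypothetical name, the response body is assumed to be JSON, and Retry-After is assumed to carry integer seconds (HTTP also permits a date in that header, which this sketch ignores).

```python
import json


def retry_delay(status, headers, body_text):
    """Return seconds to wait before retrying, or None if no delay is advertised.

    status:    HTTP status code of the gateway response.
    headers:   mapping of response header names to values.
    body_text: raw response body (assumed to be JSON for 429 responses).
    """
    if status != 429:
        return None

    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    header_value = lowered.get("retry-after")
    if header_value is not None and header_value.strip().isdigit():
        return int(header_value.strip())

    # Fall back to the retry_after_seconds body field.
    try:
        body = json.loads(body_text)
    except ValueError:
        return None
    if not isinstance(body, dict):
        return None
    seconds = body.get("retry_after_seconds")
    if isinstance(seconds, (int, float)) and not isinstance(seconds, bool):
        return seconds
    return None
```

A None result for a 429 suggests the limit is not a request-rate limit (for example a daily quota code), so retrying within the same UTC day is unlikely to succeed.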
Troubleshooting
Limits do not trigger
- Symptom: High request volume never returns 429.
- Cause: auth.enabled=false, limits are zero, or traffic bypasses provider routes.
- Fix: Enable auth, set limits above zero, and route traffic through /openai/* or /anthropic/*.
429 returns workspace limit code unexpectedly
- Symptom: Response code is WORKSPACE_* when per-key limits look low.
- Cause: Workspace thresholds are checked independently and may be lower than aggregate key traffic.
- Fix: Raise workspace thresholds or redistribute key traffic by workspace.
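A quick arithmetic check shows when the workspace limit can be hit before any individual key limit. The function name workspace_can_trip_first is hypothetical, and the check assumes every active key sends at its full per-key rate.

```python
def workspace_can_trip_first(per_key_rpm, per_workspace_rpm, active_keys):
    """Return True if keys running at their per-key limit can exceed the workspace limit.

    A limit of 0 means the threshold is disabled, so it can never trip.
    """
    if per_workspace_rpm <= 0:
        return False  # workspace request-rate limit disabled
    if per_key_rpm <= 0:
        return True  # keys are uncapped, so the workspace limit is the only bound
    return active_keys * per_key_rpm > per_workspace_rpm
```

With the starter values on this page (120 per key, 500 per workspace), four busy keys stay inside the workspace envelope at 480 requests per minute, while a fifth pushes the aggregate to 600 and makes a WORKSPACE_RATE_LIMIT_EXCEEDED response possible.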
429 responses are inconsistent across multiple gateway instances
- Symptom: Rate limits trigger differently depending on which gateway instance the load balancer routes a request to.
- Cause: Request-rate counters are in-memory per gateway process.
- Fix: Treat current rate limiting as per-instance, or run a single gateway instance for strict global RPM behavior.
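The effect of per-process counters can be illustrated with a small model. This sketch assumes a round-robin load balancer and a burst that fits inside a single one-minute window; accepted_in_burst is a hypothetical name and the model does not reflect the gateway's internals.

```python
def accepted_in_burst(instances, rpm_limit, requests):
    """Count requests accepted when one key's burst is spread across instances.

    Each instance keeps its own independent counter, mirroring in-memory
    per-process rate limiting. Requests are assigned round-robin (assumed).
    """
    counters = [0] * instances
    accepted = 0
    for i in range(requests):
        target = i % instances  # round-robin instance selection
        if counters[target] < rpm_limit:
            counters[target] += 1
            accepted += 1
    return accepted
```

Under these assumptions, a key limited to 120 requests per minute sees 120 of a 500-request burst accepted on one instance, but 360 accepted across three instances. The effective ceiling scales with instance count, which is why the limit should be read as per-instance.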
Daily quotas lag behind in-flight traffic
- Symptom: Daily token or cost limits trigger after a short delay.
- Cause: Daily quotas are based on persisted trace data, and trace writes are asynchronous.
- Fix: Expect short lag under burst load, and set conservative headroom in quota thresholds.
Proxy responses return 503 for usage limit checks
- Symptom: Response error is "gateway usage limit check unavailable".
- Cause: Trace store analytics query failed during limit evaluation.
- Fix: Restore trace storage health and verify storage connectivity.