Troubleshooting
Use this page to diagnose common gateway failures quickly. Each issue uses the same format: symptom, cause, and fix.
Fast triage flow
Run the following checks in order before you debug individual endpoints.
- Validate config before restart:
  ongoingai config validate --config ongoingai.yaml
- Start the gateway and keep logs visible:
  ongoingai serve --config ongoingai.yaml
- Verify service health:
  curl -i "http://localhost:8080/api/health"
- Send one proxied provider request:
  curl -i "http://localhost:8080/openai/v1/chat/completions" \
    -H "Authorization: Bearer OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Reply with ok"}]}'
- Check trace capture:
  curl -i "http://localhost:8080/api/traces?limit=1"
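The health and trace checks above can be wrapped in a small script that stops at the first failure. This is a minimal sketch, assuming the gateway is already running, that both endpoints answer 200 when healthy, and that no gateway key is required. The function names and the GATEWAY_URL variable are hypothetical, not part of the gateway.

```bash
#!/bin/sh
# GATEWAY_URL is an assumed variable name, not a gateway setting.
GATEWAY_URL="${GATEWAY_URL:-http://localhost:8080}"

# Print only the HTTP status code for a request.
http_status() {
  # -s silences progress, -o discards the body, -w prints the status code.
  curl -s -o /dev/null -w '%{http_code}' "$@"
}

# Check one endpoint and report whether it returned 200.
check_endpoint() {
  label="$1"
  url="$2"
  code="$(http_status "$url")"
  if [ "$code" = "200" ]; then
    echo "ok   $label ($code)"
  else
    echo "FAIL $label ($code)"
    return 1
  fi
}

# Run the health and trace checks in order, stopping at the first failure.
triage() {
  check_endpoint "health" "$GATEWAY_URL/api/health" &&
    check_endpoint "traces" "$GATEWAY_URL/api/traces?limit=1"
}
```

Call triage after starting the gateway. If the gateway is not reachable at all, curl reports the status code as 000.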
Startup failures
config is invalid: ...
- Symptom: ongoingai serve exits immediately with a config validation message.
- Cause: One or more config fields fail validation.
- Fix: Run ongoingai config validate --config ongoingai.yaml, then correct the reported field.
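To avoid restarting with a broken config, validation can gate startup. The wrapper below is a sketch that assumes ongoingai config validate exits non-zero when the config is invalid; start_gateway is a hypothetical name.

```bash
# Hypothetical wrapper: start the gateway only when validation succeeds.
# Assumes `ongoingai config validate` exits non-zero on an invalid config.
start_gateway() {
  config="${1:-ongoingai.yaml}"
  if ongoingai config validate --config "$config"; then
    ongoingai serve --config "$config"
  else
    echo "config validation failed for $config; not starting" >&2
    return 1
  fi
}
```

Run it as start_gateway ongoingai.yaml in place of calling ongoingai serve directly.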
failed to initialize sqlite storage: ...
- Symptom: Startup fails before the server begins listening.
- Cause: SQLite path is invalid or not writable.
- Fix: Set a writable storage.path and verify directory permissions.
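The permission check can be scripted. This sketch takes the value you set for storage.path as its argument; check_storage_path is a hypothetical helper, and it only tests what the current user can write, so run it as the same user that runs the gateway.

```bash
# Check that the directory that will hold the SQLite file exists and is
# writable. The argument stands in for your storage.path value.
check_storage_path() {
  db_path="$1"
  dir="$(dirname "$db_path")"
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir"
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "directory not writable: $dir"
    return 1
  fi
  # An existing database file must itself be writable.
  if [ -e "$db_path" ] && [ ! -w "$db_path" ]; then
    echo "file not writable: $db_path"
    return 1
  fi
  echo "ok: $db_path"
}
```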
failed to initialize postgres storage: ...
- Symptom: Startup fails with Postgres initialization errors.
- Cause: storage.dsn is invalid, or database connectivity is unavailable.
- Fix: Verify DSN format, network reachability, and database credentials.
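The three checks in the fix can be run outside the gateway with the standard PostgreSQL client tools. This sketch assumes pg_isready and psql are installed and that your storage.dsn uses the URI form. Whether the gateway also accepts keyword/value DSNs is not covered here, and both function names are hypothetical.

```bash
# Rough format check for a URI-style Postgres DSN.
# libpq also accepts keyword/value strings ("host=... dbname=..."), which
# this sketch deliberately does not cover.
dsn_is_uri() {
  case "$1" in
    postgres://?* | postgresql://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check format, then reachability, then credentials.
check_postgres() {
  dsn="$1"
  dsn_is_uri "$dsn" || {
    echo "DSN is not a postgres:// or postgresql:// URI"
    return 1
  }
  # pg_isready tests that the server accepts connections;
  # psql then exercises the credentials with a trivial query.
  pg_isready -d "$dsn" && psql "$dsn" -c 'SELECT 1' >/dev/null
}
```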
Gateway fails to bind host and port
- Symptom: Startup logs show gateway failed with listen errors.
- Cause: Configured host or port is unavailable.
- Fix: Free the port, or update server.host and server.port.
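Before changing the config, it helps to see what currently holds the port. A sketch using lsof, assuming it is installed; port 8080 matches the examples on this page, and port_owner is a hypothetical name.

```bash
# Report which process, if any, is listening on a TCP port.
# 8080 matches the examples on this page; use your server.port value.
port_owner() {
  port="${1:-8080}"
  # -nP skips DNS and port-name lookups; -sTCP:LISTEN keeps only listeners.
  if lsof -nP -iTCP:"$port" -sTCP:LISTEN; then
    return 0
  fi
  echo "nothing is listening on port $port"
  return 1
}
```

Listeners owned by other users may only be visible when lsof runs with elevated privileges.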
Proxy and provider issues
Proxy returns 502 upstream request failed
- Symptom: Provider route responses return 502 with body upstream request failed.
- Cause: Upstream provider is unavailable or unreachable.
- Fix: Verify provider upstream URL, DNS, outbound network path, and provider service status.
Proxy returns 403 for missing provider credential
- Symptom: Error says provider API key is missing.
- Cause: Gateway auth passed, but request did not include provider key.
- Fix: Add Authorization or X-API-Key header with provider token.
Proxy route returns 404 page not found
- Symptom: Request to expected provider route returns 404.
- Cause: Request path does not match configured provider prefixes.
- Fix: Check providers.openai.prefix and providers.anthropic.prefix, then update client base URLs.
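A stray or missing slash between the gateway origin and the provider prefix is an easy way to end up on a path the gateway does not route. The helper below is a sketch that joins the two consistently; provider_base_url is a hypothetical name, and /openai is simply the prefix used in the examples on this page.

```bash
# Join the gateway origin and a provider prefix into a client base URL.
# "/openai" is the prefix used in the examples on this page; yours may differ.
provider_base_url() {
  origin="${1%/}"  # drop one trailing slash from the origin
  prefix="/${2#/}" # ensure exactly one leading slash on the prefix
  printf '%s%s\n' "$origin" "${prefix%/}"
}
```

For example, provider_base_url "http://localhost:8080/" "openai/" prints http://localhost:8080/openai, to which a client appends /v1/chat/completions.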
Authorization issues
Protected routes return 401 missing or invalid gateway key
- Symptom: API or proxy route rejects request with 401.
- Cause: Gateway key is missing, invalid, or sent in the wrong header.
- Fix: Send key in configured auth.header (default X-OngoingAI-Gateway-Key).
Protected routes return 403 gateway key does not have required permission
- Symptom: Request is authenticated but blocked by policy.
- Cause: Key role and permissions do not include required permission.
- Fix: Use a key with route-required permission (proxy:write, analytics:read, or keys:manage).
Protected routes return 503 gateway key verification unavailable
- Symptom: Requests intermittently fail with key verification errors.
- Cause: Dynamic key resolver is unavailable or stale fail-closed behavior is active in Postgres mode.
- Fix: Restore config-store connectivity and verify key refresh logs.
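Because the gateway key and the provider key travel in different headers, a single test request that carries both helps separate the credential errors above. This sketch assumes the default auth.header and the /openai prefix; GATEWAY_URL, GATEWAY_KEY, and OPENAI_API_KEY are assumed environment variable names, and both functions are hypothetical.

```bash
# Send one proxied request carrying both credentials; print the status code.
# GATEWAY_KEY and OPENAI_API_KEY are assumed environment variable names.
proxied_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    "${GATEWAY_URL:-http://localhost:8080}/openai/v1/chat/completions" \
    -H "X-OngoingAI-Gateway-Key: $GATEWAY_KEY" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Reply with ok"}]}'
}

# Map the status codes described on this page to the credential at fault.
explain_status() {
  case "$1" in
    401) echo "gateway key missing, invalid, or in the wrong header" ;;
    403) echo "provider credential missing, or gateway key lacks permission" ;;
    503) echo "gateway key verification unavailable" ;;
    *) echo "status $1 is not a credential error covered here" ;;
  esac
}
```

Run explain_status "$(proxied_status)" to get a one-line hint for the response.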
Traces and analytics issues
/api/traces returns no items
- Symptom: Trace list is empty after testing.
- Cause: Request did not pass through provider routes, or no successful provider traffic has been captured yet.
- Fix: Send traffic through /openai/... or /anthropic/..., then query /api/traces?limit=10 again.
/api/traces returns 400 for query filters
- Symptom: Trace list request fails with validation errors.
- Cause: Invalid filter values (for example limit, status, min_tokens, max_tokens, from, to, or cursor).
- Fix: Use supported ranges and time formats. Ensure to >= from and max_tokens >= min_tokens.
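The two ordering rules in the fix can be checked on the client before a request is sent. This is a sketch only: it assumes from and to are UTC timestamps in one fixed-width format such as 2024-01-01T00:00:00Z, which sort correctly as plain strings. The time formats the gateway actually accepts may differ, and check_trace_filters is a hypothetical name.

```bash
# Pre-check trace filters before calling /api/traces.
# Assumes from/to share one fixed-width UTC format, so string order
# equals time order.
check_trace_filters() {
  from="$1"
  to="$2"
  min_tokens="$3"
  max_tokens="$4"
  # The earlier timestamp must sort first (or the two must be equal).
  first="$(printf '%s\n%s\n' "$from" "$to" | LC_ALL=C sort | head -n 1)"
  if [ "$first" != "$from" ]; then
    echo "to must be >= from"
    return 1
  fi
  if [ "$max_tokens" -lt "$min_tokens" ]; then
    echo "max_tokens must be >= min_tokens"
    return 1
  fi
  echo "ok"
}
```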
/api/analytics/* returns 400 for series options
- Symptom: Usage or cost analytics request fails with query validation errors.
- Cause: Unsupported group_by or bucket, or invalid date range.
- Fix: Use group_by=provider|model and bucket=hour|day|week. Ensure to >= from.
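The supported values listed in the fix can be enforced in client scripts so that a typo never reaches the gateway. A minimal sketch; check_series_options is a hypothetical name, and it validates only the two options, not the date range.

```bash
# Accept only the group_by and bucket values listed on this page.
check_series_options() {
  group_by="$1"
  bucket="$2"
  case "$group_by" in
    provider | model) ;;
    *)
      echo "unsupported group_by: $group_by"
      return 1
      ;;
  esac
  case "$bucket" in
    hour | day | week) ;;
    *)
      echo "unsupported bucket: $bucket"
      return 1
      ;;
  esac
  # Emit a query-string fragment to append to an analytics URL.
  echo "group_by=$group_by&bucket=$bucket"
}
```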
/api/traces/:id returns 404 trace not found for known ID
- Symptom: Trace detail is missing for a trace seen by another caller.
- Cause: Tenant scoping hides traces outside caller org_id and workspace_id.
- Fix: Query with a gateway key in the same tenant scope as the trace.
Gateway key management issues
Gateway key create, rotate, or revoke returns 501
- Symptom: Key lifecycle route returns not implemented.
- Cause: Active config store does not support key mutations.
- Fix: Use Postgres-backed key store for key lifecycle APIs.
Gateway key create or rotate returns 409
- Symptom: Key mutation fails with conflict.
- Cause: Key ID already exists, or rotated token conflicts with an existing key token.
- Fix: Use a unique ID or token, then retry.
Gateway key mutation returns 400 invalid json body
- Symptom: Create or rotate request fails with JSON body error.
- Cause: Request body is invalid JSON.
- Fix: Send valid JSON with Content-Type: application/json.
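A request body can be checked locally before it is sent. This sketch assumes python3 is available and uses its bundled json.tool module as the parser; is_valid_json is a hypothetical helper and the body file name is up to you.

```bash
# Succeed only when the given file contains valid JSON.
# Uses Python's bundled json.tool module as the parser.
is_valid_json() {
  python3 -m json.tool "$1" >/dev/null 2>&1
}
```

If the check passes, the same file can be sent with curl using -d @file together with -H "Content-Type: application/json".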
Reliability and shutdown
Logs show trace queue is full; dropping trace
- Symptom: Warning logs appear under high traffic.
- Cause: Async trace queue is saturated while storage writes lag.
- Fix: Reduce capture load, improve storage throughput, and monitor warning frequency.
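Warning frequency can be tracked by counting the log message over time. This sketch assumes gateway output has been redirected to a file; the default path gateway.log and the function name are hypothetical.

```bash
# Count dropped-trace warnings in a log file.
# Assumes gateway output was redirected to a file; the path is hypothetical.
count_dropped_traces() {
  log_file="${1:-gateway.log}"
  # -c prints the number of matching lines; -F matches the text literally.
  grep -c -F "trace queue is full; dropping trace" "$log_file"
}
```

Running it at intervals and comparing the counts shows whether drops are rising or have stopped. Note that grep exits with status 1 when the count is 0.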
Logs show trace persistence failed; dropped trace records
- Symptom: Error logs report failed trace write batches.
- Cause: Trace store write failures in async writer.
- Fix: Verify storage health and connectivity, then confirm errors stop.
Logs show failed to flush pending traces before shutdown
- Symptom: Shutdown logs report flush timeout or cancellation.
- Cause: Pending trace writes exceeded shutdown flush window.
- Fix: Allow graceful shutdown time and verify storage responsiveness.