Streaming and reliability
Use this page to understand streaming behavior and reliability guarantees in OngoingAI Gateway. It covers streaming metadata capture, backpressure handling, and shutdown semantics.
Reliability goals
- Preserves provider streaming response delivery semantics to clients.
- Detects server-sent event (SSE) responses and captures stream metadata.
- Records stream chunk count and time to first token (TTFT).
- Uses asynchronous trace writes so proxy forwarding does not wait on storage.
- Applies bounded trace queue behavior with explicit drop signals under backpressure.
Operational fit
- You need low-latency streamed responses for client UX.
- You need predictable behavior when trace storage falls behind.
- You need operational visibility into stream timing and chunk behavior.
Stream and trace pipeline
- Proxy forwards provider traffic directly on matched provider routes.
- Streaming detection uses a response `Content-Type` containing `text/event-stream`.
- Capture middleware counts stream chunks and measures TTFT from handler start to first upstream write.
- TTFT is recorded as `time_to_first_token_ms` and `time_to_first_token_us` in trace records.
- Trace records enqueue to an asynchronous writer queue (buffer size 1024).
- If the queue is full, the gateway logs `trace queue is full; dropping trace` and continues proxy forwarding.
- If persistence fails inside async writes, failures are logged as `trace persistence failed; dropped trace records`.
- On shutdown, the gateway tries to flush pending traces within a 5-second timeout and logs the flush outcome.
Streaming payload capture can be truncated by `tracing.body_max_size` for stored trace bodies. Truncation affects stored trace content, not client stream delivery.
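For orientation, the stream-related fields on a streamed trace record look roughly like the excerpt below. The field names (`time_to_first_token_ms`, `time_to_first_token_us`, `stream_chunks`) are the ones described above; the surrounding record shape and the values shown are illustrative only.

```json
{
  "time_to_first_token_ms": 412,
  "time_to_first_token_us": 412387,
  "metadata": {
    "stream_chunks": 27
  }
}
```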
Recommended baseline
No separate feature flag is required. Streaming behavior is available on normal provider routes.
Recommended baseline:

```yaml
tracing:
  capture_bodies: false
  body_max_size: 1048576
```

With `capture_bodies=false`, stream bodies are not stored, but stream TTFT and chunk metadata are still captured.
Deployment patterns
- Lowest-overhead streaming telemetry: keep `capture_bodies=false`.
- Incident debugging window: set `capture_bodies=true` with a reduced `body_max_size` (see the sketch after this list).
- High-throughput environments: monitor logs for queue saturation and persistence failure signals.
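For the incident debugging window, here is a minimal sketch of the temporary override, assuming the same `tracing` keys as the baseline above; the reduced `body_max_size` value is only an example:

```yaml
tracing:
  capture_bodies: true   # temporarily store stream bodies while debugging
  body_max_size: 65536   # example reduced cap (64 KiB); tune to your storage budget
```

Revert to the baseline once the debugging window closes.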
Example checks
Validate TTFT and chunk metrics on an SSE endpoint
- Start the gateway in Terminal A.

  ```bash
  ongoingai config validate
  ongoingai serve
  ```

- Send a streaming request in Terminal B.

  ```bash
  curl -N http://localhost:8080/openai/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model":"gpt-4o-mini","stream":true,"messages":[{"role":"user","content":"Reply with ok"}]}'
  ```

  Placeholder:
  `OPENAI_API_KEY`: Your upstream provider API key.

- Query recent traces in Terminal B.

  ```bash
  curl "http://localhost:8080/api/traces?limit=5"
  ```
You should see streamed response chunks in the client output and non-zero `time_to_first_token_ms` for streamed traces.
Observe queue-pressure behavior safely in staging
Generate sustained proxy traffic while storage is constrained, then monitor gateway logs for:
- `trace queue is full; dropping trace`
- `trace persistence failed; dropped trace records`
Proxy request forwarding should continue while these warnings appear.
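One way to produce that sustained traffic is a simple request loop. This is an illustrative sketch, not a tuned load test; it reuses the streaming request from the earlier example and assumes the gateway is listening on `localhost:8080`:

```bash
# Send a batch of concurrent streaming requests through the gateway.
# Adjust the count, and repeat the batch, to keep pressure on the trace queue.
for i in $(seq 1 200); do
  curl -s -N http://localhost:8080/openai/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model":"gpt-4o-mini","stream":true,"messages":[{"role":"user","content":"Reply with ok"}]}' \
    > /dev/null &
done
wait
```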
Validation checklist
- Send one non-stream request and one stream request through provider routes.

- Query trace summaries:

  ```bash
  curl "http://localhost:8080/api/traces?limit=10"
  ```

- Query one stream trace detail:

  ```bash
  curl "http://localhost:8080/api/traces/TRACE_ID"
  ```

  Placeholder:
  `TRACE_ID`: ID of a streamed trace from `/api/traces`.
You should see:
- Stream traces with non-zero `time_to_first_token_ms`.
- Stream traces with `stream_chunks` metadata in `metadata`.
- Proxy responses continuing during trace-write warnings under load.
Troubleshooting
Stream traces show zero TTFT
- Symptom: `time_to_first_token_ms` is `0` for expected streaming traffic.
- Cause: The upstream response was not SSE (`text/event-stream`), or the request was not streamed.
- Fix: Confirm the provider request includes stream mode and the upstream returns SSE (see the header check below).
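A quick way to confirm the upstream path returns SSE is to inspect the response headers of a streaming request. This check uses plain curl against the same route as the examples above:

```bash
# Dump response headers only; look for "Content-Type: text/event-stream".
curl -s -N -D - -o /dev/null http://localhost:8080/openai/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","stream":true,"messages":[{"role":"user","content":"Reply with ok"}]}'
```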
Logs show trace queue is full; dropping trace
- Symptom: Trace drops appear during high traffic.
- Cause: Async trace queue reached capacity while storage writes lag.
- Fix: Reduce capture load, improve storage throughput, and monitor queue-drop frequency.
Logs show trace persistence failed; dropped trace records
- Symptom: Persistence failure logs appear with failed batch counts.
- Cause: Trace store writes are failing.
- Fix: Verify storage connectivity and health, then confirm failures stop.
Gateway shutdown logs trace flush timeout
- Symptom: Logs show a failed trace flush during shutdown.
- Cause: Pending trace writes exceeded the 5-second shutdown window.
- Fix: Allow graceful shutdown time and verify storage responsiveness.
Proxy returns 502 upstream request failed
- Symptom: Stream or non-stream proxy requests return `502`.
- Cause: The upstream provider request failed before completion.
- Fix: Verify provider endpoint health and network reachability (see the direct provider check below).
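To separate gateway issues from provider issues, you can call the upstream directly. This sketch assumes an OpenAI-compatible upstream, matching the `/openai/` route used in the examples; substitute your provider's endpoint if it differs:

```bash
# Bypass the gateway and confirm the provider endpoint is reachable.
# A 200 status indicates the upstream and credentials are healthy.
curl -s -o /dev/null -w "%{http_code}\n" https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```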