OpenTelemetry
Use this page to configure OpenTelemetry (OTEL) trace and metric export from OngoingAI Gateway to an OTLP-compatible collector, or to expose a native Prometheus scrape endpoint.
When to use this integration
- You run an observability stack that accepts OTLP data, such as Jaeger, Grafana Tempo, Datadog, Honeycomb, or a standalone OpenTelemetry Collector.
- You need distributed tracing across the gateway and upstream provider calls.
- You need gateway-level operational metrics for queue health, provider latency, and write reliability.
- You want tenant-scoped span attributes for multi-tenant observability filtering.
- You run Prometheus and want to scrape gateway metrics directly.
- You want end-to-end correlation across all three observability pillars: jump from a log line to its trace, or from a latency spike on a dashboard to the exact request trace that caused it.
How it works
When enabled, the gateway creates OTLP HTTP exporters at startup:
- A trace exporter that batches and sends spans to your collector.
- A metric exporter that periodically pushes counters, histograms, and gauges.
- A Prometheus scrape endpoint (`/metrics` by default) that serves metrics in Prometheus exposition format when `prometheus_enabled` is `true`.
- A credential scrubbing exporter that wraps the trace exporter and sanitizes all span attribute values before they leave the process, providing defense-in-depth against credential leaks in telemetry.
- Go runtime metrics (`go_memory_*`, `go_goroutine_*`, `go_gc_*`, `go_sched_*`), registered automatically for process-level health monitoring.
The gateway wraps its HTTP server and upstream transport with OpenTelemetry instrumentation. Each inbound request produces a server span, and each upstream proxy call produces a client span as a child of the server span. Dedicated child spans cover auth evaluation, provider routing, trace enqueue, and storage writes.
All three observability pillars are correlated end-to-end:
- Logs → Traces: Structured JSON logs automatically include `trace_id` and `span_id` from the active request span, so any log line can be joined to its distributed trace.
- Metrics → Traces: Histogram exemplars on proxy and provider latency metrics carry the `trace_id` of the recorded request. In Grafana or any exemplar-aware dashboard, clicking a latency spike jumps directly to the trace that caused it.
Trace context propagates using the W3C Trace Context standard.
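The header format itself is defined by the W3C specification, not by the gateway. As an illustrative sketch, you can construct a `traceparent` header by hand and send it with a request so the gateway's server span joins an existing trace:

```shell
# Build a W3C traceparent header: version-traceid-spanid-flags.
# The trace id is 16 random bytes (32 hex chars), the span id is
# 8 random bytes (16 hex chars), and flags 01 marks the trace as sampled.
trace_id=$(openssl rand -hex 16)
span_id=$(openssl rand -hex 8)
header="traceparent: 00-${trace_id}-${span_id}-01"
echo "$header"
```

Passing this header on a proxied request (for example, `curl -H "$header" ...`) makes the gateway's server span a child of the caller's span.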
Configuration
YAML configuration
Add an `observability.otel` section to `ongoingai.yaml`:
```yaml
observability:
  otel:
    enabled: true
    endpoint: localhost:4318
    insecure: true
    service_name: ongoingai-gateway
    traces_enabled: true
    metrics_enabled: true
    prometheus_enabled: false
    prometheus_path: /metrics
    sampling_ratio: 1.0
    export_timeout_ms: 3000
    metric_export_interval_ms: 10000
```

Field reference
| Field | Type | Default | Notes |
|---|---|---|---|
observability.otel.enabled | bool | false | Master toggle. Set to true to activate OTEL export. |
observability.otel.endpoint | string | localhost:4318 | OTLP HTTP collector endpoint. Accepts host:port or a full URL. |
observability.otel.insecure | bool | true | Use plain HTTP. Set to false for HTTPS. |
observability.otel.service_name | string | ongoingai-gateway | Value for the service.name resource attribute. |
observability.otel.traces_enabled | bool | true | Enable trace span export. |
observability.otel.metrics_enabled | bool | true | Enable OTLP push metric export. |
observability.otel.prometheus_enabled | bool | false | Enable native Prometheus scrape endpoint. |
observability.otel.prometheus_path | string | /metrics | Path for the Prometheus scrape endpoint. Must start with / and must not overlap with /api, /openai, or /anthropic. |
observability.otel.sampling_ratio | float | 1.0 | Trace sampling ratio from 0.0 (none) to 1.0 (all). Uses parent-based sampling with trace ID ratio. |
observability.otel.export_timeout_ms | int | 3000 | Timeout in milliseconds for each export request to the collector. |
observability.otel.metric_export_interval_ms | int | 10000 | Interval in milliseconds between periodic metric exports. |
Environment variables
The gateway also accepts standard OpenTelemetry environment variables. These follow the same precedence as other env overrides: they apply after YAML values.
| Variable | Effect |
|---|---|
OTEL_SDK_DISABLED | Set to true to disable OTEL entirely. |
OTEL_EXPORTER_OTLP_ENDPOINT | OTLP collector endpoint. If set, OTEL is auto-enabled. |
OTEL_EXPORTER_OTLP_INSECURE | Set to true for plain HTTP transport. |
OTEL_SERVICE_NAME | Override service_name. |
OTEL_TRACES_EXPORTER | Set to otlp to enable traces, or none to disable. |
OTEL_METRICS_EXPORTER | Set to otlp to enable push metrics, prometheus to enable Prometheus scrape mode, or none to disable. |
OTEL_TRACES_SAMPLER_ARG | Sampling ratio as a float (for example, 0.5). |
OTEL_EXPORTER_OTLP_TIMEOUT | Export timeout in milliseconds. |
OTEL_METRIC_EXPORT_INTERVAL | Metric export interval in milliseconds. |
ONGOINGAI_PROMETHEUS_ENABLED | Set to true to enable Prometheus scrape endpoint. |
ONGOINGAI_PROMETHEUS_PATH | Override the Prometheus endpoint path (default /metrics). |
Setting OTEL_EXPORTER_OTLP_ENDPOINT to a non-empty value automatically
enables OTEL export, even if observability.otel.enabled is false in YAML.
Setting OTEL_METRICS_EXPORTER=prometheus enables Prometheus mode and
disables OTLP push metrics. This is equivalent to setting
prometheus_enabled: true and metrics_enabled: false.
Endpoint format
The `endpoint` field accepts two formats:
- Host and port: `localhost:4318` or `collector.internal:4318`. The `insecure` field controls whether the gateway uses HTTP or HTTPS.
- Full URL: `http://collector.internal:4318` or `https://collector.example.com:4318`. The URL scheme overrides the `insecure` setting. An `http://` scheme forces insecure mode, and an `https://` scheme forces secure mode.
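For example, the following two configurations are equivalent (both send OTLP over plain HTTP to the same collector; the hostname is illustrative):

```yaml
# Host and port, with insecure controlling the scheme:
observability:
  otel:
    enabled: true
    endpoint: collector.internal:4318
    insecure: true

# Full URL, where the scheme wins regardless of insecure:
observability:
  otel:
    enabled: true
    endpoint: http://collector.internal:4318
```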
Traces
Inbound request spans
The gateway creates a server span for each incoming HTTP request. The span name uses the route pattern:
| Request path | Span name |
|---|---|
/openai/... | POST /openai/* |
/anthropic/... | GET /anthropic/* |
/api/... | POST /api/* |
| Other paths | POST /other |
The HTTP method in the span name matches the actual request method.
Upstream proxy spans
Each request forwarded to an upstream provider creates a child client span. The
span name is prefixed with proxy:
| Request path | Span name |
|---|---|
/openai/... | proxy POST /openai/* |
/anthropic/... | proxy POST /anthropic/* |
Auth evaluation spans
The gateway.auth span wraps the auth middleware and records whether the
request was allowed or denied.
| Attribute | Description |
|---|---|
gateway.auth.result | allow or deny. |
gateway.auth.deny_reason | unauthorized or forbidden (only set on deny). |
On deny, the span status is set to Error with the deny reason.
Route spans
The gateway.route span wraps provider routing and records the matched
provider and route prefix.
| Attribute | Description |
|---|---|
gateway.route.provider | Matched provider: openai, anthropic, or unknown. |
gateway.route.prefix | Matched route prefix: /openai, /anthropic, or /. |
gateway.org_id | Organization ID from the authenticated gateway key. |
gateway.workspace_id | Workspace ID from the authenticated gateway key. |
The span status is set to Error for HTTP 5xx responses.
Trace enqueue spans
The gateway.trace.enqueue span records whether a trace was accepted into the
async write queue or dropped due to backpressure.
| Attribute | Description |
|---|---|
gateway.trace.enqueue.result | accepted or dropped. |
gateway.org_id | Organization ID from the authenticated gateway key. |
gateway.workspace_id | Workspace ID from the authenticated gateway key. |
On drop, the span status is set to Error with message trace dropped.
Trace write spans
The gateway.trace.write span records each storage write batch from the async
trace writer.
| Attribute | Description |
|---|---|
gateway.trace.write.batch_size | Number of traces in the write batch. |
gateway.trace.write.error_class | Error classification on failure (credential-scrubbed). |
On write failure, the span status is set to Error with message
write failed. The error_class value is sanitized by the credential
scrubbing layer to prevent credential leakage in error messages.
Gateway attributes
After authentication completes, the span enrichment middleware adds tenant identity attributes to the active server span:
| Attribute | Description |
|---|---|
gateway.correlation_id | Correlation ID linking logs, spans, and traces. |
gateway.org_id | Organization ID from the authenticated gateway key. |
gateway.workspace_id | Workspace ID from the authenticated gateway key. |
gateway.key_id | Gateway API key identifier. |
gateway.role | Role assigned to the gateway key. |
These attributes are only added when auth.enabled=true and the request
authenticates successfully.
Error handling
The gateway sets span status to Error for HTTP 5xx responses from upstream
providers. The status message includes the HTTP status code, such as http 502.
HTTP 4xx responses do not set error status on the span.
Resource attributes
All spans and metrics include the following resource attributes:
| Attribute | Value |
|---|---|
service.name | Value of observability.otel.service_name. |
service.version | Gateway binary version. |
Observability correlation
The gateway connects all three observability pillars so you can move between logs, traces, and metrics without manual ID lookups.
Logs to traces
Every structured JSON log line emitted during an active request span includes
trace_id and span_id fields:
```json
{
  "time": "2025-01-15T10:30:00Z",
  "level": "INFO",
  "msg": "captured exchange",
  "trace_id": "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4",
  "span_id": "1a2b3c4d5e6f7a8b",
  "correlation_id": "corr-abc-123",
  "path": "/openai/v1/chat/completions",
  "status": 200
}
```

This is implemented via the `TraceLogHandler` slog wrapper, which injects
trace context from the active span into every log record. Use these fields
to join log lines to their distributed trace in Grafana Loki, Elasticsearch,
Datadog, or any log backend that supports trace correlation.
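As a quick local check, you can pull the correlation fields out of a captured log line with `jq` (the sample line below is illustrative):

```shell
# Extract trace_id and span_id from a gateway log line for trace lookup.
log='{"level":"INFO","msg":"captured exchange","trace_id":"a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4","span_id":"1a2b3c4d5e6f7a8b"}'
echo "$log" | jq -r '.trace_id, .span_id'
# prints the trace id on one line, then the span id on the next
```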
Metrics to traces (exemplars)
The proxy and provider latency histograms
(ongoingai.proxy.request_duration_seconds and
ongoingai.provider.request_duration_seconds) attach exemplars with the
trace_id and span_id of the request that produced each measurement.
In practice, this means:
- A p99 latency spike on a Grafana dashboard has a clickable exemplar dot that opens the exact trace responsible.
- Prometheus stores exemplars alongside histogram buckets when `--enable-feature=exemplar-storage` is active.
- Grafana Tempo, Jaeger, and other trace backends can receive the jump from the exemplar link.
Exemplars are enabled automatically. They fire whenever the request context carries a sampled span, so sampling controls exemplar volume with no additional configuration.
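On the Grafana side, the Prometheus data source must map the exemplar's `trace_id` label to a trace data source. A minimal provisioning sketch (the `tempo` UID is an assumption; substitute the UID of your own trace data source):

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://localhost:9090
    jsonData:
      exemplarTraceIdDestinations:
        - name: trace_id
          datasourceUid: tempo
```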
Correlation summary
| From | To | Mechanism |
|---|---|---|
| Log line | Trace | trace_id and span_id in structured JSON logs |
| Metric data point | Trace | Histogram exemplar with trace_id and span_id |
| Trace span | Logs | Filter logs by trace_id in your log backend |
| Trace span | Metrics | Span attributes match metric label dimensions |
Metrics
The gateway exports 12 metric instruments organized in three groups.
Trace pipeline metrics
ongoingai.trace.queue_dropped_total
Type: Int64Counter
Counts trace records dropped because the async trace queue was full.
| Attribute | Description |
|---|---|
provider | Provider name (openai, anthropic). |
model | Model name from the request (unknown when unavailable). |
org_id | Organization ID. |
workspace_id | Workspace ID. |
route | Route pattern (/openai/*, /anthropic/*, /api/*, /other). |
status_code | HTTP response status code. |
ongoingai.trace.write_failed_total
Type: Int64Counter
Counts trace records dropped after a storage write failure.
| Attribute | Description |
|---|---|
operation | Write operation that failed (write_trace, write_batch_fallback). |
error_class | Classified failure: connection, timeout, contention, constraint, or unknown. |
store | Trace storage backend (sqlite, postgres). |
ongoingai.trace.enqueued_total
Type: Int64Counter
Counts traces successfully enqueued to the async write queue. No attributes.
ongoingai.trace.written_total
Type: Int64Counter
Counts traces successfully persisted to storage. No attributes.
ongoingai.trace.flush_duration_seconds
Type: Float64Histogram Unit: seconds
Time to flush a batch of traces to storage. Uses custom bucket boundaries optimized for fast database writes: 1ms, 2.5ms, 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s. No attributes.
ongoingai.trace.flush_batch_size
Type: Int64Histogram
Number of traces per flush batch. No attributes.
ongoingai.trace.queue_depth
Type: Int64ObservableGauge
Current number of traces waiting in the async write queue. Sampled each collection cycle. No attributes.
ongoingai.trace.queue_capacity
Type: Int64ObservableGauge
Capacity of the async trace write queue. Sampled each collection cycle. No attributes.
Provider metrics
ongoingai.provider.request_total
Type: Int64Counter
Counts upstream provider requests.
| Attribute | Description |
|---|---|
provider | Provider name (openai, anthropic). |
model | Model name from the request. |
org_id | Organization ID (unknown when unavailable). |
workspace_id | Workspace ID (unknown when unavailable). |
route | Route pattern (/openai/*, /anthropic/*, /api/*, /other). |
status_code | HTTP response status code from the provider. |
ongoingai.provider.request_duration_seconds
Type: Float64Histogram Unit: seconds
Upstream provider request duration. Uses custom bucket boundaries optimized
for AI API response times: 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms,
1s, 2.5s, 5s, 10s. Exemplars are attached with the trace_id and span_id
of the recorded request.
| Attribute | Description |
|---|---|
provider | Provider name (openai, anthropic). |
model | Model name from the request. |
org_id | Organization ID (unknown when unavailable). |
workspace_id | Workspace ID (unknown when unavailable). |
route | Route pattern (/openai/*, /anthropic/*, /api/*, /other). |
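For example, a per-provider p95 latency panel can be built from this histogram (PromQL, assuming the Prometheus-normalized metric name):

```promql
histogram_quantile(0.95,
  sum by (provider, le) (
    rate(ongoingai_provider_request_duration_seconds_bucket[5m])
  )
)
```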
Proxy metrics
ongoingai.proxy.request_total
Type: Int64Counter
Counts proxy requests with tenant scoping.
| Attribute | Description |
|---|---|
provider | Provider name (openai, anthropic). |
model | Model name from the request (unknown when unavailable). |
org_id | Organization ID. |
workspace_id | Workspace ID. |
route | Route pattern (/openai/*, /anthropic/*, /api/*, /other). |
status_code | HTTP response status code. |
ongoingai.proxy.request_duration_seconds
Type: Float64Histogram Unit: seconds
Proxy request duration with tenant scoping. Uses the same custom bucket
boundaries as the provider histogram (5ms to 10s). Exemplars are attached
with the trace_id and span_id of the recorded request.
| Attribute | Description |
|---|---|
provider | Provider name (openai, anthropic). |
model | Model name from the request (unknown when unavailable). |
org_id | Organization ID. |
workspace_id | Workspace ID. |
route | Route pattern (/openai/*, /anthropic/*, /api/*, /other). |
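The tenant labels make per-organization traffic breakdowns straightforward; for example, request rate by org (PromQL sketch):

```promql
sum by (org_id) (
  rate(ongoingai_proxy_request_total[5m])
)
```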
Prometheus
The gateway can expose a native Prometheus scrape endpoint that serves all 12 metric instruments in Prometheus exposition format. This is an alternative to OTLP push metrics for teams that run Prometheus-based monitoring.
Configuration
Enable Prometheus in YAML:
```yaml
observability:
  otel:
    enabled: true
    service_name: ongoingai-gateway
    traces_enabled: false
    metrics_enabled: false
    prometheus_enabled: true
    prometheus_path: /metrics
```

Or with environment variables:
```shell
ONGOINGAI_PROMETHEUS_ENABLED=true \
ONGOINGAI_PROMETHEUS_PATH=/metrics \
ongoingai serve --config ongoingai.yaml
```

Setting OTEL_METRICS_EXPORTER=prometheus also enables Prometheus mode and
disables OTLP push metrics.
Verify the endpoint
After starting the gateway with Prometheus enabled:
```shell
curl http://localhost:8080/metrics
```

You should see Prometheus exposition format output with ongoingai_ prefixed
metric names.
Grafana / Prometheus scrape config
Add the gateway as a scrape target in your Prometheus configuration:
```yaml
scrape_configs:
  - job_name: ongoingai-gateway
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8080"]
    metrics_path: /metrics
```

Using both Prometheus and OTLP push
You can enable both Prometheus scrape and OTLP push metrics simultaneously:
```yaml
observability:
  otel:
    enabled: true
    endpoint: localhost:4318
    insecure: true
    service_name: ongoingai-gateway
    traces_enabled: true
    metrics_enabled: true
    prometheus_enabled: true
```

Both exporters read from the same meter provider, so metric values are consistent across both surfaces.
Go runtime metrics
The gateway automatically registers Go runtime metrics when metrics export is enabled (OTLP push or Prometheus). These appear alongside gateway metrics and include:
- `go_memory_classes_heap_objects_bytes`: heap memory in use
- `go_goroutine_count`: active goroutine count
- `go_gc_duration_seconds`: GC pause durations
- `go_sched_goroutines_goroutines`: scheduler goroutine count
These are useful for monitoring gateway process health and capacity planning.
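For example, a simple leak watch on goroutine count (PromQL; the threshold is illustrative and should be tuned to your workload):

```promql
# Alert if goroutines grow without bound (possible leak).
go_goroutine_count{job="ongoingai-gateway"} > 10000
```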
Enabling exemplar storage in Prometheus
To use histogram exemplars for metrics-to-traces correlation, enable exemplar storage in Prometheus:
```yaml
# Start Prometheus with the --enable-feature=exemplar-storage flag,
# then add the gateway as a scrape target:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: ongoingai-gateway
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8080"]
    metrics_path: /metrics
```

In Grafana, exemplars appear as dots on histogram panels. Clicking an exemplar dot opens the linked trace in your configured trace data source (Tempo, Jaeger, etc.).
Credential scrubbing
The gateway applies defense-in-depth credential scrubbing to all telemetry exports.
How it works
A scrubbing exporter wraps the OTLP trace exporter and sanitizes all string attribute values before they leave the process. The scrubbing runs in the async batch export goroutine, not on the request hot path.
The MakeWriteSpanHook also sanitizes error messages recorded in
gateway.trace.write spans via ScrubCredentials.
Patterns detected
| Pattern | Examples |
|---|---|
| Token prefixes | sk_..., pk_..., rk_..., xoxb_..., ghp_..., pat_... |
| JWTs | eyJ... (three dot-separated base64url segments) |
| Bearer tokens | Bearer <token> in header-like strings |
| Connection string secrets | password=..., secret=..., token=... |
All detected patterns are replaced with [CREDENTIAL_REDACTED].
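To see the effect, here is a simplified stand-in for the scrubbing rules using `sed` (the real scrubbing runs in-process; these patterns are illustrative, not the gateway's exact regexes):

```shell
# Illustrative redaction: replace token-prefixed secrets and bearer tokens.
redact() {
  sed -E \
    -e 's/(sk|pk|rk|xoxb|ghp|pat)_[A-Za-z0-9_-]+/[CREDENTIAL_REDACTED]/g' \
    -e 's/Bearer [A-Za-z0-9._-]+/Bearer [CREDENTIAL_REDACTED]/g'
}
echo 'upstream auth failed for key sk_live_abc123' | redact
# -> upstream auth failed for key [CREDENTIAL_REDACTED]
```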
Safety guarantees
- Metric label values are trimmed and credential-scrubbed before export. Any detected credential pattern is replaced with [CREDENTIAL_REDACTED].
- Missing request-scope label values are emitted as `unknown` to preserve a stable metric schema.
- Span attributes that could carry credential data (error messages, status descriptions) are scrubbed before export.
- Clean spans with no credential patterns pass through with zero allocation overhead.
Alerting recommendations
These PromQL examples target common gateway failure modes. Adjust thresholds for your traffic volume and SLO targets.
Trace queue drops
Alert when traces are being dropped due to queue backpressure:
```promql
increase(ongoingai_trace_queue_dropped_total[5m]) > 0
```

Any nonzero value indicates trace data loss. Investigate storage throughput and connectivity.
Trace write failures
Alert when storage writes are failing:
```promql
increase(ongoingai_trace_write_failed_total[5m]) > 0
```

Check the `error_class` label for failure classification (connection, timeout, contention, constraint).
Queue saturation
Alert when the trace queue is near capacity:
```promql
ongoingai_trace_queue_depth / ongoingai_trace_queue_capacity > 0.9
```

Sustained high saturation precedes queue drops. Scale storage throughput or reduce capture load.
Provider error rate
Alert on elevated provider error rates:
```promql
sum(rate(ongoingai_provider_request_total{status_code=~"5.."}[5m]))
/
sum(rate(ongoingai_provider_request_total[5m])) > 0.05
```

A 5% error rate threshold is a reasonable starting point. Break down by provider and model labels to isolate the source.
High proxy latency
Alert on elevated proxy latency:
```promql
histogram_quantile(0.99,
  rate(ongoingai_proxy_request_duration_seconds_bucket[5m])
) > 10
```

Adjust the quantile and threshold to match your latency SLO.
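The queries above can be packaged as a Prometheus rules file; a sketch (alert names, severities, and `for` durations are suggestions, tune them to your SLOs):

```yaml
groups:
  - name: ongoingai-gateway
    rules:
      - alert: GatewayTraceQueueDrops
        expr: increase(ongoingai_trace_queue_dropped_total[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Gateway is dropping trace records under backpressure
      - alert: GatewayTraceQueueSaturated
        expr: ongoingai_trace_queue_depth / ongoingai_trace_queue_capacity > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Trace write queue is above 90% capacity
```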
Shutdown behavior
On SIGINT or SIGTERM, the gateway flushes pending telemetry data before
exiting:
- The HTTP server stops accepting new connections and completes in-flight requests (5-second timeout).
- The trace writer drains its queue and flushes remaining trace records to storage (5-second timeout).
- The OpenTelemetry trace provider flushes buffered spans to the collector (5-second timeout).
- The OpenTelemetry metric provider flushes buffered metrics to the collector (5-second timeout).
If any flush step exceeds its timeout, the gateway logs an error and continues with the remaining shutdown steps.
Example configurations
Local development with Jaeger
Start Jaeger with OTLP ingestion:
```shell
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest
```

Configure the gateway to export to Jaeger:
```yaml
observability:
  otel:
    enabled: true
    endpoint: localhost:4318
    insecure: true
    service_name: ongoingai-gateway
    traces_enabled: true
    metrics_enabled: false
    sampling_ratio: 1.0
```

After sending traffic through the gateway, open http://localhost:16686 to view
traces in the Jaeger UI.
Production with an OTLP collector
Configure the gateway to export to a remote collector over HTTPS:
```yaml
observability:
  otel:
    enabled: true
    endpoint: https://otel-collector.internal:4318
    service_name: ongoingai-gateway
    traces_enabled: true
    metrics_enabled: true
    sampling_ratio: 0.1
    export_timeout_ms: 5000
    metric_export_interval_ms: 30000
```

In production, consider reducing `sampling_ratio` to control trace volume. A
ratio of 0.1 samples 10% of requests. Parent-based sampling ensures that if
an incoming request already carries a sampled trace context, the gateway
respects that decision regardless of the local ratio.
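If you do not already run a collector, a minimal OpenTelemetry Collector pipeline that accepts this gateway's OTLP HTTP traffic looks roughly like this (the `debug` exporter is a placeholder; swap in the exporter for your real backend):

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch: {}
exporters:
  debug: {}
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```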
Prometheus-only mode
Export metrics via Prometheus without an OTLP collector:
```yaml
observability:
  otel:
    enabled: true
    service_name: ongoingai-gateway
    traces_enabled: false
    metrics_enabled: false
    prometheus_enabled: true
    prometheus_path: /metrics
```

Environment variable quickstart
Enable OTEL export without modifying YAML:
```shell
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
OTEL_SERVICE_NAME=ongoingai-gateway \
ongoingai serve --config ongoingai.yaml
```

Enable Prometheus scrape via env vars:
```shell
OTEL_METRICS_EXPORTER=prometheus \
ongoingai serve --config ongoingai.yaml
```

Validation checklist
1. Verify that your OTLP collector is reachable from the gateway host:

   ```shell
   curl -s -o /dev/null -w "%{http_code}" http://localhost:4318/v1/traces
   ```

   A 405 or 200 response confirms the collector is listening.

2. Start the gateway with OTEL enabled:

   ```shell
   ongoingai serve --config ongoingai.yaml
   ```

3. Send a proxied request through the gateway:

   ```shell
   curl http://localhost:8080/openai/v1/chat/completions \
     -H "Authorization: Bearer OPENAI_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "hello"}]}'
   ```

   Placeholder: OPENAI_API_KEY is your OpenAI API key.

4. Check your collector or tracing UI for spans with service name `ongoingai-gateway`.

You should see at least two spans: one for the inbound request and one for the upstream proxy call. With auth enabled, you will also see `gateway.auth` and `gateway.route` child spans.
Troubleshooting
Spans do not appear in the collector
- Symptom: The tracing UI shows no data for `ongoingai-gateway`.
- Cause: OTEL is not enabled, or the collector endpoint is unreachable.
- Fix: Verify that `observability.otel.enabled` is `true` and that the `endpoint` value is reachable from the gateway. Check gateway logs for export timeout errors.
Gateway attributes are missing from spans
- Symptom: Spans appear but lack `gateway.org_id`, `gateway.workspace_id`, and other tenant attributes.
- Cause: Gateway auth is not enabled, or the request did not include a valid gateway key.
- Fix: Set `auth.enabled=true` and include a valid gateway key in the request header.
Sampling drops more traces than expected
- Symptom: Only a fraction of requests produce spans.
- Cause: `sampling_ratio` is set below `1.0`.
- Fix: Increase `sampling_ratio` toward `1.0` for higher coverage. A value of `1.0` samples all requests.
Export timeout errors in gateway logs
- Symptom: Gateway logs contain export timeout errors on shutdown or during operation.
- Cause: The collector is slow to respond, or `export_timeout_ms` is too low for your network.
- Fix: Increase `export_timeout_ms` or verify collector performance.
Metrics are not exported
- Symptom: Traces appear in the collector but metrics do not.
- Cause: `metrics_enabled` is `false`, or the collector does not accept OTLP metrics on the configured endpoint.
- Fix: Set `metrics_enabled` to `true` and verify that the collector supports OTLP metric ingestion on the same endpoint. Alternatively, enable `prometheus_enabled` to scrape metrics directly.
Prometheus /metrics returns 404
- Symptom: `curl http://localhost:8080/metrics` returns `404`.
- Cause: `prometheus_enabled` is not set to `true`, or `prometheus_path` does not match the request path.
- Fix: Set `observability.otel.prometheus_enabled: true` in YAML or `ONGOINGAI_PROMETHEUS_ENABLED=true` as an env var. Verify that `prometheus_path` matches the path you are requesting.
Credential patterns appear in spans
- Symptom: Span attributes contain API keys or tokens.
- Cause: This should not happen when the scrubbing exporter is active. The scrubbing exporter is automatically enabled when traces are enabled.
- Fix: Verify that `traces_enabled: true` is set. If you see credential material in spans despite this, file a bug report.
Config validation fails with OTEL settings
- Symptom: `ongoingai config validate` rejects the OTEL configuration.
- Cause: A required field is empty or a numeric value is out of range.
- Fix: Verify that `endpoint` and `service_name` are non-empty, that `sampling_ratio` is between `0.0` and `1.0`, and that timeout values are positive integers. When `prometheus_enabled` is `true`, verify that `prometheus_path` starts with `/` and does not overlap with `/api`, `/openai`, or `/anthropic`.