feat: OpenTelemetry OTLP trace export from MCP Gateway and proxy #3178

Merged
lpcox merged 7 commits into main from copilot/feat-opentelemetry-otlp-export
Apr 4, 2026

Conversation

Contributor

Copilot AI commented Apr 4, 2026

The MCP Gateway has no structured observability into per-tool-call latency, guard decisions, or backend failures. This adds OpenTelemetry OTLP trace export across both the MCP Gateway server mode and the awmg proxy subcommand, giving operators visual trace timelines across any OTEL-compatible backend (Jaeger, Honeycomb, Grafana Tempo, Datadog, etc.).

Changes

Config (internal/config/config_tracing.go, config_core.go)

  • New TracingConfig struct added to GatewayConfig under [gateway.tracing]
  • Fields: endpoint, service_name, sample_rate (0.0–1.0, default 1.0)
[gateway.tracing]
endpoint = "http://localhost:4318"
service_name = "mcp-gateway"
sample_rate = 1.0

Tracer provider (internal/tracing/provider.go, internal/tracing/http.go)

  • InitProvider() returns a noop provider (zero overhead) when no endpoint is configured, or a real OTLP/HTTP SDK provider otherwise
  • Respects standard env vars: OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_SERVICE_NAME
  • Sampler selection: AlwaysSample / NeverSample / TraceIDRatioBased based on sample_rate
  • Registered as global OTEL provider
  • W3C TraceContext + Baggage propagator registered globally (both noop and SDK paths) so that incoming traceparent/tracestate headers are always honoured
  • Shared WrapHTTPHandler helper usable by both the gateway and proxy packages
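
The sampler selection described above can be sketched with the standard library alone. This is a minimal illustration, not the code in internal/tracing/provider.go: the function name samplerKind and the exact thresholds are assumptions, and the returned strings only name the OpenTelemetry Go SDK constructors (sdktrace.AlwaysSample, sdktrace.NeverSample, sdktrace.TraceIDRatioBased) that a real provider would call.

```go
package main

import "fmt"

// samplerKind names the sampler a given sample rate would select.
// Hypothetical helper: a real provider would return an sdktrace.Sampler
// built with AlwaysSample, NeverSample, or TraceIDRatioBased instead.
func samplerKind(rate float64) string {
	switch {
	case rate >= 1.0:
		return "AlwaysSample"
	case rate <= 0.0:
		return "NeverSample"
	default:
		return fmt.Sprintf("TraceIDRatioBased(%.2f)", rate)
	}
}

func main() {
	for _, r := range []float64{1.0, 0.0, 0.25} {
		fmt.Printf("sample_rate=%.2f -> %s\n", r, samplerKind(r))
	}
}
```

Treating out-of-range values as the nearest extreme is one possible policy; rejecting them at config validation time would be an equally reasonable choice.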

HTTP middleware (internal/server/http_helpers.go)

  • New WithOTELTracing() middleware adds a gateway.request span per request with http.method, http.path, and session.id attributes
  • Inserted as the outermost layer in wrapWithMiddleware()
  • Extracts incoming W3C traceparent/tracestate headers to continue an agent-originated trace; falls back to a fresh gateway-generated trace ID when no headers are present

Tool call instrumentation (internal/server/unified.go)

  • gateway.tool_call span covers the full DIFC enforcement lifecycle (phases 0–6) with tool.name + server.id attributes
  • gateway.backend.execute child span isolates backend MCP JSON-RPC latency with error recording on failure

CLI flags (internal/cmd/flags_tracing.go, root.go)

  • --otlp-endpoint, --otlp-service-name, --otlp-sample-rate flags; override config file values
  • Tracer provider initialized before server start with graceful 5s flush on shutdown
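
The graceful flush mentioned above could look roughly like the following. It is a sketch under assumptions: flushWithTimeout and slowExporter are hypothetical names, and slowExporter merely simulates an exporter that needs some time to drain. The shutdown function has the same shape as a tracer provider's Shutdown(ctx) method.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// shutdownFunc matches the shape of a tracer provider's Shutdown method.
type shutdownFunc func(ctx context.Context) error

// flushWithTimeout calls shutdown with a deadline so that a slow or
// unreachable collector cannot block process exit indefinitely.
func flushWithTimeout(shutdown shutdownFunc, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return shutdown(ctx)
}

// slowExporter is a hypothetical stand-in that needs d to finish flushing.
func slowExporter(d time.Duration) shutdownFunc {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// demo runs one flush that finishes in time and one that hits the deadline.
func demo() (fastErr error, slowTimedOut bool) {
	fastErr = flushWithTimeout(slowExporter(5*time.Millisecond), time.Second)
	slowErr := flushWithTimeout(slowExporter(time.Second), 20*time.Millisecond)
	return fastErr, errors.Is(slowErr, context.DeadlineExceeded)
}

func main() {
	fastErr, slowTimedOut := demo()
	fmt.Println("fast flush error:", fastErr)
	fmt.Println("slow flush hit deadline:", slowTimedOut)
}
```

In the gateway the timeout would be the 5s mentioned above; the shorter durations here only keep the demonstration quick.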

GitHub API proxy tracing (internal/proxy/, internal/cmd/proxy.go)

  • proxy.request span wraps every proxied request via shared WrapHTTPHandler
  • Extracts incoming W3C traceparent/tracestate headers to continue an agent-originated trace; falls back to a fresh gateway-generated trace ID when no headers are present
  • proxy.difc_pipeline span covers all 6 DIFC enforcement phases (label_resource → evaluate → forward → label_response → filter → propagate)
  • proxy.backend.forward child span isolates upstream GitHub API round-trip latency with HTTP status code and error recording
  • awmg proxy subcommand gains --otlp-endpoint, --otlp-service-name, --otlp-sample-rate flags and initializes the tracer provider with graceful shutdown

Resulting trace structures

Gateway mode:

gateway.request  (HTTP handler)
  └─ gateway.tool_call  {tool.name, server.id}
       └─ gateway.backend.execute  ← MCP JSON-RPC round-trip

Proxy mode (awmg proxy):

proxy.request  (HTTP handler)
  └─ proxy.difc_pipeline  {tool.name, http.path}
       └─ proxy.backend.forward  ← upstream GitHub API round-trip

Distributed tracing (agent → gateway):

When an agent sends a traceparent header the gateway span is parented to the agent's span, producing a single end-to-end trace. Without that header a self-contained root span with a fresh trace ID is generated automatically.
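
To make the continue-or-fallback behaviour concrete, here is a standard-library sketch of what a propagator does with a W3C traceparent header. It is not the OpenTelemetry implementation: traceContext, parseTraceparent, and resolveTraceContext are hypothetical names, only version 00 is accepted, and tracestate is ignored.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// traceContext holds the fields a server needs from a traceparent header.
type traceContext struct {
	TraceID  string // 32 lowercase hex characters
	ParentID string // 16 lowercase hex characters; empty for a fresh root
	Sampled  bool   // low bit of the trace-flags field
	Remote   bool   // true when the IDs came from the caller
}

// isValidID reports whether s is exactly n lowercase hex characters and
// not all zeros (all-zero IDs are invalid in the W3C format).
func isValidID(s string, n int) bool {
	if len(s) != n || strings.Trim(s, "0") == "" {
		return false
	}
	for _, c := range s {
		if !(c >= '0' && c <= '9') && !(c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// parseTraceparent parses "00-<trace-id>-<parent-id>-<flags>".
func parseTraceparent(header string) (traceContext, bool) {
	parts := strings.Split(strings.TrimSpace(header), "-")
	if len(parts) != 4 || parts[0] != "00" {
		return traceContext{}, false
	}
	flags, err := hex.DecodeString(parts[3])
	if err != nil || len(flags) != 1 || !isValidID(parts[1], 32) || !isValidID(parts[2], 16) {
		return traceContext{}, false
	}
	return traceContext{TraceID: parts[1], ParentID: parts[2], Sampled: flags[0]&1 == 1, Remote: true}, true
}

// resolveTraceContext continues the caller's trace when the header is valid
// and otherwise generates a fresh trace ID for a self-contained root span.
func resolveTraceContext(header string) traceContext {
	if tc, ok := parseTraceparent(header); ok {
		return tc
	}
	var id [16]byte
	if _, err := rand.Read(id[:]); err != nil {
		panic(err)
	}
	return traceContext{TraceID: hex.EncodeToString(id[:])}
}

func main() {
	fmt.Printf("%+v\n", resolveTraceContext("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
	fmt.Printf("%+v\n", resolveTraceContext(""))
}
```

A malformed header is treated the same as a missing one here, so a bad client cannot prevent the gateway from producing a trace.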

Copilot AI linked an issue Apr 4, 2026 that may be closed by this pull request
Copilot AI changed the title [WIP] Add OpenTelemetry OTLP trace export from MCP Gateway feat: OpenTelemetry OTLP trace export from MCP Gateway Apr 4, 2026
Copilot AI requested a review from lpcox April 4, 2026 16:01
Copilot finished work on behalf of lpcox April 4, 2026 16:01
@lpcox lpcox marked this pull request as ready for review April 4, 2026 16:19
Copilot AI review requested due to automatic review settings April 4, 2026 16:19
Collaborator

lpcox commented Apr 4, 2026

@copilot take a look at this comment and identify any opportunities/requirements for integration outside of the gateway github/gh-aw#24373 (comment)

Contributor

Copilot AI left a comment

Pull request overview

Adds OpenTelemetry tracing to MCP Gateway with OTLP/HTTP export so operators can visualize end-to-end request and tool-call latency, guard decisions, and backend failures in standard OTEL backends.

Changes:

  • Introduces tracing configuration ([gateway.tracing]) and CLI flags to control OTLP endpoint, service name, and sampling.
  • Adds an OTLP tracer provider initializer that installs a noop provider when tracing is disabled.
  • Instruments HTTP requests and backend tool calls with spans and basic attributes.
Summary per file:

| File | Description |
| --- | --- |
| internal/tracing/provider.go | Initializes global OTEL tracer provider (noop vs OTLP/HTTP), resolves config/env, sets sampler. |
| internal/tracing/provider_test.go | Adds basic tests around provider initialization and tracer access. |
| internal/server/http_helpers.go | Adds WithOTELTracing middleware and makes it outermost in the middleware chain. |
| internal/server/unified.go | Adds spans for tool-call lifecycle and backend execution latency/errors. |
| internal/config/config_tracing.go | Defines TracingConfig + defaults registration. |
| internal/config/config_core.go | Adds Tracing *TracingConfig to GatewayConfig. |
| internal/cmd/flags_tracing.go | Adds --otlp-* flags and env-based defaults. |
| internal/cmd/root.go | Merges tracing flags into config, initializes provider, and flushes on shutdown. |
| go.mod / go.sum | Adds OpenTelemetry dependencies and bumps some indirect deps. |

Copilot's findings

  • Files reviewed: 9/10 changed files
  • Comments generated: 7

Comment on lines +42 to +50
    RegisterDefaults(func(cfg *Config) {
        if cfg.Gateway != nil && cfg.Gateway.Tracing != nil {
            if cfg.Gateway.Tracing.ServiceName == "" {
                cfg.Gateway.Tracing.ServiceName = DefaultTracingServiceName
            }
            if cfg.Gateway.Tracing.SampleRate == 0 {
                cfg.Gateway.Tracing.SampleRate = DefaultTracingSampleRate
            }
        }

Copilot AI Apr 4, 2026

The defaults setter forces SampleRate to 1.0 when the decoded value is 0, which prevents users from configuring sample_rate = 0.0 in TOML (documented as valid for “no sampling”). To support an explicit 0.0 value, consider representing SampleRate as a pointer (so “unset” can be distinguished from 0.0) or using a separate “set” flag during decode/defaulting.
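
The pointer approach suggested here can be illustrated with a small sketch. The names tracingConfig, sampleRate, and defaultSampleRate are hypothetical stand-ins, not the identifiers used in internal/config; the point is only that a nil pointer can mean "unset" while a pointer to 0.0 means "never sample".

```go
package main

import "fmt"

// defaultSampleRate mirrors the documented default of 1.0.
const defaultSampleRate = 1.0

// tracingConfig is a trimmed-down, hypothetical stand-in for TracingConfig.
type tracingConfig struct {
	SampleRate *float64 // nil means the key was absent from the config file
}

// sampleRate returns the configured rate, falling back to the default only
// when the value was never set. An explicit 0.0 is preserved.
func (c *tracingConfig) sampleRate() float64 {
	if c == nil || c.SampleRate == nil {
		return defaultSampleRate
	}
	return *c.SampleRate
}

func main() {
	zero := 0.0
	unset := &tracingConfig{}
	explicit := &tracingConfig{SampleRate: &zero}
	fmt.Println("unset:", unset.sampleRate())
	fmt.Println("explicit zero:", explicit.sampleRate())
}
```

Putting the nil check in an accessor keeps the defaulting logic in one place, so callers never need to dereference the pointer themselves.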

Comment on lines +21 to +36
            "OTLP HTTP endpoint for trace export (e.g. http://localhost:4318). Overrides OTEL_EXPORTER_OTLP_ENDPOINT env var. Tracing is disabled when empty.")
        cmd.Flags().StringVar(&otlpServiceName, "otlp-service-name", getDefaultOTLPServiceName(),
            "Service name reported in traces. Overrides OTEL_SERVICE_NAME env var.")
        cmd.Flags().Float64Var(&otlpSampleRate, "otlp-sample-rate", config.DefaultTracingSampleRate,
            "Fraction of traces to sample and export (0.0–1.0). Default 1.0 samples everything.")
    })
}

// getDefaultOTLPEndpoint returns the OTLP endpoint, checking OTEL_EXPORTER_OTLP_ENDPOINT
// environment variable first, then falling back to empty (disabled).
func getDefaultOTLPEndpoint() string {
    return envutil.GetEnvString("OTEL_EXPORTER_OTLP_ENDPOINT", "")
}

// getDefaultOTLPServiceName returns the OTLP service name, checking OTEL_SERVICE_NAME
// environment variable first, then falling back to the default.

Copilot AI Apr 4, 2026

The flag help text says --otlp-endpoint/--otlp-service-name override the corresponding OTEL env vars, but tracing.resolveEndpoint/resolveServiceName currently prefer OTEL_EXPORTER_OTLP_ENDPOINT/OTEL_SERVICE_NAME over config/flags. This means operators cannot override (or disable) an env-configured endpoint via CLI, which is surprising and contradicts the help. Consider changing precedence to CLI > env > config (e.g., resolve envs in cmd layer and pass resolved values to InitProvider), or update the help text to match actual precedence.

Suggested change
-             "OTLP HTTP endpoint for trace export (e.g. http://localhost:4318). Overrides OTEL_EXPORTER_OTLP_ENDPOINT env var. Tracing is disabled when empty.")
-         cmd.Flags().StringVar(&otlpServiceName, "otlp-service-name", getDefaultOTLPServiceName(),
-             "Service name reported in traces. Overrides OTEL_SERVICE_NAME env var.")
-         cmd.Flags().Float64Var(&otlpSampleRate, "otlp-sample-rate", config.DefaultTracingSampleRate,
-             "Fraction of traces to sample and export (0.0–1.0). Default 1.0 samples everything.")
-     })
- }
- // getDefaultOTLPEndpoint returns the OTLP endpoint, checking OTEL_EXPORTER_OTLP_ENDPOINT
- // environment variable first, then falling back to empty (disabled).
- func getDefaultOTLPEndpoint() string {
-     return envutil.GetEnvString("OTEL_EXPORTER_OTLP_ENDPOINT", "")
- }
- // getDefaultOTLPServiceName returns the OTLP service name, checking OTEL_SERVICE_NAME
- // environment variable first, then falling back to the default.
+             "OTLP HTTP endpoint for trace export (e.g. http://localhost:4318). Defaults from OTEL_EXPORTER_OTLP_ENDPOINT when set. Tracing is disabled when empty.")
+         cmd.Flags().StringVar(&otlpServiceName, "otlp-service-name", getDefaultOTLPServiceName(),
+             "Service name reported in traces. Defaults from OTEL_SERVICE_NAME when set.")
+         cmd.Flags().Float64Var(&otlpSampleRate, "otlp-sample-rate", config.DefaultTracingSampleRate,
+             "Fraction of traces to sample and export (0.0–1.0). Default 1.0 samples everything.")
+     })
+ }
+ // getDefaultOTLPEndpoint returns the default value for the OTLP endpoint flag,
+ // using OTEL_EXPORTER_OTLP_ENDPOINT when set and otherwise falling back to empty
+ // (disabled).
+ func getDefaultOTLPEndpoint() string {
+     return envutil.GetEnvString("OTEL_EXPORTER_OTLP_ENDPOINT", "")
+ }
+ // getDefaultOTLPServiceName returns the default value for the OTLP service name
+ // flag, using OTEL_SERVICE_NAME when set and otherwise falling back to the
+ // configured default.
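
The CLI > env > config precedence discussed in this comment can be sketched as a pure function. This is an illustration under assumptions, not the resolver in internal/tracing: resolveEndpoint is a hypothetical name, and flagSet is assumed to come from something like pflag's FlagSet.Changed so that an explicitly empty flag can disable tracing.

```go
package main

import "fmt"

// resolveEndpoint applies CLI flag > env var > config file precedence.
// flagSet distinguishes "flag passed as empty" (explicit disable) from
// "flag not passed at all".
func resolveEndpoint(flagValue string, flagSet bool, envValue, configValue string) string {
	if flagSet {
		return flagValue
	}
	if envValue != "" {
		return envValue
	}
	return configValue
}

func main() {
	env := "http://collector:4318"
	cfg := "http://localhost:4318"

	fmt.Printf("flag wins: %q\n", resolveEndpoint("http://cli:4318", true, env, cfg))
	fmt.Printf("explicit disable: %q\n", resolveEndpoint("", true, env, cfg))
	fmt.Printf("env over config: %q\n", resolveEndpoint("", false, env, cfg))
	fmt.Printf("config fallback: %q\n", resolveEndpoint("", false, "", cfg))
}
```

Resolving once in the command layer and passing the result down also means the startup log and the provider see the same endpoint.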

Comment on lines +204 to +208
        // Add session ID to span after context is available
        sessionID := SessionIDFromContext(ctx)
        span.SetAttributes(attribute.String("session.id", auth.TruncateSessionID(sessionID)))

        next.ServeHTTP(w, r.WithContext(ctx))

Copilot AI Apr 4, 2026

WithOTELTracing sets the session.id attribute before calling next.ServeHTTP. In this server, the session ID is injected into the request context inside the StreamableHTTP callback (setupSessionCallback), which happens during next.ServeHTTP, so this will typically record session.id=default rather than the real session. Consider capturing the request pointer you pass to next (e.g., req := r.WithContext(ctx)), calling next.ServeHTTP(w, req), then reading SessionIDFromContext(req.Context()) and setting the attribute before ending the span.

Suggested change
-         // Add session ID to span after context is available
-         sessionID := SessionIDFromContext(ctx)
-         span.SetAttributes(attribute.String("session.id", auth.TruncateSessionID(sessionID)))
-         next.ServeHTTP(w, r.WithContext(ctx))
+         req := r.WithContext(ctx)
+         next.ServeHTTP(w, req)
+         // Add session ID to span after request handling, once the session has been attached.
+         sessionID := SessionIDFromContext(req.Context())
+         span.SetAttributes(attribute.String("session.id", auth.TruncateSessionID(sessionID)))
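
One caveat when reading a value after next.ServeHTTP: Go contexts are immutable, so a context derived inside the inner handler is not visible through the outer request. Whether the suggestion above observes the real session therefore depends on how the session is attached. The sketch below assumes a mutable holder stored in the context; sessionHolder, withSpanAttrs, and attachSession are hypothetical names and do not describe how setupSessionCallback works.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type holderKey struct{}

// sessionHolder is a hypothetical mutable slot shared through the context.
type sessionHolder struct{ ID string }

// withSpanAttrs stands in for the tracing middleware: it records the
// session ID only after the inner handler has run.
func withSpanAttrs(next http.Handler, record func(sessionID string)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := &sessionHolder{ID: "default"}
		req := r.WithContext(context.WithValue(r.Context(), holderKey{}, h))
		next.ServeHTTP(w, req)
		record(h.ID) // read after handling, once the session has been attached
	})
}

// attachSession stands in for the inner layer that learns the session ID.
func attachSession(id string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if h, ok := r.Context().Value(holderKey{}).(*sessionHolder); ok {
			h.ID = id
		}
		w.WriteHeader(http.StatusOK)
	})
}

// recordedSession runs one request through the chain and returns what the
// outer middleware recorded.
func recordedSession(id string) string {
	var got string
	h := withSpanAttrs(attachSession(id), func(s string) { got = s })
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/mcp", nil))
	return got
}

func main() {
	fmt.Println("recorded session:", recordedSession("abc123"))
}
```

The holder is written and read on the same goroutine here; a handler that sets it from another goroutine would need synchronization.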

Comment on lines +358 to +362
    if tracingCfg != nil && tracingCfg.Endpoint != "" {
        log.Printf("OpenTelemetry tracing enabled: endpoint=%s, service=%s, sampleRate=%.2f",
            tracingCfg.Endpoint, tracingCfg.ServiceName, tracingCfg.SampleRate)
        logger.LogInfoMd("startup", "OpenTelemetry tracing enabled: endpoint=%s, service=%s",
            tracingCfg.Endpoint, tracingCfg.ServiceName)

Copilot AI Apr 4, 2026

The tracing enable/disable log message is based on tracingCfg.Endpoint, but InitProvider resolves the endpoint from OTEL_EXPORTER_OTLP_ENDPOINT first. In particular, if OTEL_EXPORTER_OTLP_ENDPOINT is set and a user runs with --otlp-endpoint= (explicitly disabling via CLI), the log will claim tracing is disabled while InitProvider will still enable tracing via the env var. Consider aligning the precedence (CLI > env > config) or logging the resolved endpoint actually used by InitProvider.

Suggested change
-     if tracingCfg != nil && tracingCfg.Endpoint != "" {
-         log.Printf("OpenTelemetry tracing enabled: endpoint=%s, service=%s, sampleRate=%.2f",
-             tracingCfg.Endpoint, tracingCfg.ServiceName, tracingCfg.SampleRate)
-         logger.LogInfoMd("startup", "OpenTelemetry tracing enabled: endpoint=%s, service=%s",
-             tracingCfg.Endpoint, tracingCfg.ServiceName)
+     resolvedTracingEndpoint := os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
+     if resolvedTracingEndpoint == "" && tracingCfg != nil {
+         resolvedTracingEndpoint = tracingCfg.Endpoint
+     }
+     if resolvedTracingEndpoint != "" {
+         log.Printf("OpenTelemetry tracing enabled: endpoint=%s, service=%s, sampleRate=%.2f",
+             resolvedTracingEndpoint, tracingCfg.ServiceName, tracingCfg.SampleRate)
+         logger.LogInfoMd("startup", "OpenTelemetry tracing enabled: endpoint=%s, service=%s",
+             resolvedTracingEndpoint, tracingCfg.ServiceName)

Comment on lines +86 to +120
func TestInitProvider_SampleRateZero_UsesNeverSampler(t *testing.T) {
    ctx := context.Background()

    cfg := &config.TracingConfig{
        Endpoint:    "http://localhost:14318",
        ServiceName: "test-service",
        SampleRate:  0.0, // never sample
    }

    provider, err := tracing.InitProvider(ctx, cfg)
    require.NoError(t, err)
    require.NotNil(t, provider)

    shutdownCtx, cancel := context.WithTimeout(ctx, 100*time.Millisecond)
    defer cancel()
    _ = provider.Shutdown(shutdownCtx)
}

func TestInitProvider_SampleRatePartial_UsesRatioSampler(t *testing.T) {
    ctx := context.Background()

    cfg := &config.TracingConfig{
        Endpoint:    "http://localhost:14318",
        ServiceName: "test-service",
        SampleRate:  0.5, // 50% sampling
    }

    provider, err := tracing.InitProvider(ctx, cfg)
    require.NoError(t, err)
    require.NotNil(t, provider)

    shutdownCtx, cancel := context.WithTimeout(ctx, 100*time.Millisecond)
    defer cancel()
    _ = provider.Shutdown(shutdownCtx)
}

Copilot AI Apr 4, 2026

The sample-rate tests are named as if they validate sampler selection, but they don’t assert anything about whether spans are sampled/recording. As written, they won’t catch regressions (e.g., SampleRate: 0.0 currently gets treated as default due to resolveSampleRate). Consider asserting span.SpanContext().IsSampled() / span.IsRecording() for 0.0 and 1.0 cases, and making the ratio test deterministic (e.g., by validating bounds rather than probabilistic sampling).
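
A ratio sampler can be tested deterministically because its decision is a pure function of the trace ID. The sketch below is a simplified model of that idea, thresholding on the low 8 bytes of the ID; shouldSample is a hypothetical function and should not be read as the SDK's exact algorithm. In the real tests, the equivalent checks would be made through span.SpanContext().IsSampled() as suggested above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// shouldSample is a simplified model of a trace-ID-ratio sampler: the same
// trace ID and rate always give the same answer, so no randomness is needed.
func shouldSample(traceID [16]byte, rate float64) bool {
	if rate >= 1 {
		return true
	}
	if rate <= 0 {
		return false
	}
	upperBound := uint64(rate * (1 << 63))
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	return x < upperBound
}

func main() {
	var lowest, highest [16]byte // lowest is all zeros
	for i := range highest {
		highest[i] = 0xff
	}

	// Bounds: rate 0 never samples and rate 1 always samples, for any ID.
	fmt.Println("rate 0.0, lowest ID:", shouldSample(lowest, 0.0))
	fmt.Println("rate 1.0, highest ID:", shouldSample(highest, 1.0))

	// At rate 0.5 the extremes of the ID space fall on opposite sides.
	fmt.Println("rate 0.5, lowest ID:", shouldSample(lowest, 0.5))
	fmt.Println("rate 0.5, highest ID:", shouldSample(highest, 0.5))
}
```

Choosing trace IDs at the extremes of the ID space gives assertions that hold for any reasonable threshold, which avoids flaky probabilistic checks.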

//
// Once initialized, obtain a tracer with:
//
// tracer := otel.Tracer("mcp-gateway")

Copilot AI Apr 4, 2026

The package comment’s example uses otel.Tracer("mcp-gateway"), but this package’s instrumentation scope is instrumentationName = "github.com/github/gh-aw-mcpg" and Tracer() returns otel.Tracer(instrumentationName). Consider updating the example to match the actual instrumentation name (or changing instrumentationName if the intended scope name is mcp-gateway) to avoid inconsistent tracer names in emitted telemetry.

Suggested change
- // tracer := otel.Tracer("mcp-gateway")
+ // tracer := otel.Tracer("github.com/github/gh-aw-mcpg")

lpcox and others added 2 commits April 4, 2026 09:27
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 4, 2026

@copilot take a look at this comment and identify any opportunities/requirements for integration outside of the gateway github/gh-aw#24373 (comment)

The main integration point outside the gateway is the awmg proxy subcommand — a fully independent HTTPS server that intercepts gh CLI requests through a 6-phase DIFC pipeline. It was completely uninstrumented. I've now added tracing to it in commit 57f1564:

What was added:

  1. internal/tracing/http.go: a new shared WrapHTTPHandler helper usable by both the gateway and proxy packages.
  2. internal/proxy/proxy.go: Handler() now wraps with tracing.WrapHTTPHandler, emitting a proxy.request span for every proxied request.
  3. internal/proxy/handler.go: handleWithDIFC() now emits:
    • proxy.difc_pipeline span covering all 6 DIFC phases (label_resource → evaluate → forward → label_response → filter → propagate)
    • proxy.backend.forward child span isolating the upstream GitHub API round-trip latency
  4. internal/cmd/proxy.go: runProxy() now initializes the OTEL tracer provider (noop when disabled) with graceful 5s flush on shutdown. Added --otlp-endpoint, --otlp-service-name, and --otlp-sample-rate flags (also readable from OTEL_EXPORTER_OTLP_ENDPOINT / OTEL_SERVICE_NAME env vars).

Other opportunities identified (not yet implemented):

  • Launcher connection establishment — spans around GetOrLaunch() / GetOrLaunchForSession() would expose backend startup latency.
  • Guard WASM sub-spans — individual DIFC phase sub-spans (label_resource, label_response, filter_collection) within callBackendTool() — currently only the full tool call and the backend execute have spans.
  • jq middleware — spans around large payload jq schema processing in internal/middleware/.

Warning

Firewall rules blocked me from connecting to one or more addresses. I tried to connect to the following addresses, but was blocked by firewall rules (dns block):

  • example.com (triggered by launcher.test)
  • invalid-host-that-does-not-exist-12345.com (triggered by config.test)
  • nonexistent.local (triggered by launcher.test)
  • slow.example.com (triggered by launcher.test)
  • this-host-does-not-exist-12345.com (triggered by mcp.test)

Copilot AI changed the title feat: OpenTelemetry OTLP trace export from MCP Gateway feat: OpenTelemetry OTLP trace export from MCP Gateway and proxy Apr 4, 2026
Copilot finished work on behalf of lpcox April 4, 2026 16:29
Collaborator

lpcox commented Apr 4, 2026

@copilot does the tracing cover backend calls that the guard makes? For example, when an MCP request enters the gateway and hits a guard, do backend guard calls carry the same marker as the original request?

Contributor Author

Copilot AI commented Apr 4, 2026

@copilot does the tracing cover backend calls that the guard makes? for example when an mcp request enters the gateway and hits a guard, do backend guard calls carry the same marker the original reque...

Yes — guard backend calls carry the same trace context as the original request, in both modes:

Gateway (guardBackendCaller): the gateway.tool_call span context is captured in ctx when the caller is created (line 398 in unified.go). CallTool always uses that stored g.ctx for executeBackendToolCall, so every guard enrichment call is a descendant of the tool call span.

Proxy (restBackendCaller): the proxy.difc_pipeline span context flows through fn.Call(ctx, ...) into the WASM runtime. Wazero propagates the same Go context to the hostCallBackend host function, which then calls CallTool(ctx, ...) → forwardToGitHub(ctx, ...) → http.NewRequestWithContext(ctx, ...). So guard enrichment calls to GitHub are direct descendants of the DIFC pipeline span.

One gap: guard enrichment calls (LabelResource/LabelResponse backend calls) don't get their own named child spans right now — they contribute latency to the parent span (gateway.tool_call or proxy.difc_pipeline) but won't appear as distinct nodes in a trace view. Adding a guard.backend.enrich sub-span would make that latency visible. I can add that if it's useful.

Copilot finished work on behalf of lpcox April 4, 2026 16:34
Collaborator

lpcox commented Apr 4, 2026

@copilot where is the trace identifier created? is it passed into the gateway by the agent? or does the gateway generate the id for each request it receives?

Contributor Author

Copilot AI commented Apr 4, 2026

@copilot where is the trace identifier created? is it passed into the gateway by the agent? or does the gateway generate the id for each request it receives?

The gateway generates a fresh trace ID for every request it receives — agents cannot inject one today.

Here's what happens: both WithOTELTracing (gateway) and WrapHTTPHandler (proxy) call t.Start(r.Context(), ...) where r.Context() has no parent span attached. The incoming HTTP request headers are never examined for W3C traceparent/tracestate, so even if an agent sends those headers, the gateway ignores them and creates a new root span with a freshly generated trace ID.

To enable end-to-end distributed tracing (agent → gateway → backend MCP server), two things would need to be added to InitProvider and the HTTP middleware:

  1. Register a global W3C propagator: otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))
  2. Extract incoming context before starting the span: ctx = otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))

Without that, each gateway request is a self-contained trace. I can add W3C context propagation if you want agents to be able to stitch their own spans into the gateway's traces.

- Use *float64 for SampleRate so 0.0 (NeverSample) is distinguishable
  from unset; add GetSampleRate() accessor with default fallback
- Remove env var override in resolveEndpoint/resolveServiceName since
  CLI flags already use env vars as defaults (correct precedence:
  CLI flag > env var > config file)
- Fix flag help text: "Defaults from" instead of "Overrides" for env
  var relationship
- Move session.id span attribute after next.ServeHTTP so the session
  is available from context
- Fix root.go tracing log to use resolved values instead of raw config
- Add IsSampled()/IsRecording() assertions to sampler tests
- Add SampleRateOne and SampleRateNil test cases
- Fix package doc example to use correct instrumentation name

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot finished work on behalf of lpcox April 4, 2026 16:37
Collaborator

lpcox commented Apr 4, 2026

@copilot add w3c context propagation to the gateway but fall back to a gateway generated trace id if there isn't one present in the incoming mcp request

#3178 (comment)

Register a global W3C TextMapPropagator (TraceContext + Baggage) in
InitProvider for both noop and SDK paths.  Extract incoming traceparent/
tracestate headers in WithOTELTracing (gateway) and WrapHTTPHandler
(proxy) before starting each span, so an agent-originated trace is
continued rather than fragmented.  When no headers are present, OTEL
generates a fresh root span and trace ID (existing behavior).

Add three tests: propagator registration, header-driven continuation,
and no-header fallback to a fresh trace ID.

Agent-Logs-Url: https://github.com/github/gh-aw-mcpg/sessions/0209d486-c1c2-4927-bfcf-dc7822ec3990

Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 4, 2026

@copilot add w3c context propagation to the gateway but fall back to a gateway generated trace id if there isn't one present in the incoming mcp request

#3...

Done in commit 95e2119.

The W3C TraceContext + Baggage propagator is now registered globally in InitProvider (for both noop and real SDK paths). Both WithOTELTracing (gateway) and WrapHTTPHandler (proxy) now extract incoming traceparent/tracestate headers before starting a span:

ctx := otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))
ctx, span := t.Start(ctx, "gateway.request", ...)

If an agent sends a traceparent header, the gateway span becomes a child of the agent's span, continuing the same trace. If no header is present, OTEL generates a fresh root span with a new trace ID (the existing behaviour). Three tests cover this: propagator registration, header-driven parent continuation, and no-header fallback.

Warning

Firewall rules blocked me from connecting to one or more addresses. I tried to connect to the following addresses, but was blocked by firewall rules (dns block):

  • example.com (triggered by launcher.test)
  • invalid-host-that-does-not-exist-12345.com (triggered by config.test)
  • nonexistent.local (triggered by launcher.test)
  • slow.example.com (triggered by launcher.test)
  • this-host-does-not-exist-12345.com (triggered by mcp.test)

Copilot finished work on behalf of lpcox April 4, 2026 16:46
@lpcox lpcox merged commit 6540158 into main Apr 4, 2026
24 checks passed
@lpcox lpcox deleted the copilot/feat-opentelemetry-otlp-export branch April 4, 2026 17:10

Development

Successfully merging this pull request may close these issues.

feat: OpenTelemetry OTLP trace export from MCP Gateway
