feat: OTLP trace export via observability.otlp frontmatter config#24441
…issue) Agent-Logs-Url: https://github.com/github/gh-aw/sessions/49301b6f-02ce-44b4-8fd5-1bc9a878d6cc Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@copilot add a JavaScript implementation to push messages to the OTLP endpoint, and instrument the action setup to take the job name as an input so that it pushes a message each time it runs the job.
…e input Agent-Logs-Url: https://github.com/github/gh-aw/sessions/5738fc76-45bf-47ab-af6c-8de1dc9ec689 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/5738fc76-45bf-47ab-af6c-8de1dc9ec689 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in c890c53 (building on 233b74c):
Set the trace ID as an output of the action and also as an optional input so that you can correlate all jobs (use the trace ID from the activation job, ignore pre-activation).
…orrelation Agent-Logs-Url: https://github.com/github/gh-aw/sessions/461b1d13-5dec-4bb7-8daa-d36d583465a4 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/461b1d13-5dec-4bb7-8daa-d36d583465a4 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…idTraceId in index.js Agent-Logs-Url: https://github.com/github/gh-aw/sessions/461b1d13-5dec-4bb7-8daa-d36d583465a4 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in 4a40521 (building on ffd3965 and 6a3a6b9):
Usage for cross-job correlation:

    # Activation job
    - uses: github/gh-aw/actions/setup@main
      id: setup
      with:
        job-name: activation

    # Agent job (depends on activation)
    - uses: github/gh-aw/actions/setup@main
      with:
        job-name: agent
        trace-id: ${{ needs.activation.outputs.setup-trace-id }}

Pre-activation jobs that don't pass
actions/setup/js/send_otlp_span.cjs
        },
        scopeSpans: [
          {
            scope: { name: "gh-aw.setup", version: "1.0.0" },
- name gh-aw
- pass current version
Done in 965d5d7:
- Scope name changed from "gh-aw.setup" to "gh-aw"
- Scope version now reads process.env.GH_AW_INFO_VERSION || "unknown" (same env var written by the compiler for every job)
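To make the scope change concrete, here is a minimal sketch of an OTLP/HTTP JSON payload carrying one span. The function name `buildPayloadSketch` and its parameter names are hypothetical and are not taken from the PR. Only the scope name, the `GH_AW_INFO_VERSION` fallback, the `OTEL_SERVICE_NAME=gh-aw` value and the optional `parentSpanId` come from the discussion.

```javascript
// Hypothetical helper (not the PR's buildOTLPPayload): builds a minimal
// OTLP/HTTP JSON trace payload containing a single span.
function buildPayloadSketch({ traceId, spanId, parentSpanId, spanName, startMs, endMs }) {
  const span = {
    traceId, // 32 hex characters
    spanId, // 16 hex characters
    name: spanName,
    kind: 1, // SPAN_KIND_INTERNAL in the OTLP enum
    // OTLP JSON encodes 64-bit nanosecond timestamps as strings.
    startTimeUnixNano: String(BigInt(startMs) * 1000000n),
    endTimeUnixNano: String(BigInt(endMs) * 1000000n),
    attributes: [],
  };
  // Root spans omit parentSpanId entirely.
  if (parentSpanId) span.parentSpanId = parentSpanId;

  return {
    resourceSpans: [
      {
        resource: {
          attributes: [{ key: "service.name", value: { stringValue: "gh-aw" } }],
        },
        scopeSpans: [
          {
            // Scope name is fixed; version comes from the environment at runtime.
            scope: { name: "gh-aw", version: process.env.GH_AW_INFO_VERSION || "unknown" },
            spans: [span],
          },
        ],
      },
    ],
  };
}
```

Reading the version at call time rather than hard-coding it keeps the emitted scope in step with whatever value the compiler wrote for the job.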
actions/setup/js/send_otlp_span.cjs
          body: JSON.stringify(payload),
        });
        if (!response.ok) {
          throw new Error(`OTLP export failed: HTTP ${response.status} ${response.statusText}`);
Done in 965d5d7. sendOTLPSpan now:
- Uses console.warn instead of throwing on non-2xx responses
- Retries up to 3 times (configurable) with exponential back-off (100 ms, 200 ms) before the final warning
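The behaviour described in this reply (three attempts, 100 ms then 200 ms back-off, a warning instead of an exception) can be sketched roughly as follows. This is an illustration under assumptions, not the code in send_otlp_span.cjs: the name `postWithRetry` and the injectable `fetchImpl` and `sleep` options are hypothetical and exist only to make the sketch testable.

```javascript
// Hypothetical sketch: POST a payload with retries and never throw.
// Returns true on a 2xx response, false after all attempts have failed.
async function postWithRetry(url, payload, opts = {}) {
  const {
    maxAttempts = 3,
    baseDelayMs = 100,
    headers = {},
    fetchImpl = globalThis.fetch, // native fetch in recent Node versions
    sleep = ms => new Promise(resolve => setTimeout(resolve, ms)),
  } = opts;

  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const response = await fetchImpl(url, {
        method: "POST",
        headers: { "Content-Type": "application/json", ...headers },
        body: JSON.stringify(payload),
      });
      if (response.ok) return true;
      lastError = `HTTP ${response.status} ${response.statusText}`;
    } catch (err) {
      // Network errors are treated the same way as non-2xx responses.
      lastError = err instanceof Error ? err.message : String(err);
    }
    // Exponential back-off between attempts: 100 ms, then 200 ms.
    if (attempt < maxAttempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
  }
  console.warn(`OTLP export failed after ${maxAttempts} attempts: ${lastError}`);
  return false;
}
```

Returning a boolean rather than throwing means a caller in a setup or cleanup step can ignore the result, so a failing telemetry backend cannot fail the job.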
@copilot add support for configuring headers for otel
… spans for safe-outputs/conclusion jobs" Agent-Logs-Url: https://github.com/github/gh-aw/sessions/2a7e539d-2a7e-452d-8b26-8de15a7ebabe Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in 965d5d7:
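For the header support requested above, the PR description mentions parsing the OTel `key=value,key=value` format with percent-decoding, including values that contain spaces and embedded `=` signs. A minimal sketch of that idea follows. `parseHeadersSketch` and `safeDecode` are hypothetical names, and the handling of malformed entries (skipping them) is an assumption rather than the PR's documented behaviour.

```javascript
// Hypothetical sketch: parse "key=value,key=value" into a plain object.
// Only the first "=" in each pair separates key from value, so values such as
// "Sentry sentry_key=<key>" keep their embedded "=" and spaces.
function parseHeadersSketch(raw) {
  const headers = {};
  if (typeof raw !== "string" || raw.trim() === "") return headers;

  for (const pair of raw.split(",")) {
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // assumption: skip entries with no key or no "="
    const key = safeDecode(pair.slice(0, eq).trim());
    const value = safeDecode(pair.slice(eq + 1).trim());
    if (key) headers[key] = value;
  }
  return headers;
}

// Percent-decode, falling back to the raw text when the encoding is malformed.
function safeDecode(text) {
  try {
    return decodeURIComponent(text);
  } catch {
    return text;
  }
}
```

Because pairs are split on commas first, a value that itself contains a comma would need to be percent-encoded (`%2C`) in this sketch.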
📰 BREAKING: Smoke Copilot is now investigating this pull request. Sources say the story is developing...
✅ All tools validated successfully! Agent Container Smoke Test confirms agent container is ready.
🌑 The shadows whisper... Smoke Codex failed. The oracle requires further meditation...
🎬 THE END — Smoke Claude MISSION: ACCOMPLISHED! The hero saves the day! ✨
Smoke Test: Copilot - 23985081424
Overall: PR Author:
🤖 Smoke test agent run §23985081424 — OTLP trace propagation looks well-structured. Two notes: (1) the setup-trace-id threading across all jobs is clean; (2) the Engine any nil-check should use a type-switch to skip empty strings.
📰 BREAKING: Report filed by Smoke Copilot · ● 820.7K
    lockdown_check_failed: ${{ steps.generate_aw_info.outputs.lockdown_check_failed == 'true' }}
    model: ${{ steps.generate_aw_info.outputs.model }}
    secret_verification_result: ${{ steps.validate-secret.outputs.verification_result }}
    setup-trace-id: ${{ steps.setup.outputs.trace-id }}
👋 Smoke test agent (run §23985081424) — the setup-trace-id output correctly threads the OTLP trace ID from activation to downstream jobs, enabling full cross-job span correlation. Looks good! 🤖
    -   if fc.Engine != "" {
    +   if fc.Engine != nil {
            result["engine"] = fc.Engine
        }
🤖 Smoke test agent (run §23985081424) — the Engine any nil-check misses empty-string engine values. Consider the type-switch guard discussed in sibling comments to preserve backward compatibility.
📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤
Smoke Test Run §23985081412 — PARTIAL Core (#1–12): ✅✅✅✅✅✅✅✅✅✅✅✅
💥 Automated smoke test review - all systems nominal! (Run 23985081412)
💥 [THE END] — Illustrated by Smoke Claude · ● 302.8K
    lockdown_check_failed: ${{ steps.generate_aw_info.outputs.lockdown_check_failed == 'true' }}
    model: ${{ steps.generate_aw_info.outputs.model }}
    secret_verification_result: ${{ steps.validate-secret.outputs.verification_result }}
    setup-trace-id: ${{ steps.setup.outputs.trace-id }}
Smoke test review comment #1 (Run 23985081412): The setup-trace-id output is correctly propagated here to enable unified OTLP tracing across jobs.
    uses: ./actions/setup
    with:
      destination: ${{ runner.temp }}/gh-aw/actions
      job-name: ${{ github.job }}
Smoke test review comment #2 (Run 23985081412): Adding job-name to the setup action inputs is a good approach for improving trace observability per job.
@copilot the setup steps in jobs other than activation must use the traceid from the activation job (through needs.activation.outputs...)
…n.outputs directly
- detection, cache, safe_outputs, repo_memory, publish_assets all now use needs.activation.outputs.setup-trace-id for trace ID (not needs.agent.outputs)
- Add activation to needs of detection, cache, repo_memory, and publish_assets jobs
- safe_outputs now always includes activation in needs (not just for workflow_call/lock)
- Update TestJobDependencies to reflect activation is always in safe_outputs needs
- Recompile all 179 workflow lock files
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/59e2cee1-b721-47d0-b463-00641c8b6ca0 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in be04457. All non-activation job setup steps now reference needs.activation.outputs.setup-trace-id.
All 179 workflows recompiled successfully.
✅ All tools validated successfully! Agent Container Smoke Test confirms agent container is ready.
📰 BREAKING: Smoke Copilot is now investigating this pull request. Sources say the story is developing...
💥 WHOOSH! Smoke Claude springs into action on this pull request! [Panel 1 begins...]
🔮 The ancient spirits stir... Smoke Codex awakens to divine this pull request...
@copilot make sure the agent, conclusion, detection jobs also use the traceid from activation |
Agent Container Tool Check
Result: 12/12 tools available ✅ Overall Status: PASS
The OTel dependency in gh-aw was dormant: there was no way to export structured per-call span data to any OTLP backend. This adds the configuration plumbing to activate it, including a JavaScript implementation that instruments every job run with OTLP spans, supports cross-job trace correlation, propagates parent span context within a job for proper trace hierarchy, and threads the trace ID through aw_context so composite actions (dispatched child workflows) share a single OTLP trace with their parent.

Changes
Go compiler

- OTLPConfig struct + schema: new observability.otlp.endpoint and observability.otlp.headers frontmatter fields; validated by the JSON schema with additionalProperties: false
- observability_otlp.go, focused helpers:
  - extractOTLPEndpointDomain: parses hostname from static URLs; skips ${{ expressions }} that can't be resolved at compile time
  - getOTLPEndpointEnvValue: reads endpoint from FrontmatterConfig
  - extractOTLPConfigFromRaw: reads observability.otlp directly from the raw map[string]any frontmatter, bypassing ParseFrontmatterConfig; ensures OTLP injection works even when parsing fails due to complex engine/tool configs
  - (*Compiler).injectOTLPConfig: uses raw frontmatter as primary source (with ParsedFrontmatter as fallback); auto-adds domain to NetworkPermissions.Allowed (AWF firewall) and appends OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_SERVICE_NAME=gh-aw, and OTEL_EXPORTER_OTLP_HEADERS (when configured) to the workflow-level env: block
- frontmatter_types.go: FrontmatterConfig.Engine changed from string to any so object-style engine configs (e.g. {id: copilot, max-continuations: 2}) no longer cause ParseFrontmatterConfig to fail silently
- compiler_orchestrator_workflow.go: single call to injectOTLPConfig after extractYAMLSections; no-op when unconfigured
- compiler_yaml_step_generation.go: setup step gets id: setup and job-name: ${{ github.job }}; accepts a traceID parameter for cross-job correlation; script-mode setup step gains INPUT_JOB_NAME: ${{ github.job }} env var; new generateScriptModeCleanupStep() generates a Clean Scripts step (if: always()) that runs clean.sh with INPUT_JOB_NAME and INPUT_DESTINATION
- … setup-trace-id output through all downstream jobs via needs.activation.outputs.setup-trace-id:
  - … setup-trace-id: ${{ steps.setup.outputs.trace-id }} as a job output
  - … needs.activation.outputs.setup-trace-id as trace-id input and exposes its own setup-trace-id output
  - … activation to their needs and pass needs.activation.outputs.setup-trace-id directly
  - … needs.activation.outputs.setup-trace-id

JavaScript instrumentation
- actions/setup/js/send_otlp_span.cjs:
  - … gh-aw.job.setup span to OTEL_EXPORTER_OTLP_ENDPOINT/v1/traces using native fetch (Node 24, no extra dependencies)
  - Scope name is "gh-aw" and version is read from GH_AW_INFO_VERSION at runtime
  - parseOTLPHeaders(raw): parses the OTel spec key=value,key=value format with percent-decoding; correctly handles Sentry-style headers where the value contains spaces and embedded = signs (e.g. x-sentry-auth=Sentry sentry_key=<key>)
  - sendOTLPSpan retries up to 3 times with exponential back-off (100 ms, 200 ms) and warns via console.warn on failure instead of throwing, so OTLP failures can never break a workflow
  - sendJobSetupSpan: trace ID priority is options.traceId > INPUT_TRACE_ID > aw_info.context.otel_trace_id (parent's trace from aw_context) > newly generated; returns { traceId, spanId } for cross-job correlation; span attributes include gh-aw.job.name, gh-aw.workflow.name, gh-aw.engine.id, gh-aw.run.id, gh-aw.run.actor, and gh-aw.repository
  - sendJobConclusionSpan(spanName): reads aw_info.json for workflow/engine/model/version metadata and GH_AW_EFFECTIVE_TOKENS for token usage; reads GITHUB_AW_OTEL_TRACE_ID and GITHUB_AW_OTEL_PARENT_SPAN_ID from env to establish parent-child span relationships
  - buildOTLPPayload: accepts an optional parentSpanId field (omitted for root spans)
  - appendToOTLPJSONL(payload): mirrors every span payload as a JSON line to /tmp/gh-aw/otel.jsonl for artifact inspection
- actions/setup/js/action_setup_otlp.cjs (new): standalone OTLP setup span runner shared by both index.js (dev/release/action mode) and setup.sh (script mode); exports run() and executes immediately when invoked directly; writes trace-id to GITHUB_OUTPUT and GITHUB_AW_OTEL_TRACE_ID / GITHUB_AW_OTEL_PARENT_SPAN_ID to GITHUB_ENV; logs endpoint, sent span IDs, and each env var written; no-op with informational log when endpoint is not configured
- actions/setup/js/action_conclusion_otlp.cjs (new): standalone OTLP conclusion span runner shared by both post.js (dev/release/action mode) and clean.sh (script mode); exports run() and executes immediately when invoked directly; span named after INPUT_JOB_NAME when set; logs endpoint, span name, and completion; no-op with informational log when endpoint is not configured
- actions/setup/js/aw_context.cjs: buildAwContext() now includes otel_trace_id: process.env.GITHUB_AW_OTEL_TRACE_ID so dispatched child workflows receive the parent's trace ID in the aw_context payload
- actions/setup/action.yml: added job-name input; trace-id optional input (32-char hex) and trace-id output for cross-job correlation
- actions/setup/index.js: delegates OTLP setup span to action_setup_otlp.cjs via await run(); passes GH_AW_SKIP_SETUP_OTLP=1 when spawning setup.sh to prevent the double-span issue in action mode
- actions/setup/post.js: delegates OTLP conclusion span to action_conclusion_otlp.cjs via await run() before /tmp/gh-aw cleanup
- actions/setup/setup.sh: captures SETUP_START_MS at the top; calls node "${DESTINATION}/action_setup_otlp.cjs" at the end with surrounding echo messages; skips the OTLP block when GH_AW_SKIP_SETUP_OTLP=1 (set by index.js in action mode)
- actions/setup/clean.sh (new): script-mode mirror of post.js; calls node "${DESTINATION}/action_conclusion_otlp.cjs" with surrounding echo messages, then removes /tmp/gh-aw/; injected by the compiler as a Clean Scripts step (if: always()) in every job

Smoke workflows
- smoke-copilot.md and smoke-copilot-arm.md: configured with observability.otlp.endpoint: ${{ secrets.GH_AW_OTEL_ENDPOINT }} and observability.otlp.headers: ${{ secrets.GH_AW_OTEL_HEADERS }}

Usage
The compiler automatically wires cross-job trace correlation: every job (detection, safe-outputs, cache, repo-memory, publish-assets, unlock, notify-comment, QMD) receives the activation job's setup-trace-id directly via needs.activation.outputs.setup-trace-id, ensuring all spans in a workflow run share the same OTLP trace ID that originated in the activation job. Within each job, the setup step writes GITHUB_AW_OTEL_TRACE_ID and GITHUB_AW_OTEL_PARENT_SPAN_ID to $GITHUB_ENV so all subsequent steps inherit parent span context: 1 trace per run, 1 parent span per job.

The conclusion span is emitted by post.js (dev/release/action mode) and clean.sh (script mode); both call the same action_conclusion_otlp.cjs module. When a static URL is given as the endpoint, the hostname is automatically allowlisted in the AWF firewall.

Composite action trace propagation

When a workflow dispatches a child workflow via dispatch_workflow, the parent's GITHUB_AW_OTEL_TRACE_ID is included in the aw_context payload as otel_trace_id. The child workflow's setup step reads this value from aw_info.context.otel_trace_id and reuses it as the trace ID, connecting all composite-action spans under a single OTLP trace.
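
The trace ID priority listed for sendJobSetupSpan (explicit option, then INPUT_TRACE_ID, then the parent's otel_trace_id from aw_info, then a freshly generated ID) can be sketched as below. The function names `resolveTraceId` and `looksLikeTraceId` are hypothetical, and the lowercase normalisation and all-zero rejection are assumptions layered on the "32-char hex" description; the PR's own validation may differ.

```javascript
const { randomBytes } = require("node:crypto");

// Hypothetical validator: 32 hex characters, and not the all-zero ID,
// which OpenTelemetry treats as invalid.
function looksLikeTraceId(id) {
  return typeof id === "string" && /^[0-9a-f]{32}$/.test(id) && !/^0+$/.test(id);
}

// Hypothetical resolver: returns the first valid candidate in priority order,
// or a newly generated 16-byte (32 hex character) trace ID.
function resolveTraceId({ optionTraceId, env = process.env, awInfo = {} } = {}) {
  const candidates = [
    optionTraceId, // 1. explicit option
    env.INPUT_TRACE_ID, // 2. action input (from the activation job)
    awInfo.context?.otel_trace_id, // 3. parent workflow's trace via aw_context
  ];
  for (const candidate of candidates) {
    // Assumption: tolerate surrounding whitespace and uppercase hex.
    const normalized = typeof candidate === "string" ? candidate.trim().toLowerCase() : "";
    if (looksLikeTraceId(normalized)) return normalized;
  }
  return randomBytes(16).toString("hex"); // 4. new root trace
}
```

With this ordering, a dispatched child workflow that has no trace-id input still joins its parent's trace, while a job that receives neither value starts a new one.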