
Conversation

@Hona (Collaborator) commented on Jan 2, 2026

Adds experimental OpenTelemetry support for debugging and observability.

What

  • Full OTEL instrumentation: all tools, MCP, sessions, LLM, LSP, plugins
  • Aspire Dashboard integration via bun run dev:otel
  • Service differentiation: opencode-cli vs opencode-server
  • Structured logs with full key=value context + exception stack traces
  • AI SDK telemetry with GenAI message content capture

Enabling OpenTelemetry

  1. Add to your global config (~/.config/opencode/opencode.json):
{
  "experimental": {
    "openTelemetry": true
  }
}
  2. Run with the Aspire Dashboard:
cd packages/opencode
bun run dev:otel
  3. Open the dashboard at http://localhost:18888

The OTEL_EXPORTER_OTLP_ENDPOINT env var controls the endpoint (defaults to http://localhost:4317).
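For example, to send signals to a different collector instead of the default, the env var can be set inline (hypothetical host shown):

OTEL_EXPORTER_OTLP_ENDPOINT=http://my-collector:4317 bun run dev:otel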

Images

(Screenshots attached to the PR; not reproduced here.)

Hona added 30 commits January 2, 2026 14:11
Change experimental.openTelemetry config from boolean to union type
supporting both boolean and object with enabled/endpoint fields.
This allows users to configure custom OTLP endpoints for Aspire Dashboard
integration while maintaining backward compatibility with boolean config.
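Based on that commit description, the object form of the config presumably looks something like this (field names taken from the commit message; the exact schema may differ):

{
  "experimental": {
    "openTelemetry": {
      "enabled": true,
      "endpoint": "http://localhost:4317"
    }
  }
}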
…tion

Add telemetry module with:
- Config interface and resolveConfig() for endpoint resolution
- init() function with NodeSDK, LoggerProvider, trace/log exporters
- shutdown() for graceful cleanup
- withSpan() helper for span creation with error handling (sketched after this list)
- isEnabled(), getTracer(), getLogger() utility functions
- SeverityMap for log level mapping
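A minimal sketch of what a withSpan() helper like the one above typically looks like when built on @opentelemetry/api; this is an assumption about its shape, not the PR's actual implementation:

import { trace, SpanStatusCode } from "@opentelemetry/api";

// Hypothetical sketch of a withSpan() helper; the real signature may differ.
export async function withSpan<T>(name: string, fn: () => Promise<T>): Promise<T> {
  const tracer = trace.getTracer("opencode");
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await fn();
    } catch (err) {
      // Attach the failure to the span before rethrowing.
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}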
Integrate OpenTelemetry log emission into the Log module. When telemetry
is enabled, all log messages (debug/info/warn/error) are emitted to the
OTLP endpoint alongside file-based logging.

- Lazy-load telemetry module to avoid circular dependency
- Guard against recursive calls during module initialization
- Emit logs with proper severity levels using Telemetry.SeverityMap
- Initialize telemetry in yargs middleware after Log.init()
- Check OTEL_EXPORTER_OTLP_ENDPOINT env var or config.experimental.openTelemetry
- Register SIGTERM and SIGINT handlers for graceful shutdown (see the sketch after this list)
- Call Telemetry.shutdown() in finally block before process.exit()
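A rough sketch of that signal handling (hypothetical wiring; the actual CLI entry point may differ):

// Hypothetical sketch of the graceful-shutdown wiring described above.
for (const signal of ["SIGTERM", "SIGINT"] as const) {
  process.once(signal, async () => {
    await Telemetry.shutdown(); // flush pending spans and logs
    process.exit(0);
  });
}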
Hona added 26 commits January 3, 2026 17:44
# Conflicts:
#	bun.lock
#	packages/opencode/package.json
Add the standard OpenTelemetry endpoint environment variable to the Flag
namespace for use in config loading to consolidate telemetry enablement checks.
… var checks

Since OTEL_EXPORTER_OTLP_ENDPOINT is now applied at config load time (Phase 2),
the CLI entry points no longer need to check the env var directly. This removes
the conditional that skipped config loading when the env var was set.
…isEnabled()

- Replace inline env var and config check with Telemetry.isEnabled() helper
- Remove unused Config import since telemetry config is now consolidated
- This ensures consistent telemetry enablement logic via a single source of truth
The OTEL_EXPORTER_OTLP_ENDPOINT env var is now applied to config at load
time (in config/config.ts), so resolveConfig no longer needs to check it
directly. This simplifies the function to only handle the config object.
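For illustration, the simplified resolveConfig() presumably ends up shaped roughly like this (a sketch assembled from the commit descriptions and the documented default endpoint; names and defaults may not match the real code):

type OpenTelemetryConfig = boolean | { enabled?: boolean; endpoint?: string } | undefined;

// Hypothetical sketch: normalize experimental.openTelemetry into a single shape.
function resolveConfig(config: OpenTelemetryConfig): { enabled: boolean; endpoint: string } {
  const defaultEndpoint = "http://localhost:4317";
  if (config === undefined || config === false) return { enabled: false, endpoint: defaultEndpoint };
  if (config === true) return { enabled: true, endpoint: defaultEndpoint };
  return { enabled: config.enabled ?? true, endpoint: config.endpoint ?? defaultEndpoint };
}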
Add unit tests for Telemetry.resolveConfig and config loading behavior:
- Test resolveConfig handles boolean/object/undefined inputs correctly
- Test config loading from file with boolean and object openTelemetry config
- Test openTelemetry defaults to undefined when not configured
- Test OTEL_EXPORTER_OTLP_ENDPOINT env var override behavior

Update plan.md to mark testing task as completed.
The following review comment was left on this excerpt of the diff:

})
logs.setGlobalLoggerProvider(loggerProvider)

sdk = new NodeSDK({


My recommendation (and what's somewhat conventional) is that the app implementation is responsible for setting up this exporter machinery, and then, if the app uses a library that has existing OTel instrumentation, you enable that. For example, the ai-sdk provides OTel instrumentation. If you use the OpenAI SDK or Claude SDK directly, you'd leverage their instrumentation.

The main issue I could imagine (assuming you don't want to be churning on setting all the attributes to work well across vendors) is that the attribute naming for LLM-related spans is still a bit of a mess, as everyone is trying to figure out how to consistently name all these attributes.

If you're just collecting traces for performance's sake and don't care about LLM/eval use cases, then all these traces will show up just fine in any trace viewer with the span/operation names you've defined. The use case I mostly care about is shipping the signals to a tool like Langfuse. Those tools expect specific names to show things like sessionID, LLM generation, tool calls, etc.

The different vendors are working on making the core attributes more uniform, but it's not there yet, so personally I'd try to punt most of that churn onto something like the ai-sdk.

Here's a quick snapshot of the landscape of attribute definitions

@Hona (Collaborator, Author) replied:

Agreed that the running application chooses the exporter.
I'll check more, but running the opencode CLI should configure the exporter.
Also, tbh this OTel work is just to support development/profiling/debugging for now.

I'll check those emerging standards for attributes to see if I can consolidate.
For now any span/attribute is good and we can easily rename later.

I'll check your other comments later, but I'm sure you saw I pushed a big refactor to clean up the implementation to be more like a decorator/using pattern and remove heaps of noise.

The reviewer replied:

Yep, saw the cleanup/refactor. This comment still applies and summarizes what remains relevant after your refactor.

> Agreed that the running application chooses the exporter.
> I'll check more, but running the opencode CLI should configure the exporter.

Yep, I think we're saying the same thing here. The current experimental_telemetry just enables the ai-sdk instrumentation but doesn't start an exporter. So mainly calling out that the biggest missing piece is that something needs to start the exporter.

> Also, tbh this OTel work is just to support development/profiling/debugging for now.

That makes sense. Mostly calling out that if you retain the ai-sdk enabling, you don't need to re-instrument LLM calls, tool calls, etc., since those are already done for you and will carry the most up-to-date evolving attributes, so the same traces become useful for building agentic engineering evals or workflow review (the part I'm actually more excited about).

That obviously doesn't prevent wrapping those ai SDK calls in your own spans to get even finer-grained instrumentation.
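For reference, the AI SDK's built-in instrumentation is enabled per call via experimental_telemetry; a minimal sketch (example provider and prompt chosen for illustration, not taken from this PR):

import { generateText } from "ai";
import { openai } from "@ai-sdk/openai"; // example provider; any ai-sdk model works

const result = await generateText({
  model: openai("gpt-4o"),
  prompt: "Summarize the latest session",
  // Enables the ai-sdk's own OTel spans; the app still has to start an exporter.
  experimental_telemetry: { isEnabled: true },
});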

That said, if you're mostly interested in instrumentation for performance profiling, I'd definitely consider setting up the Node.js OTel auto-instrumentation and metrics. E.g. you'll probably find at least having metrics around GC stats, like runs and pauses, useful.

crude example:

import { NodeSDK } from "@opentelemetry/sdk-node";
import { ConsoleSpanExporter } from "@opentelemetry/sdk-trace-base";
import { PeriodicExportingMetricReader, ConsoleMetricExporter } from "@opentelemetry/sdk-metrics";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";

const sdk = new NodeSDK({
  traceExporter: new ConsoleSpanExporter(),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new ConsoleMetricExporter(),
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

@Hona (Collaborator, Author) replied:

I've got some more work to do to make it perfect, but I agree on your points.

I'll double check the ai-sdk spans vs my custom spans.
The Aspire Dashboard looked pretty nice, but I'll check if there are exact double-ups.

Yup, I'll add the typical instrumentation for Node, and I'm even seeing if our underlying opentui/Zig stuff can have instrumentation.

I'll see what goes in the first cut vs what gets added to the PR later. The team will check this out over the next few weeks.

The reviewer replied:

FWIW, if you want to spot-check stuff against another OTel collector and trace viewer, you can run a Langfuse stack fully locally:

git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up

Everything (web app and OTel collector endpoint) is available on localhost:3000.

davekiss added a commit to davekiss/burnboard that referenced this pull request Jan 13, 2026
OpenTelemetry support for OpenCode is pending upstream approval.
Link to tracking PR: anomalyco/opencode#6629

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>