
Mnovich/opentelemetry#3257

Closed
Kvadratni wants to merge 43 commits into main from mnovich/opentelemetry

@Kvadratni
Contributor

@Kvadratni Kvadratni commented Jul 4, 2025

Comprehensive OpenTelemetry Integration for Goose

This PR introduces a complete telemetry system for Goose, providing observability across CLI, Desktop UI, and recipe executions with support for multiple telemetry backends.

Core Features

Multi-Provider Telemetry Architecture

  • Console Provider: Debug output with structured logging
  • File Provider: JSON-based telemetry export to local files
  • OTLP Provider: OpenTelemetry Protocol support for Jaeger, Tempo, and other OTLP-compatible backends
  • Datadog Provider: Direct HTTP API integration for metrics and traces
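A minimal Rust sketch of how the providers above could share one interface. `ProviderKind`, `TelemetryProvider`, `ConsoleProvider` and `FileProvider` are hypothetical names, not the types in this PR, and only the console and file backends are shown because OTLP and Datadog exporters would pull in external crates.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::PathBuf;
use std::str::FromStr;

/// Hypothetical selector mirroring the GOOSE_TELEMETRY_PROVIDER values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderKind {
    Console,
    File,
    Otlp,
    Datadog,
}

impl FromStr for ProviderKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "console" => Ok(ProviderKind::Console),
            "file" => Ok(ProviderKind::File),
            "otlp" => Ok(ProviderKind::Otlp),
            "datadog" => Ok(ProviderKind::Datadog),
            other => Err(format!("unknown telemetry provider: {other}")),
        }
    }
}

/// Hypothetical interface every backend implements.
pub trait TelemetryProvider {
    /// Export one already-serialized event (e.g. a JSON line).
    fn export(&mut self, event_json: &str) -> io::Result<()>;
    /// Push out anything buffered; intended to be called on shutdown.
    fn flush(&mut self) -> io::Result<()>;
}

/// Writes each event as one prefixed line to any writer (stderr in practice).
pub struct ConsoleProvider<W: Write> {
    out: W,
}

impl<W: Write> TelemetryProvider for ConsoleProvider<W> {
    fn export(&mut self, event_json: &str) -> io::Result<()> {
        writeln!(self.out, "[telemetry] {event_json}")
    }

    fn flush(&mut self) -> io::Result<()> {
        self.out.flush()
    }
}

/// Appends one JSON line per event to a local file.
pub struct FileProvider {
    path: PathBuf,
}

impl TelemetryProvider for FileProvider {
    fn export(&mut self, event_json: &str) -> io::Result<()> {
        let mut file = OpenOptions::new().create(true).append(true).open(&self.path)?;
        writeln!(file, "{event_json}")
    }

    fn flush(&mut self) -> io::Result<()> {
        // Each export opens, writes and closes the file, so nothing is buffered here.
        Ok(())
    }
}
```

Putting every backend behind one trait keeps the event-producing code unaware of which provider was selected.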

Comprehensive Event Tracking

  • Recipe Executions: Track recipe runs with metadata, duration, token usage, and tool usage statistics
  • Interactive Sessions: Monitor CLI and UI conversations with message counts, turn tracking, and performance metrics
  • Command Executions: Capture CLI command usage patterns and performance
  • Error Tracking: Detailed error context with stack traces and failing tool identification
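The four tracked categories map naturally onto one enum. This is a hypothetical shape (`TelemetryEvent`, the `goose.*` event names and the attribute keys are illustrative, not taken from the PR) that flattens each event into string key/value pairs, a form the console, file, OTLP and Datadog backends can all accept.

```rust
use std::time::Duration;

/// Hypothetical event type: one variant per tracked category.
#[derive(Debug, Clone, PartialEq)]
pub enum TelemetryEvent {
    RecipeExecution {
        recipe: String,
        duration: Duration,
        input_tokens: u64,
        output_tokens: u64,
        tool_calls: u32,
    },
    Session {
        interface: String, // e.g. "cli" or "desktop"
        message_count: u32,
        turns: u32,
    },
    Command {
        name: String,
        duration: Duration,
        success: bool,
    },
    Error {
        context: String,
        failing_tool: Option<String>,
    },
}

impl TelemetryEvent {
    /// Hypothetical event name, reused as the span/event name by every backend.
    pub fn name(&self) -> &'static str {
        match self {
            TelemetryEvent::RecipeExecution { .. } => "goose.recipe.execution",
            TelemetryEvent::Session { .. } => "goose.session",
            TelemetryEvent::Command { .. } => "goose.command",
            TelemetryEvent::Error { .. } => "goose.error",
        }
    }

    /// Flatten the event into string key/value attributes.
    pub fn attributes(&self) -> Vec<(&'static str, String)> {
        match self {
            TelemetryEvent::RecipeExecution { recipe, duration, input_tokens, output_tokens, tool_calls } => vec![
                ("recipe.name", recipe.clone()),
                ("duration_ms", duration.as_millis().to_string()),
                ("tokens.input", input_tokens.to_string()),
                ("tokens.output", output_tokens.to_string()),
                ("tools.calls", tool_calls.to_string()),
            ],
            TelemetryEvent::Session { interface, message_count, turns } => vec![
                ("session.interface", interface.clone()),
                ("session.messages", message_count.to_string()),
                ("session.turns", turns.to_string()),
            ],
            TelemetryEvent::Command { name, duration, success } => vec![
                ("command.name", name.clone()),
                ("duration_ms", duration.as_millis().to_string()),
                ("command.success", success.to_string()),
            ],
            TelemetryEvent::Error { context, failing_tool } => {
                let mut attrs = vec![("error.context", context.clone())];
                // Only attach the tool attribute when a tool was actually involved.
                if let Some(tool) = failing_tool {
                    attrs.push(("error.tool", tool.clone()));
                }
                attrs
            }
        }
    }
}
```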

Rich Telemetry Data

  • Token usage tracking (input/output tokens, estimated costs, model/provider info)
  • Tool usage analytics (call counts, success rates, average duration)
  • Environment detection (CI/CD, Docker, cloud platforms, terminal types)
  • User identification and usage type classification (Human/Automation/CI)
  • Comprehensive metadata and tagging support
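A sketch of two of the data points above: per-tool call counts, success rate and average duration, plus a Human/Automation/CI classification. The detection rules are assumptions for illustration (a set `CI` or `GITHUB_ACTIONS` variable means CI, a non-interactive stdin means automation); the PR does not say which signals it uses. All type and function names are hypothetical.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Running statistics for one tool.
#[derive(Debug, Default, Clone)]
pub struct ToolStats {
    pub calls: u32,
    pub successes: u32,
    pub total_duration: Duration,
}

impl ToolStats {
    pub fn success_rate(&self) -> f64 {
        if self.calls == 0 { 0.0 } else { self.successes as f64 / self.calls as f64 }
    }

    pub fn average_duration(&self) -> Duration {
        if self.calls == 0 { Duration::ZERO } else { self.total_duration / self.calls }
    }
}

/// Aggregates tool calls for one session or recipe run.
#[derive(Debug, Default)]
pub struct ToolUsage {
    by_tool: HashMap<String, ToolStats>,
}

impl ToolUsage {
    pub fn record(&mut self, tool: &str, success: bool, duration: Duration) {
        let stats = self.by_tool.entry(tool.to_string()).or_default();
        stats.calls += 1;
        if success {
            stats.successes += 1;
        }
        stats.total_duration += duration;
    }

    pub fn get(&self, tool: &str) -> Option<&ToolStats> {
        self.by_tool.get(tool)
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UsageType {
    Human,
    Automation,
    Ci,
}

/// Assumed heuristic: CI variables win, then a non-TTY stdin means automation.
/// Environment lookup is injected so the function is testable.
pub fn classify_usage(get_env: impl Fn(&str) -> Option<String>, stdin_is_tty: bool) -> UsageType {
    if get_env("CI").is_some() || get_env("GITHUB_ACTIONS").is_some() {
        UsageType::Ci
    } else if !stdin_is_tty {
        UsageType::Automation
    } else {
        UsageType::Human
    }
}
```

At runtime the classifier could be called as `classify_usage(|k| std::env::var(k).ok(), std::io::IsTerminal::is_terminal(&std::io::stdin()))` (`IsTerminal` requires Rust 1.70 or newer).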

Technical Implementation

Architecture

  • Centralized TelemetryManager with global singleton pattern
  • Event-driven design with structured telemetry events
  • Async-first implementation with proper resource cleanup
  • Configurable via environment variables with validation
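A global `TelemetryManager` singleton can be built on `std::sync::OnceLock` (Rust 1.70+). This is a simplified, synchronous sketch with hypothetical method names; the PR describes an async-first design, and a real manager would hold the configured providers rather than a `Vec<String>`.

```rust
use std::sync::{Mutex, OnceLock};

/// Hypothetical manager; events are buffered in memory for illustration.
pub struct TelemetryManager {
    enabled: bool,
    events: Mutex<Vec<String>>,
}

static MANAGER: OnceLock<TelemetryManager> = OnceLock::new();

impl TelemetryManager {
    /// Initialize the global instance once; later calls return the existing one.
    pub fn init(enabled: bool) -> &'static TelemetryManager {
        MANAGER.get_or_init(|| TelemetryManager { enabled, events: Mutex::new(Vec::new()) })
    }

    /// The global instance, if `init` has run.
    pub fn global() -> Option<&'static TelemetryManager> {
        MANAGER.get()
    }

    pub fn record(&self, event: impl Into<String>) {
        if !self.enabled {
            return;
        }
        // A poisoned lock only means another thread panicked mid-push; keep recording.
        let mut guard = self.events.lock().unwrap_or_else(|e| e.into_inner());
        guard.push(event.into());
    }

    /// Take all buffered events, e.g. to hand them to an exporter.
    pub fn drain(&self) -> Vec<String> {
        let mut guard = self.events.lock().unwrap_or_else(|e| e.into_inner());
        std::mem::take(&mut *guard)
    }
}

/// Call-site helper: a no-op when telemetry was never initialized.
pub fn record_event(event: impl Into<String>) {
    if let Some(manager) = TelemetryManager::global() {
        manager.record(event);
    }
}
```

The free `record_event` function keeps call sites to a single line and silently does nothing when telemetry is not set up.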

Integration Points

  • CLI integration across all commands (configure, session management, recipe execution)
  • Desktop UI telemetry via enhanced message stream hooks
  • Server-side session tracking for both streaming and non-streaming endpoints
  • Recipe execution tracking with comprehensive parameter capture

Performance & Reliability

  • Batch export with forced flush on shutdown for OTLP
  • HTTP-based Datadog integration (no agent required)
  • Graceful error handling with telemetry event generation for failures
  • Memory-efficient with configurable providers
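The batching behaviour described above (buffer events, export in batches, force a flush at shutdown) can be illustrated without any OpenTelemetry dependency. This std-only sketch uses a hypothetical `BatchExporter` that exports whenever `max_batch` events accumulate and flushes the remainder in `Drop`; it is not the OpenTelemetry SDK's batch processor, which also exports on a timer.

```rust
/// Hypothetical batching wrapper around an export function.
pub struct BatchExporter<F: FnMut(Vec<String>)> {
    buffer: Vec<String>,
    max_batch: usize,
    export: F,
}

impl<F: FnMut(Vec<String>)> BatchExporter<F> {
    pub fn new(max_batch: usize, export: F) -> Self {
        assert!(max_batch > 0, "max_batch must be positive");
        Self { buffer: Vec::with_capacity(max_batch), max_batch, export }
    }

    /// Buffer one event; export immediately once the batch is full.
    pub fn push(&mut self, event: String) {
        self.buffer.push(event);
        if self.buffer.len() >= self.max_batch {
            self.force_flush();
        }
    }

    /// Export whatever is buffered, even a partial batch.
    pub fn force_flush(&mut self) {
        if !self.buffer.is_empty() {
            let batch = std::mem::take(&mut self.buffer);
            (self.export)(batch);
        }
    }
}

impl<F: FnMut(Vec<String>)> Drop for BatchExporter<F> {
    /// Shutdown path: never lose the last partial batch.
    fn drop(&mut self) {
        self.force_flush();
    }
}
```

Relying on `Drop` means the final batch is exported only when the exporter goes out of scope, so the process should drop it (or call `force_flush`) before exiting; `std::process::exit` does not run destructors for values still alive on the stack.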

Configuration

# Enable telemetry
export GOOSE_TELEMETRY_ENABLED=true
export GOOSE_TELEMETRY_PROVIDER=otlp  # one of: otlp, datadog, console, file

# GOOSE_TELEMETRY_ENDPOINT is interpreted per provider; set only the block for the provider you chose.

# OTLP configuration
export GOOSE_TELEMETRY_ENDPOINT=http://localhost:4317
export OTEL_SERVICE_NAME=goose

# Datadog configuration
export GOOSE_TELEMETRY_API_KEY=your_api_key
export GOOSE_TELEMETRY_ENDPOINT=https://api.datadoghq.com

# File configuration
export GOOSE_TELEMETRY_ENDPOINT=/path/to/telemetry.log
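The variables above could be read and validated in one place. In this sketch the variable names come from the PR, but the validation rules (OTLP needs an http(s) endpoint, Datadog needs an API key, the file provider needs a path, `true` or `1` enables telemetry) and the names `config_from_env`, `Provider` and `TelemetryConfig` are assumptions. Environment access is passed in as a closure so the function can be tested without mutating the process environment.

```rust
/// Hypothetical provider choice parsed from GOOSE_TELEMETRY_PROVIDER.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    Console,
    File,
    Otlp,
    Datadog,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TelemetryConfig {
    pub provider: Provider,
    pub endpoint: Option<String>,
    pub api_key: Option<String>,
}

/// Ok(None) when telemetry is disabled; Err for an invalid combination.
pub fn config_from_env(get: impl Fn(&str) -> Option<String>) -> Result<Option<TelemetryConfig>, String> {
    let enabled = get("GOOSE_TELEMETRY_ENABLED")
        .map(|v| v.eq_ignore_ascii_case("true") || v == "1")
        .unwrap_or(false);
    if !enabled {
        return Ok(None);
    }

    let raw = get("GOOSE_TELEMETRY_PROVIDER").ok_or("GOOSE_TELEMETRY_PROVIDER is not set")?;
    let provider = match raw.trim().to_ascii_lowercase().as_str() {
        "console" => Provider::Console,
        "file" => Provider::File,
        "otlp" => Provider::Otlp,
        "datadog" => Provider::Datadog,
        other => return Err(format!("unsupported GOOSE_TELEMETRY_PROVIDER: {other}")),
    };

    // Empty strings are treated as unset.
    let endpoint = get("GOOSE_TELEMETRY_ENDPOINT").filter(|s| !s.trim().is_empty());
    let api_key = get("GOOSE_TELEMETRY_API_KEY").filter(|s| !s.trim().is_empty());

    match provider {
        Provider::Otlp => {
            let is_url = endpoint
                .as_deref()
                .map_or(false, |e| e.starts_with("http://") || e.starts_with("https://"));
            if !is_url {
                return Err("otlp needs GOOSE_TELEMETRY_ENDPOINT as an http(s) URL".to_string());
            }
        }
        Provider::Datadog if api_key.is_none() => {
            return Err("datadog needs GOOSE_TELEMETRY_API_KEY".to_string())
        }
        Provider::File if endpoint.is_none() => {
            return Err("file needs GOOSE_TELEMETRY_ENDPOINT (a file path)".to_string())
        }
        _ => {}
    }

    Ok(Some(TelemetryConfig { provider, endpoint, api_key }))
}
```

In the CLI this would be called as `config_from_env(|k| std::env::var(k).ok())`.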

This implementation provides production-ready observability for Goose deployments, enabling teams to monitor usage patterns, performance metrics, and system health across all interaction modes.

@Kvadratni Kvadratni force-pushed the mnovich/opentelemetry branch 5 times, most recently from 2c6b8db to bdb8516 on July 9, 2025 21:57
@Kvadratni Kvadratni marked this pull request as ready for review July 9, 2025 22:07
@HalogenAI

@lifeizhou-ap not sure if you have a better understanding on this from AI Log Reducer on the best practices to connect to DataDog?

@lily-de
Contributor

lily-de commented Jul 10, 2025

I see this touches the message stream component in the UI. I think we may want to hold this one until after the UI refactor, just to be safe, but I'll loop in Zane to check.

@Kvadratni Kvadratni force-pushed the mnovich/opentelemetry branch from 3e1f719 to b6a6469 on July 10, 2025 16:39
Contributor

@cloud-on-prem cloud-on-prem left a comment


Looks good; left some minor suggestions/comments.

@Kvadratni Kvadratni requested a review from jamadeo July 11, 2025 16:30
@lifeizhou-ap
Collaborator

lifeizhou-ap commented Jul 14, 2025

> @lifeizhou-ap not sure if you have a better understanding on this from AI Log Reducer on the best practices to connect to DataDog?

Hi @HalogenAI, in Log Reducer our focus is on monitoring log volumes, which is a different concern from what's implemented in this PR, where the emphasis is on sending or tracking custom events (e.g., via send_event). These are separate areas in Datadog and serve different observability goals.

@Kvadratni Kvadratni force-pushed the mnovich/opentelemetry branch from b6a6469 to 87b88ee on July 14, 2025 18:04
Collaborator

@jamadeo jamadeo left a comment


This is nice and will be super useful! My two main pieces of feedback are:

  • Try to do as little as possible in the cli/server implementation, and move the telemetry-related logic to the telemetry crate. This will better separate the responsibilities of each. I'm also not a huge fan of having to put everything in closures passed to the telemetry manager -- maybe that can be done more cleanly with a macro? WDYT?
  • See if you can reuse the serializable structs e.g. SessionMetadata to build the telemetry payload as-is instead of having to convert everywhere
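One way the macro suggestion above could look. If the closures exist so that attribute values are only computed when telemetry is on, a `macro_rules!` macro keeps that laziness while letting call sites pass plain `key = value` pairs. `telemetry_event!`, `emit`, `is_enabled`, `set_enabled` and the in-memory `SINK` are hypothetical stand-ins for the telemetry crate.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Hypothetical global switch; real code would set it from configuration.
static TELEMETRY_ENABLED: AtomicBool = AtomicBool::new(false);

/// Stand-in exporter: collects (event name, attributes) in memory.
static SINK: Mutex<Vec<(String, Vec<(String, String)>)>> = Mutex::new(Vec::new());

pub fn set_enabled(on: bool) {
    TELEMETRY_ENABLED.store(on, Ordering::Relaxed);
}

pub fn is_enabled() -> bool {
    TELEMETRY_ENABLED.load(Ordering::Relaxed)
}

pub fn emit(name: &str, attributes: Vec<(String, String)>) {
    SINK.lock()
        .unwrap_or_else(|e| e.into_inner())
        .push((name.to_string(), attributes));
}

/// `telemetry_event!("name", key = value, ...)`: the value expressions are
/// evaluated only when telemetry is enabled. A version exported from a
/// separate telemetry crate would use `#[macro_export]` and `$crate::` paths.
macro_rules! telemetry_event {
    ($name:expr $(, $key:ident = $value:expr)*) => {
        if is_enabled() {
            emit(
                $name,
                vec![$((stringify!($key).to_string(), ($value).to_string())),*],
            );
        }
    };
}
```

Call sites in the CLI or server would then shrink to a single line such as `telemetry_event!("session.end", messages = count, interface = "cli")`, with the attribute keys taken from the identifiers.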

@DOsinga
Collaborator

DOsinga commented Jul 16, 2025

how does this relate to: #3401

Collaborator

@DOsinga DOsinga left a comment


sorry for the drive-by, but it does feel like a de-LLM-ify pass would help

@Kvadratni
Contributor Author

> how does this relate to: #3401

@DOsinga it does not at all

@Kvadratni Kvadratni force-pushed the mnovich/opentelemetry branch 4 times, most recently from ab49d84 to 460b831 on July 16, 2025 22:54
@Kvadratni Kvadratni force-pushed the mnovich/opentelemetry branch from 5380740 to 98eeee9 on July 31, 2025 17:39
@Kvadratni Kvadratni force-pushed the mnovich/opentelemetry branch from 98eeee9 to a059176 on July 31, 2025 17:42
@Kvadratni Kvadratni mentioned this pull request Aug 1, 2025
@Kvadratni Kvadratni closed this Aug 1, 2025
@Kvadratni
Contributor Author

replaced by #3772


Labels: None yet
Projects: None yet


9 participants