Sentry AI Agent Monitoring - Manual Instrumentation Example


A complete reference implementation demonstrating manual instrumentation of AI agents using Sentry's AI Monitoring capabilities.


Live Demo • Documentation • Architecture


🎯 What This Application Demonstrates

This example application showcases production-ready manual instrumentation for AI agents that don't use libraries Sentry can auto-instrument (OpenAI, Anthropic, etc.). It's a good fit for teams building with:

  • Custom LLM APIs
  • Proprietary AI models
  • In-house agent frameworks
  • Non-standard AI tooling

Key Features Demonstrated

✅ Complete AI Agent Tracing Pipeline

  • Frontend → Backend distributed tracing
  • LLM call instrumentation with token tracking
  • Tool execution monitoring with performance metrics
  • Multi-step agent reasoning flows

✅ 7 Fully-Instrumented Tools

  • Knowledge base search
  • Order status lookup
  • Account information retrieval
  • Refund processing
  • Inventory checks
  • Callback scheduling
  • Ticket creation

✅ Production-Grade Monitoring

  • Per-tool token consumption tracking
  • Cost analysis per agent invocation
  • Tool usage patterns and performance
  • Conversation quality metrics
  • Error tracking across the AI pipeline

✅ Follows Official Sentry Conventions

  • AI Agent span standards
  • Proper attribute naming and types
  • Correct span operations and hierarchies
  • Best practices for distributed tracing

🚀 Getting Started

Prerequisites

  • Node.js 18+
  • A Sentry account (optional for local testing)

Quick Start

# 1. Install dependencies
npm install

# 2. Configure Sentry (optional)
# Create .env.local and add your Sentry DSN
echo "NEXT_PUBLIC_SENTRY_DSN=your-dsn-here" > .env.local

# 3. Start the development server
npm run dev

# 4. Open http://localhost:3000

Testing the Demo

The application includes an in-app guide showing example phrases. Try these to trigger different tools:

"Where is my order?"          β†’ check_order_status tool
"Check my account"            β†’ get_account_info tool  
"Process a refund"            β†’ process_refund tool
"Is this in stock?"           β†’ check_inventory tool
"What's your return policy?"  β†’ search_knowledge_base tool
"Can you call me back?"       β†’ schedule_callback tool
"Escalate this issue"         β†’ create_ticket tool

Each phrase triggers tool execution spans with complete instrumentation visible in Sentry.

📊 How It Works

Architecture Overview

This application demonstrates a distributed AI agent architecture with complete observability:

┌───────────────────────────────────────────────────────┐
│                   User Interaction                    │
│               (Types: "Check my order")               │
└───────────────────────────┬───────────────────────────┘
                            │
                            ▼
┌───────────────────────────────────────────────────────┐
│            FRONTEND (React Component)                 │
│  📊 Span: gen_ai.invoke_agent                         │
│  ├─ Attributes:                                       │
│  │  • gen_ai.agent.name: "Customer Support Agent"     │
│  │  • conversation.session_id: "session_xxx"          │
│  │  • conversation.turn: 1                            │
│  └─ Captures user-perceived latency                   │
└───────────────────────────┬───────────────────────────┘
                            │ HTTP POST /api/ai/chat
                            ▼
┌───────────────────────────────────────────────────────┐
│              BACKEND (Next.js API Route)              │
│  📊 Span: gen_ai.invoke_agent                         │
│  ├─ Available Tools: [7 tools with descriptions]      │
│  │                                                    │
│  ├─ Step 1: Initial LLM Call                          │
│  │  📊 Span: gen_ai.chat                              │
│  │  ├─ Attributes:                                    │
│  │  │  • gen_ai.request.model: "custom-model-v2"      │
│  │  │  • gen_ai.request.messages: [...]               │
│  │  │  • gen_ai.usage.total_tokens: 150               │
│  │  └─ Response: "Let me check your order status"     │
│  │                                                    │
│  ├─ Step 2: Execute Tools (if needed)                 │
│  │  📊 Span: gen_ai.execute_tool                      │
│  │  ├─ Attributes:                                    │
│  │  │  • gen_ai.tool.name: "check_order_status"       │
│  │  │  • gen_ai.tool.description: "Look up orders"    │
│  │  │  • gen_ai.tool.input: '{"orderId":"ORD-123"}'   │
│  │  │  • gen_ai.tool.output: "Order shipped..."       │
│  │  │  • gen_ai.usage.total_tokens: 25                │
│  │  └─ Custom: order.id, tool duration                │
│  │                                                    │
│  └─ Step 3: Final Synthesis LLM Call                  │
│     📊 Span: gen_ai.chat                              │
│     ├─ Synthesizes tool results into response         │
│     └─ Tracks additional tokens: 45                   │
│                                                       │
│  Final Response:                                      │
│  ├─ Total Tokens: 220 (150 + 25 + 45)                 │
│  ├─ Tools Used: ["check_order_status"]                │
│  ├─ Resolution Status: "answered"                     │
│  └─ Cost Estimate: $0.0220                            │
└───────────────────────────────────────────────────────┘

Instrumentation Flow

1. Frontend Instrumentation (src/app/page.tsx)

// User sends a message
await Sentry.startSpan({
  name: 'invoke_agent Customer Support Agent',
  op: 'gen_ai.invoke_agent',
  attributes: {
    'gen_ai.operation.name': 'invoke_agent',
    'gen_ai.agent.name': 'Customer Support Agent',
    'gen_ai.system': 'custom-llm',
    'conversation.session_id': sessionId,
    'conversation.turn': conversationHistory.length + 1
  }
}, async (agentSpan) => {
  // Call backend
  const response = await fetch('/api/ai/chat', { ... });
  const data = await response.json();

  // Set response attributes
  agentSpan.setAttribute('gen_ai.response.text', data.message);
  agentSpan.setAttribute('gen_ai.usage.total_tokens', data.totalTokens);
  agentSpan.setAttribute('conversation.tools_used', data.toolsUsed.length);
});

Why this matters: Captures the complete user experience including network time, providing true end-to-end visibility.
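
Distributed tracing between the frontend and backend spans works because the browser SDK attaches sentry-trace and baggage headers to outgoing fetch calls. A minimal sketch of the client-side config that enables this, assuming common defaults (the exact options in this repo's sentry.client.config.ts may differ):

// sentry.client.config.ts (sketch)
import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  // Sample every transaction in a demo; lower this in production
  tracesSampleRate: 1.0,
  // Propagate trace headers to same-origin API routes
  tracePropagationTargets: ['localhost', /^\/api\//],
});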

2. Backend Agent Orchestration (src/app/api/ai/chat/route.ts)

// Backend receives request and starts agent span
await Sentry.startSpan({
  name: 'invoke_agent Customer Support Agent',
  op: 'gen_ai.invoke_agent',
  attributes: {
    'gen_ai.request.available_tools': JSON.stringify(tools),
    'conversation.session_id': sessionId
  }
}, async (agentSpan) => {
  // ... orchestrate LLM calls and tool executions
  
  // Set final attributes
  agentSpan.setAttribute('gen_ai.usage.total_tokens', totalTokens);
  agentSpan.setAttribute('conversation.tools_used', JSON.stringify(toolsUsed));
  agentSpan.setAttribute('conversation.resolution_status', resolutionStatus);
  agentSpan.setAttribute('conversation.cost_estimate_usd', costEstimate);
});

Why this matters: Central coordination point that aggregates all downstream metrics (tokens, tools, cost).
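
A sketch of how that aggregation might look inside the agent span callback. The variable names and the flat per-token rate are illustrative, though $0.10 per 1K tokens does reproduce the $0.0220 estimate shown in the diagram for a 220-token conversation:

// Running totals accumulated across LLM calls and tool executions
let totalTokens = 0;
const toolsUsed: string[] = [];

// ...after each gen_ai.chat or gen_ai.execute_tool span completes:
// totalTokens += usage.total_tokens;
// toolsUsed.push(toolName);

// Illustrative flat rate; real pricing depends on your model
const COST_PER_1K_TOKENS_USD = 0.10;
const costEstimate = Number(((totalTokens / 1000) * COST_PER_1K_TOKENS_USD).toFixed(4));

agentSpan.setAttribute('gen_ai.usage.total_tokens', totalTokens);
agentSpan.setAttribute('conversation.cost_estimate_usd', costEstimate);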

3. LLM Call Instrumentation

// Each LLM API call gets its own span
await Sentry.startSpan({
  name: 'chat custom-model-v2',
  op: 'gen_ai.chat',
  attributes: {
    'gen_ai.operation.name': 'chat',
    'gen_ai.request.model': 'custom-model-v2',
    'gen_ai.request.messages': JSON.stringify(messages),
    'gen_ai.request.temperature': 0.7,
    'gen_ai.request.max_tokens': 500
  }
}, async (llmSpan) => {
  const response = await callCustomLLM(...);
  
  // Track token usage
  llmSpan.setAttribute('gen_ai.usage.input_tokens', response.usage.prompt_tokens);
  llmSpan.setAttribute('gen_ai.usage.output_tokens', response.usage.completion_tokens);
  llmSpan.setAttribute('gen_ai.usage.total_tokens', response.usage.total_tokens);
  llmSpan.setAttribute('gen_ai.response.text', response.message);
});

Why this matters: Enables monitoring of LLM performance, cost per call, and response quality.

4. Tool Execution Instrumentation

// Each tool gets a dedicated span
await Sentry.startSpan({
  name: `execute_tool ${toolName}`,
  op: 'gen_ai.execute_tool',
  attributes: {
    'gen_ai.operation.name': 'execute_tool',
    'gen_ai.tool.name': toolName,
    'gen_ai.tool.description': toolDescription,
    'gen_ai.tool.type': 'function',
    'gen_ai.tool.input': JSON.stringify(args)
  }
}, async (toolSpan) => {
  const result = await executeTool(toolName, args);
  
  // Track tool-specific metrics
  toolSpan.setAttribute('gen_ai.tool.output', result);
  toolSpan.setAttribute('gen_ai.usage.total_tokens', toolTokens);
  
  // Custom business metrics
  toolSpan.setAttribute('order.id', orderId);
  toolSpan.setAttribute('search.results_count', resultCount);
});

Why this matters: Identifies slow or failing tools, tracks per-tool costs, enables optimization of agent workflows.
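
When a tool throws, the span should record the failure before re-throwing so the error is linked to the trace. A sketch of what the error path might look like (the error.type attribute mirrors the convention listed under Key Implementation Details; status code 2 is the SDK's error span status):

await Sentry.startSpan({
  name: `execute_tool ${toolName}`,
  op: 'gen_ai.execute_tool',
  attributes: { 'gen_ai.tool.name': toolName }
}, async (toolSpan) => {
  try {
    const result = await executeTool(toolName, args);
    toolSpan.setAttribute('gen_ai.tool.output', result);
    return result;
  } catch (error) {
    // Tag the span with the failure type and mark it errored
    toolSpan.setAttribute('error.type', error instanceof Error ? error.name : 'UnknownError');
    toolSpan.setStatus({ code: 2, message: 'internal_error' });
    Sentry.captureException(error);
    throw error;
  }
});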

📈 What You Can Monitor

Once instrumented, this application enables powerful monitoring capabilities in Sentry:

Agent Performance Metrics

Cost Analysis

  • Total token consumption per conversation
  • Average cost per agent invocation
  • Token usage breakdown by LLM call vs. tool execution
  • Per-tool token consumption patterns

Performance Tracking

  • p50/p95/p99 latency of agent invocations
  • LLM response time distribution
  • Tool execution duration by tool type
  • Conversation turn latency

Quality Metrics

  • Resolution status distribution (answered, resolved, escalated)
  • Tool usage patterns and frequency
  • Conversations requiring escalation
  • Average tools used per conversation

Tool-Specific Insights

Each tool can be monitored independently:

check_order_status:
  - Average execution time
  - Success/failure rate
  - Token consumption
  - Custom: Order lookup patterns

search_knowledge_base:
  - Search result relevance (results_count)
  - Query patterns
  - Knowledge gaps (low result counts)

process_refund:
  - Refund amounts processed
  - Success rates
  - Processing time

get_account_info:
  - Lookup type distribution (email vs ID)
  - Cache hit rates (if implemented)
  - Data retrieval performance

Example Sentry Queries

Find expensive conversations:

op:gen_ai.invoke_agent
WHERE gen_ai.usage.total_tokens > 500
GROUP BY conversation.session_id

Identify slow tools:

op:gen_ai.execute_tool
WHERE span.duration > 1s
GROUP BY gen_ai.tool.name

Track escalation reasons:

op:gen_ai.invoke_agent
WHERE conversation.resolution_status = 'escalated'

Monitor token costs by model:

op:gen_ai.chat
SUM(gen_ai.usage.total_tokens)
GROUP BY gen_ai.request.model

🛠️ Technology Stack

  • Framework: Next.js 16.0 (App Router)
  • Language: TypeScript 5.x
  • Monitoring: Sentry JavaScript SDK v10+
  • Styling: Tailwind CSS
  • Runtime: Node.js 18+

πŸ“ Project Structure

llm-tracing-test/
├── src/
│   └── app/
│       ├── page.tsx                    # Frontend chat interface
│       │                               # - Agent span creation
│       │                               # - Session management
│       │                               # - User interaction tracking
│       │
│       └── api/ai/chat/
│           └── route.ts                # Backend agent orchestration
│                                       # - Agent invocation span
│                                       # - LLM call instrumentation
│                                       # - Tool execution spans
│                                       # - Token aggregation
│
├── sentry.client.config.ts             # Sentry frontend config
├── sentry.server.config.ts             # Sentry backend config
├── instrumentation.ts                  # Sentry initialization
│
├── TOOLS_DEMO_GUIDE.md                 # Comprehensive tool documentation
├── CHANGELOG.md                        # Version history
└── README.md                           # This file

πŸ” Key Implementation Details

Following Sentry Standards

This implementation strictly follows Sentry's AI Agent Monitoring conventions:

Required Attributes (Always Included)

  • gen_ai.system: Identifies the AI system (e.g., "custom-llm")
  • gen_ai.request.model: Model identifier (e.g., "custom-model-v2")
  • gen_ai.operation.name: Operation type (invoke_agent, chat, execute_tool)
  • SEMANTIC_ATTRIBUTE_SENTRY_ORIGIN: Set to 'manual.ai.custom-llm' (see the sketch below)
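
A sketch of how these attributes might be set together on a span; SEMANTIC_ATTRIBUTE_SENTRY_ORIGIN is a constant exported by the Sentry SDK that resolves to the 'sentry.origin' attribute key:

import * as Sentry from '@sentry/nextjs';
import { SEMANTIC_ATTRIBUTE_SENTRY_ORIGIN } from '@sentry/nextjs';

await Sentry.startSpan({
  name: 'chat custom-model-v2',
  op: 'gen_ai.chat',
  attributes: {
    'gen_ai.system': 'custom-llm',
    'gen_ai.request.model': 'custom-model-v2',
    'gen_ai.operation.name': 'chat',
    [SEMANTIC_ATTRIBUTE_SENTRY_ORIGIN]: 'manual.ai.custom-llm'
  }
}, async () => {
  // ... LLM call
});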

Span Naming Conventions

  • Agent spans: invoke_agent {agent_name}
  • Chat spans: chat {model_name}
  • Tool spans: execute_tool {tool_name}

Token Tracking

  • gen_ai.usage.input_tokens: Prompt tokens
  • gen_ai.usage.output_tokens: Completion tokens
  • gen_ai.usage.total_tokens: Sum of input + output
  • Tool token usage tracked separately and aggregated (see the sketch below)
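
A sketch of how the roll-up might work across span levels; usage is the provider's usage block, and llmTokens/toolTokens are illustrative running totals:

// Each LLM span records its own usage from the provider response
llmSpan.setAttribute('gen_ai.usage.input_tokens', usage.prompt_tokens);
llmSpan.setAttribute('gen_ai.usage.output_tokens', usage.completion_tokens);
llmSpan.setAttribute('gen_ai.usage.total_tokens', usage.prompt_tokens + usage.completion_tokens);

// The parent agent span reports the conversation-wide sum
agentSpan.setAttribute('gen_ai.usage.total_tokens', llmTokens + toolTokens);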

Simulated LLM Behavior

The application includes a realistic LLM simulator:

// Simulates API latency (300-1000ms)
await new Promise(resolve => setTimeout(resolve, 300 + Math.random() * 700));

// Returns structured responses with:
// - Realistic token counts
// - Tool calls based on message content
// - Proper error handling
// - OpenAI-compatible response format
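
For reference, the simulator's return value might look like this, assuming the OpenAI-compatible shape the comments above describe (the exact field names in this repo may differ):

interface SimulatedLLMResponse {
  message: string;                                      // assistant text shown to the user
  tool_calls?: { name: string; arguments: string }[];   // tools the "model" decided to invoke
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;                               // realistic, content-derived counts
  };
}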

Why simulate instead of a real LLM?

  • Demonstrates pure instrumentation patterns
  • No API keys required for testing
  • Consistent, reproducible behavior
  • Focuses on monitoring, not AI implementation

The 7 Production-Ready Tools

Each tool demonstrates different monitoring patterns:

| Tool | Demonstrates | Custom Attributes |
|------|--------------|-------------------|
| search_knowledge_base | Search operations, result tracking | search.query, search.results_count |
| check_order_status | Database lookups, status tracking | order.id |
| get_account_info | CRM integration, data retrieval | account.lookup_type |
| process_refund | Transaction processing, amounts | refund.order_id, refund.amount |
| check_inventory | Stock checking, availability | inventory.product_id |
| schedule_callback | Scheduling operations, time tracking | callback.scheduled_time, callback.phone |
| create_ticket | Escalation, priority handling | ticket.id, ticket.priority |

All tools include the following (a sketch of one example tool appears after this list):

  • ✅ Description attribute for AI Insights dashboard
  • ✅ Input/output serialization
  • ✅ Token usage tracking (15-50 tokens per tool)
  • ✅ Error instrumentation with error.type
  • ✅ Custom business metrics
  • ✅ Realistic execution latency (200-600ms)
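
As a concrete illustration, a sketch of what one such tool might look like; the check_inventory handler here uses invented stock data and shows the latency-simulation and custom-attribute patterns:

import * as Sentry from '@sentry/nextjs';

async function checkInventory(args: { productId: string }): Promise<string> {
  // Simulate realistic tool latency (200-600ms)
  await new Promise((resolve) => setTimeout(resolve, 200 + Math.random() * 400));

  // Attach the custom business metric to the active execute_tool span
  Sentry.getActiveSpan()?.setAttribute('inventory.product_id', args.productId);

  // Invented stock answer for the demo
  const inStock = Math.random() > 0.3;
  return inStock
    ? `Product ${args.productId} is in stock`
    : `Product ${args.productId} is currently out of stock`;
}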

📖 TOOLS_DEMO_GUIDE.md - Complete tool documentation with trigger phrases and instrumentation details

🎓 Learning Resources

For Developers Implementing Similar Systems

This example teaches:

  1. How to instrument custom AI agents without auto-instrumentation
  2. Proper span hierarchy for distributed AI systems
  3. Token tracking and cost attribution
  4. Tool execution monitoring patterns
  5. Error handling in AI pipelines
  6. Custom business metric capture

Adapt this for your use case:

  • Replace the simulated LLM with your own API calls (see the sketch after this list)
  • Add your actual tools and keep the instrumentation patterns
  • Customize attributes for your business metrics
  • Add authentication and real data sources
  • Deploy to production with confidence
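
For the first item in that list, the simulated call inside each gen_ai.chat span can be swapped for a real HTTP request. A sketch assuming a hypothetical endpoint URL and an OpenAI-style response body:

async function callCustomLLM(messages: { role: string; content: string }[]) {
  // Hypothetical endpoint; replace with your provider's API
  const res = await fetch('https://llm.example.com/v1/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'custom-model-v2', messages })
  });
  if (!res.ok) throw new Error(`LLM API error: ${res.status}`);
  // Expected to include the response text and a usage block for token tracking
  return res.json();
}

The instrumentation around it stays exactly as shown in the LLM Call Instrumentation section above.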


🤝 Contributing

This is an example repository demonstrating instrumentation patterns. Feel free to:

  • Open issues for clarification questions
  • Submit PRs for improved examples
  • Suggest additional tool patterns
  • Share your own implementations

📄 License

This example is provided as-is for educational purposes.


Built with ❤️ to demonstrate Sentry's AI Agent Monitoring

Sentry.io | Documentation | GitHub
