A complete reference implementation demonstrating manual instrumentation of AI agents using Sentry's AI Monitoring capabilities.
Live Demo • Documentation • Architecture
This example application showcases production-ready manual instrumentation for AI agents that don't use auto-instrumented libraries (OpenAI, Anthropic, etc.). Perfect for teams building with:
- Custom LLM APIs
- Proprietary AI models
- In-house agent frameworks
- Non-standard AI tooling
✅ Complete AI Agent Tracing Pipeline
- Frontend → Backend distributed tracing
- LLM call instrumentation with token tracking
- Tool execution monitoring with performance metrics
- Multi-step agent reasoning flows
✅ 7 Fully-Instrumented Tools
- Knowledge base search
- Order status lookup
- Account information retrieval
- Refund processing
- Inventory checks
- Callback scheduling
- Ticket creation
✅ Production-Grade Monitoring
- Per-tool token consumption tracking
- Cost analysis per agent invocation
- Tool usage patterns and performance
- Conversation quality metrics
- Error tracking across the AI pipeline
✅ Follows Official Sentry Conventions
- AI Agent span standards
- Proper attribute naming and types
- Correct span operations and hierarchies
- Best practices for distributed tracing
- Node.js 18+
- A Sentry account (optional for local testing)
# 1. Install dependencies
npm install
# 2. Configure Sentry (optional)
# Create .env.local and add your Sentry DSN
echo "NEXT_PUBLIC_SENTRY_DSN=your-dsn-here" > .env.local
# 3. Start the development server
npm run dev
# 4. Open http://localhost:3000

The application includes an in-app guide showing example phrases. Try these to trigger different tools:
"Where is my order?" β check_order_status tool
"Check my account" β get_account_info tool
"Process a refund" β process_refund tool
"Is this in stock?" β check_inventory tool
"What's your return policy?" β search_knowledge_base tool
"Can you call me back?" β schedule_callback tool
"Escalate this issue" β create_ticket tool
Each phrase triggers tool execution spans with complete instrumentation visible in Sentry.
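Under the hood, the simulated LLM routes to tools by matching keywords in the user's message. A minimal sketch of that kind of routing (the keyword map and function below are illustrative, not the repo's exact implementation):

```typescript
// Hypothetical keyword-based router; the real simulator may use different phrases and shapes.
type ToolCall = { name: string; arguments: Record<string, string> };

const KEYWORD_TO_TOOL: Array<{ pattern: RegExp; tool: string }> = [
  { pattern: /order/i, tool: 'check_order_status' },
  { pattern: /account/i, tool: 'get_account_info' },
  { pattern: /refund/i, tool: 'process_refund' },
  { pattern: /stock|inventory/i, tool: 'check_inventory' },
  { pattern: /policy|return/i, tool: 'search_knowledge_base' },
  { pattern: /call me|callback/i, tool: 'schedule_callback' },
  { pattern: /escalate/i, tool: 'create_ticket' },
];

function pickToolCalls(userMessage: string): ToolCall[] {
  const match = KEYWORD_TO_TOOL.find(({ pattern }) => pattern.test(userMessage));
  return match ? [{ name: match.tool, arguments: {} }] : [];
}
```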
This application demonstrates a distributed AI agent architecture with complete observability:
User Interaction (Types: "Check my order")
        │
        ▼
FRONTEND (React Component)
Span: gen_ai.invoke_agent
├── Attributes:
│     • gen_ai.agent.name: "Customer Support Agent"
│     • conversation.session_id: "session_xxx"
│     • conversation.turn: 1
└── Captures user-perceived latency
        │
        │  HTTP POST /api/ai/chat
        ▼
BACKEND (Next.js API Route)
Span: gen_ai.invoke_agent
├── Available Tools: [7 tools with descriptions]
│
├── Step 1: Initial LLM Call
│     Span: gen_ai.chat
│     ├── Attributes:
│     │     • gen_ai.request.model: "custom-model-v2"
│     │     • gen_ai.request.messages: [...]
│     │     • gen_ai.usage.total_tokens: 150
│     └── Response: "Let me check your order status"
│
├── Step 2: Execute Tools (if needed)
│     Span: gen_ai.execute_tool
│     ├── Attributes:
│     │     • gen_ai.tool.name: "check_order_status"
│     │     • gen_ai.tool.description: "Look up orders"
│     │     • gen_ai.tool.input: '{"orderId":"ORD-123"}'
│     │     • gen_ai.tool.output: "Order shipped..."
│     │     • gen_ai.usage.total_tokens: 25
│     └── Custom: order.id, tool duration
│
└── Step 3: Final Synthesis LLM Call
      Span: gen_ai.chat
      ├── Synthesizes tool results into response
      └── Tracks additional tokens: 45

Final Response:
├── Total Tokens: 220 (150 + 25 + 45)
├── Tools Used: ["check_order_status"]
├── Resolution Status: "answered"
└── Cost Estimate: $0.0220
// User sends a message
await Sentry.startSpan({
  name: 'invoke_agent Customer Support Agent',
  op: 'gen_ai.invoke_agent',
  attributes: {
    'gen_ai.operation.name': 'invoke_agent',
    'gen_ai.agent.name': 'Customer Support Agent',
    'gen_ai.system': 'custom-llm',
    'conversation.session_id': sessionId,
    'conversation.turn': conversationHistory.length + 1
  }
}, async (agentSpan) => {
  // Call backend and parse the JSON payload
  const res = await fetch('/api/ai/chat', { ... });
  const response = await res.json();

  // Set response attributes
  agentSpan.setAttribute('gen_ai.response.text', response.message);
  agentSpan.setAttribute('gen_ai.usage.total_tokens', response.totalTokens);
  agentSpan.setAttribute('conversation.tools_used', response.toolsUsed.length);
});

Why this matters: Captures the complete user experience including network time, providing true end-to-end visibility.
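The frontend and backend spans end up in one distributed trace because the browser SDK attaches `sentry-trace` and `baggage` headers to the `fetch` call. A minimal client config that enables this might look like the sketch below (values are illustrative; your `sentry.client.config.ts` may differ):

```typescript
// sentry.client.config.ts (illustrative sketch)
import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  integrations: [Sentry.browserTracingIntegration()],
  tracesSampleRate: 1.0, // sample every trace while experimenting
  tracePropagationTargets: ['localhost', /^\/api\//], // attach trace headers to API calls
});
```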
// Backend receives request and starts agent span
await Sentry.startSpan({
  name: 'invoke_agent Customer Support Agent',
  op: 'gen_ai.invoke_agent',
  attributes: {
    'gen_ai.request.available_tools': JSON.stringify(tools),
    'conversation.session_id': sessionId
  }
}, async (agentSpan) => {
  // ... orchestrate LLM calls and tool executions

  // Set final attributes
  agentSpan.setAttribute('gen_ai.usage.total_tokens', totalTokens);
  agentSpan.setAttribute('conversation.tools_used', JSON.stringify(toolsUsed));
  agentSpan.setAttribute('conversation.resolution_status', resolutionStatus);
  agentSpan.setAttribute('conversation.cost_estimate_usd', costEstimate);
});

Why this matters: Central coordination point that aggregates all downstream metrics (tokens, tools, cost).
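The aggregate numbers on the agent span (total tokens, tools used, cost estimate) are just sums over the child LLM and tool steps. A hedged sketch of that bookkeeping, assuming a flat per-token rate (the rate and helper below are illustrative, not the repo's exact math):

```typescript
// Illustrative aggregation helper; the repo's actual rate and rounding may differ.
const USD_PER_TOKEN = 0.0001; // assumed flat rate for the cost estimate

interface StepUsage {
  totalTokens: number; // tokens consumed by one LLM call or tool execution
  toolName?: string;   // present when the step was a tool execution
}

function summarize(steps: StepUsage[]) {
  const totalTokens = steps.reduce((sum, step) => sum + step.totalTokens, 0);
  const toolsUsed = steps.flatMap((step) => (step.toolName ? [step.toolName] : []));
  return {
    totalTokens,
    toolsUsed,
    costEstimateUsd: Number((totalTokens * USD_PER_TOKEN).toFixed(4)),
  };
}

// summarize([{ totalTokens: 150 }, { totalTokens: 25, toolName: 'check_order_status' }, { totalTokens: 45 }])
// -> { totalTokens: 220, toolsUsed: ['check_order_status'], costEstimateUsd: 0.022 }
```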
// Each LLM API call gets its own span
await Sentry.startSpan({
  name: 'chat custom-model-v2',
  op: 'gen_ai.chat',
  attributes: {
    'gen_ai.operation.name': 'chat',
    'gen_ai.request.model': 'custom-model-v2',
    'gen_ai.request.messages': JSON.stringify(messages),
    'gen_ai.request.temperature': 0.7,
    'gen_ai.request.max_tokens': 500
  }
}, async (llmSpan) => {
  const response = await callCustomLLM(...);

  // Track token usage
  llmSpan.setAttribute('gen_ai.usage.input_tokens', response.usage.prompt_tokens);
  llmSpan.setAttribute('gen_ai.usage.output_tokens', response.usage.completion_tokens);
  llmSpan.setAttribute('gen_ai.usage.total_tokens', response.usage.total_tokens);
  llmSpan.setAttribute('gen_ai.response.text', response.message);
});

Why this matters: Enables monitoring of LLM performance, cost per call, and response quality.
// Each tool gets a dedicated span
await Sentry.startSpan({
  name: `execute_tool ${toolName}`,
  op: 'gen_ai.execute_tool',
  attributes: {
    'gen_ai.operation.name': 'execute_tool',
    'gen_ai.tool.name': toolName,
    'gen_ai.tool.description': toolDescription,
    'gen_ai.tool.type': 'function',
    'gen_ai.tool.input': JSON.stringify(args)
  }
}, async (toolSpan) => {
  const result = await executeTool(toolName, args);

  // Track tool-specific metrics
  toolSpan.setAttribute('gen_ai.tool.output', result);
  toolSpan.setAttribute('gen_ai.usage.total_tokens', toolTokens);

  // Custom business metrics
  toolSpan.setAttribute('order.id', orderId);
  toolSpan.setAttribute('search.results_count', resultCount);
});

Why this matters: Identifies slow or failing tools, tracks per-tool costs, enables optimization of agent workflows.
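Failures inside a tool should land on the same span so broken tools show up next to slow ones. One way to do that, sketched here (the `error.type` attribute matches the conventions listed later in this README; status code 2 is the SDK's error status):

```typescript
// Sketch: record tool failures on the span before rethrowing.
await Sentry.startSpan(
  { name: `execute_tool ${toolName}`, op: 'gen_ai.execute_tool' },
  async (toolSpan) => {
    try {
      return await executeTool(toolName, args);
    } catch (err) {
      toolSpan.setAttribute('error.type', err instanceof Error ? err.name : 'UnknownError');
      toolSpan.setStatus({ code: 2, message: 'internal_error' }); // 2 = error
      Sentry.captureException(err);
      throw err; // let the agent decide how to recover
    }
  }
);
```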
Once instrumented, this application enables powerful monitoring capabilities in Sentry:
Cost Analysis
- Total token consumption per conversation
- Average cost per agent invocation
- Token usage breakdown by LLM call vs. tool execution
- Per-tool token consumption patterns
Performance Tracking
- p50/p95/p99 latency of agent invocations
- LLM response time distribution
- Tool execution duration by tool type
- Conversation turn latency
Quality Metrics
- Resolution status distribution (answered, resolved, escalated)
- Tool usage patterns and frequency
- Conversations requiring escalation
- Average tools used per conversation
Each tool can be monitored independently:
check_order_status:
- Average execution time
- Success/failure rate
- Token consumption
- Custom: Order lookup patterns
search_knowledge_base:
- Search result relevance (results_count)
- Query patterns
- Knowledge gaps (low result counts)
process_refund:
- Refund amounts processed
- Success rates
- Processing time
get_account_info:
- Lookup type distribution (email vs ID)
- Cache hit rates (if implemented)
- Data retrieval performance
Find expensive conversations:
op:gen_ai.invoke_agent
WHERE gen_ai.usage.total_tokens > 500
GROUP BY conversation.session_id
Identify slow tools:
op:gen_ai.execute_tool
WHERE span.duration > 1s
GROUP BY gen_ai.tool.name
Track escalation reasons:
op:gen_ai.invoke_agent
WHERE conversation.resolution_status:escalated
Monitor token costs by model:
op:gen_ai.chat
SUM(gen_ai.usage.total_tokens)
GROUP BY gen_ai.request.model
- Framework: Next.js 16.0 (App Router)
- Language: TypeScript 5.x
- Monitoring: Sentry JavaScript SDK v10+
- Styling: Tailwind CSS
- Runtime: Node.js 18+
llm-tracing-test/
├── src/
│   └── app/
│       ├── page.tsx              # Frontend chat interface
│       │                         #   - Agent span creation
│       │                         #   - Session management
│       │                         #   - User interaction tracking
│       │
│       └── api/ai/chat/
│           └── route.ts          # Backend agent orchestration
│                                 #   - Agent invocation span
│                                 #   - LLM call instrumentation
│                                 #   - Tool execution spans
│                                 #   - Token aggregation
│
├── sentry.client.config.ts       # Sentry frontend config
├── sentry.server.config.ts       # Sentry backend config
├── instrumentation.ts            # Sentry initialization
│
├── TOOLS_DEMO_GUIDE.md           # Comprehensive tool documentation
├── CHANGELOG.md                  # Version history
└── README.md                     # This file
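instrumentation.ts is the standard Next.js hook that loads the server-side Sentry config at startup; a rough sketch of what it typically contains (the repo's actual file may also handle other runtimes):

```typescript
// instrumentation.ts (illustrative sketch)
export async function register() {
  if (process.env.NEXT_RUNTIME === 'nodejs') {
    await import('./sentry.server.config');
  }
}
```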
This implementation strictly follows Sentry's AI Agent Monitoring conventions:
Required Attributes (Always Included)
- `gen_ai.system`: Identifies the AI system (e.g., "custom-llm")
- `gen_ai.request.model`: Model identifier (e.g., "custom-model-v2")
- `gen_ai.operation.name`: Operation type (invoke_agent, chat, execute_tool)
- `SEMANTIC_ATTRIBUTE_SENTRY_ORIGIN`: Set to 'manual.ai.custom-llm'
Span Naming Conventions
- Agent spans: `invoke_agent {agent_name}`
- Chat spans: `chat {model_name}`
- Tool spans: `execute_tool {tool_name}`
Token Tracking
- `gen_ai.usage.input_tokens`: Prompt tokens
- `gen_ai.usage.output_tokens`: Completion tokens
- `gen_ai.usage.total_tokens`: Sum of input + output
- Tool token usage tracked separately and aggregated
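Put together, a convention-compliant span declares these attributes explicitly. A hedged sketch (assuming `SEMANTIC_ATTRIBUTE_SENTRY_ORIGIN` is re-exported by the SDK package you import from):

```typescript
import * as Sentry from '@sentry/nextjs';

// Sketch: a chat span carrying the required convention attributes.
await Sentry.startSpan(
  {
    name: 'chat custom-model-v2', // "chat {model_name}"
    op: 'gen_ai.chat',
    attributes: {
      'gen_ai.system': 'custom-llm',
      'gen_ai.request.model': 'custom-model-v2',
      'gen_ai.operation.name': 'chat',
      [Sentry.SEMANTIC_ATTRIBUTE_SENTRY_ORIGIN]: 'manual.ai.custom-llm',
    },
  },
  async (span) => {
    // ... call the model, then record gen_ai.usage.* tokens on `span`
  }
);
```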
The application includes a realistic LLM simulator:
// Simulates API latency (300-1000ms)
await new Promise(resolve => setTimeout(resolve, 300 + Math.random() * 700));
// Returns structured responses with:
// - Realistic token counts
// - Tool calls based on message content
// - Proper error handling
// - OpenAI-compatible response format

Why simulate instead of using a real LLM?
- Demonstrates pure instrumentation patterns
- No API keys required for testing
- Consistent, reproducible behavior
- Focuses on monitoring, not AI implementation
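As a rough sketch, such a simulator only needs to return an OpenAI-style shape with believable token counts. The function below is illustrative, not the repo's actual implementation:

```typescript
// Hypothetical simulator returning an OpenAI-compatible chat completion.
interface SimulatedCompletion {
  message: string;
  tool_calls: Array<{ name: string; arguments: Record<string, string> }>;
  usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
}

async function callCustomLLM(
  messages: Array<{ role: string; content: string }>
): Promise<SimulatedCompletion> {
  // Simulate 300-1000 ms of API latency, as described above.
  await new Promise((resolve) => setTimeout(resolve, 300 + Math.random() * 700));

  const promptTokens = Math.ceil(JSON.stringify(messages).length / 4); // crude estimate
  const completionTokens = 40 + Math.floor(Math.random() * 60);

  return {
    message: 'Let me check that for you.',
    tool_calls: [], // a real implementation would route to tools here
    usage: {
      prompt_tokens: promptTokens,
      completion_tokens: completionTokens,
      total_tokens: promptTokens + completionTokens,
    },
  };
}
```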
Each tool demonstrates different monitoring patterns:
| Tool | Demonstrates | Custom Attributes |
|---|---|---|
| `search_knowledge_base` | Search operations, result tracking | `search.query`, `search.results_count` |
| `check_order_status` | Database lookups, status tracking | `order.id` |
| `get_account_info` | CRM integration, data retrieval | `account.lookup_type` |
| `process_refund` | Transaction processing, amounts | `refund.order_id`, `refund.amount` |
| `check_inventory` | Stock checking, availability | `inventory.product_id` |
| `schedule_callback` | Scheduling operations, time tracking | `callback.scheduled_time`, `callback.phone` |
| `create_ticket` | Escalation, priority handling | `ticket.id`, `ticket.priority` |
All tools include:
- ✅ Description attribute for AI Insights dashboard
- ✅ Input/output serialization
- ✅ Token usage tracking (15-50 tokens per tool)
- ✅ Error instrumentation with error.type
- ✅ Custom business metrics
- ✅ Realistic execution latency (200-600ms)
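Concretely, an individual tool body is just a small async function with simulated latency and a structured result. A hypothetical check_inventory implementation might look like this (not the repo's exact code; the span wrapper shown earlier adds the gen_ai.* attributes):

```typescript
// Hypothetical tool body for check_inventory.
interface InventoryResult {
  productId: string;
  inStock: boolean;
  quantity: number;
  summary: string; // serialized into gen_ai.tool.output
}

async function checkInventory(args: { productId: string }): Promise<InventoryResult> {
  // Simulate a 200-600 ms warehouse lookup, matching the latencies listed above.
  await new Promise((resolve) => setTimeout(resolve, 200 + Math.random() * 400));

  const quantity = Math.floor(Math.random() * 20);
  return {
    productId: args.productId,
    inStock: quantity > 0,
    quantity,
    summary: quantity > 0 ? `${quantity} units in stock` : 'Out of stock',
  };
}
```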
TOOLS_DEMO_GUIDE.md - Complete tool documentation with trigger phrases and instrumentation details
This example teaches:
- How to instrument custom AI agents without auto-instrumentation
- Proper span hierarchy for distributed AI systems
- Token tracking and cost attribution
- Tool execution monitoring patterns
- Error handling in AI pipelines
- Custom business metric capture
Adapt this for your use case:
- Replace the simulated LLM with your own API calls (see the sketch after this list)
- Add your actual tools and keep the instrumentation patterns
- Customize attributes for your business metrics
- Add authentication and real data sources
- Deploy to production with confidence
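For the first bullet above, only the body inside the gen_ai.chat span changes; the instrumentation stays identical. A hedged sketch against a generic HTTP LLM endpoint (the URL, env vars, and response shape are placeholders for your own API):

```typescript
// Illustrative replacement for the simulator: same span, real HTTP call.
async function callRealLLM(messages: Array<{ role: string; content: string }>) {
  const res = await fetch(process.env.LLM_API_URL ?? 'https://your-llm-endpoint/v1/chat', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.LLM_API_KEY}`,
    },
    body: JSON.stringify({ model: 'custom-model-v2', messages }),
  });

  if (!res.ok) {
    throw new Error(`LLM request failed with status ${res.status}`);
  }

  // Expected to return { message, usage: { prompt_tokens, completion_tokens, total_tokens } }
  return res.json();
}
```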
This is an example repository demonstrating instrumentation patterns. Feel free to:
- Open issues for clarification questions
- Submit PRs for improved examples
- Suggest additional tool patterns
- Share your own implementations
This example is provided as-is for educational purposes.
- Issues: Open a GitHub issue
- Documentation: Sentry Docs
- Community: Sentry Discord
Built with ❤️ to demonstrate Sentry's AI Agent Monitoring
