167 changes: 167 additions & 0 deletions api/core/ops/aliyun_trace/METRICS.md
@@ -0,0 +1,167 @@
# Aliyun Trace Metrics Implementation

## Overview

This implementation adds OpenTelemetry histogram metrics to the Aliyun Trace component for tracking LLM operations. All metrics include an `app_name` tag that corresponds to the configured service name.

## Metrics

The following histogram metrics are exported to the Aliyun OTLP endpoint:

### 1. `gen_ai.client.time_to_first_token`
- **Type**: Histogram
- **Unit**: seconds (s)
- **Description**: Time to first token in LLM responses
- **Tags**: `app_name`, `operation`
- **Notes**: Only recorded when explicitly provided (not always available in current data)

### 2. `gen_ai.client.time_per_output_token`
- **Type**: Histogram
- **Unit**: seconds (s)
- **Description**: Average time per output token
- **Tags**: `app_name`, `operation`
- **Notes**: Automatically calculated as `duration / completion_tokens` if not provided
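
A rough sketch of that fallback, assuming it happens inside `record_llm_metrics` (the helper name below is purely illustrative):

```python
def derive_time_per_output_token(duration: float, completion_tokens: int) -> float | None:
    """Illustrative fallback: average seconds per generated token, or None if not derivable."""
    if completion_tokens <= 0:
        return None
    return duration / completion_tokens


# e.g. a 2.0 s call that produced 50 completion tokens -> 0.04 s per output token
assert derive_time_per_output_token(2.0, 50) == 0.04
```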

### 3. `gen_ai.client.time_between_token`
- **Type**: Histogram
- **Unit**: seconds (s)
- **Description**: Time between tokens in LLM responses
- **Tags**: `app_name`, `operation`
- **Notes**: Only recorded when explicitly provided (not always available in current data)

### 4. `gen_ai.client.operation`
- **Type**: Histogram
- **Unit**: dimensionless (1)
- **Description**: LLM operation count (value of 1 per operation)
- **Tags**: `app_name`, `operation`
- **Notes**: Used for counting operations by type

### 5. `gen_ai.usage.usage.prompt_tokens_details.cached_tokens`
- **Type**: Histogram
- **Unit**: dimensionless (1)
- **Description**: Number of cached tokens used
- **Tags**: `app_name`, `operation`
- **Notes**: Only recorded when cached tokens > 0

### 6. `gen_ai.client.operation.duration`
- **Type**: Histogram
- **Unit**: seconds (s)
- **Description**: Duration of LLM operations
- **Tags**: `app_name`, `operation`
- **Notes**: Total latency from start to finish

### 7. `gen_ai.client.token.usage`
- **Type**: Histogram
- **Unit**: dimensionless (1)
- **Description**: Token usage in LLM operations
- **Tags**: `app_name`, `operation`
- **Notes**: Total tokens (prompt + completion)

## Architecture

### Components

1. **MetricsClient** (`core/ops/aliyun_trace/data_exporter/traceclient.py`)
- Manages OpenTelemetry metrics infrastructure
- Creates histogram instruments for each metric type
- Exports metrics to Aliyun OTLP endpoint
   - Automatically includes the `app_name` tag in all metrics (see the sketch after this list)

2. **AliyunDataTrace** (`core/ops/aliyun_trace/aliyun_trace.py`)
- Integrates metrics recording with trace span creation
- Records metrics when LLM spans are created in workflows
- Records metrics for message-based LLM calls

### Integration Points

Metrics are recorded at the following points:

1. **Workflow LLM Nodes** (`build_workflow_llm_span` method)
   - Extracts usage data from `process_data`, falling back to `outputs` when `process_data` has no `usage` entry
   - Records metrics when `total_tokens > 0` and `latency > 0`

2. **Message LLM Calls** (`message_trace` method)
- Calculates duration from start_time and end_time
- Records metrics for simple chat/completion calls

## Configuration

The MetricsClient is automatically initialized with:
- **service_name**: Taken from `aliyun_config.app_name`
- **endpoint**: Same OTLP endpoint used for traces
- **export_interval**: 5000ms (5 seconds) by default
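
A hypothetical instantiation following the values above (the endpoint string is a placeholder and the constructor parameters are assumed, not taken from the actual code):

```python
from core.ops.aliyun_trace.data_exporter.traceclient import MetricsClient

metrics_client = MetricsClient(
    service_name="my-dify-app",                     # aliyun_config.app_name, becomes the app_name tag
    endpoint="https://example-otlp-endpoint:4317",  # placeholder for the OTLP endpoint used for traces
    export_interval_millis=5000,                    # 5 seconds
)
```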

## Data Flow

```
LLM Execution
        ↓
process_data / usage extraction
        ↓
MetricsClient.record_llm_metrics()
        ↓
Histogram.record() with app_name tag
        ↓
PeriodicExportingMetricReader
        ↓
OTLPMetricExporter
        ↓
Aliyun OTLP Endpoint
```

## Usage Example

```python
# Metrics are automatically recorded when trace spans are created;
# no manual intervention is needed.
from datetime import datetime, timedelta

from core.ops.entities.trace_entity import MessageTraceInfo

# For workflow LLM nodes:
usage_data = {
    "prompt_tokens": 100,
    "completion_tokens": 50,
    "total_tokens": 150,
    "latency": 2.0,
}
# → Metrics automatically recorded in build_workflow_llm_span()

# For message LLM calls:
trace_info = MessageTraceInfo(
    message_tokens=100,
    answer_tokens=50,
    total_tokens=150,
    start_time=datetime.now(),
    end_time=datetime.now() + timedelta(seconds=2),
)
# → Metrics automatically recorded in message_trace()
```

## Testing

Unit tests are provided in `tests/unit_tests/core/ops/aliyun_trace/test_metrics_client.py`; they cover the following (an illustrative sketch follows the list):

- Test MetricsClient initialization
- Test basic metric recording
- Test optional field recording
- Test automatic time-per-token calculation
- Test app_name tag inclusion
- Test graceful shutdown
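
For illustration only, a test of the time-per-token fallback might look roughly like this; the constructor arguments and the histogram attribute name are assumptions, not the actual test code:

```python
from unittest.mock import MagicMock

import pytest

from core.ops.aliyun_trace.data_exporter.traceclient import MetricsClient


def test_time_per_output_token_is_derived_when_missing():
    client = MetricsClient(service_name="test-app", endpoint="http://localhost:4317")  # assumed signature
    # Swap the real instrument for a mock so nothing is exported (attribute name is assumed).
    client.time_per_output_token_histogram = MagicMock()

    client.record_llm_metrics(
        operation="llm",
        duration=2.0,
        prompt_tokens=100,
        completion_tokens=50,
        total_tokens=150,
    )

    # 2.0 s / 50 completion tokens = 0.04 s per output token
    args, kwargs = client.time_per_output_token_histogram.record.call_args
    assert args[0] == pytest.approx(0.04)
    assert kwargs["attributes"]["app_name"] == "test-app"
```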

## Limitations and Future Enhancements

### Current Limitations

1. **Time to First Token**: Not currently captured in process_data
- Could be added in future by tracking streaming token events

2. **Time Between Tokens**: Not currently available
- Would require streaming event timestamps

3. **Cached Tokens**: Not currently exposed by LLM providers
- Placeholder implemented for when this data becomes available

### Future Enhancements

1. Add streaming token event tracking for more granular timing metrics
2. Extract cached token information when LLM providers expose it
3. Add more operation types (e.g., "embedding", "reranker")
4. Add model-specific tags (provider, model name) for better filtering
33 changes: 33 additions & 0 deletions api/core/ops/aliyun_trace/aliyun_trace.py
@@ -162,6 +162,22 @@ def message_trace(self, trace_info: MessageTraceInfo):
        app_model_config = getattr(trace_info.message_data, "app_model_config", {})
        pre_prompt = getattr(app_model_config, "pre_prompt", "")
        inputs_data = getattr(trace_info.message_data, "inputs", {})

        # Calculate duration for metrics
        duration = 0.0
        if trace_info.start_time and trace_info.end_time:
            duration = (trace_info.end_time - trace_info.start_time).total_seconds()

        # Record LLM metrics
        if trace_info.total_tokens > 0 and duration > 0:
            self.trace_client.metrics_client.record_llm_metrics(
                operation="message_llm",
                duration=duration,
                prompt_tokens=trace_info.message_tokens,
                completion_tokens=trace_info.answer_tokens,
                total_tokens=trace_info.total_tokens,
            )

        llm_span = SpanData(
            trace_id=trace_id,
            parent_span_id=message_span_id,
@@ -396,6 +412,23 @@ def build_workflow_llm_span(
        process_data = node_execution.process_data or {}
        outputs = node_execution.outputs or {}
        usage_data = process_data.get("usage", {}) if "usage" in process_data else outputs.get("usage", {})

        # Extract metrics data
        prompt_tokens = usage_data.get("prompt_tokens", 0)
        completion_tokens = usage_data.get("completion_tokens", 0)
        total_tokens = usage_data.get("total_tokens", 0)
        latency = usage_data.get("latency", 0.0)

        # Record LLM metrics
        if total_tokens > 0 and latency > 0:
            self.trace_client.metrics_client.record_llm_metrics(
                operation="llm",
                duration=latency,
                prompt_tokens=prompt_tokens,
                completion_tokens=completion_tokens,
                total_tokens=total_tokens,
            )

        return SpanData(
            trace_id=trace_id,
            parent_span_id=workflow_span_id,