Feature: CrewAI Token Tracking Enhancement #4132
Conversation
This PR is being reviewed by Cursor Bugbot
Resolved 4 review comments from Cursor Bugbot:

1. Added token tracking for async tasks in _execute_tasks and _process_async_tasks
2. Fixed task key collision by including task_id in the key
3. Added token tracking for _aexecute_tasks paths (both sync and async)
4. Fixed agent metrics to be keyed by agent_id to handle multiple agents with the same role

All async tasks now capture tokens_before/after and attach metrics properly. Task metrics now use unique keys to prevent overwriting. Agent metrics properly track separate agents with the same role.
…sy23/crewAI-telemetry into feat/per-user-token-tracing
Review Comments Resolved ✅

I've addressed all 4 review comments from Cursor Bugbot:

1. ✅ Async tasks missing per-task token tracking. Fixed: added token tracking for async tasks in _execute_tasks and _process_async_tasks.
2. ✅ Task key collision causing metrics overwriting. Fixed: updated the task key to include the task_id.
3. ✅ Async kickoff path missing all per-task token tracking. Fixed: added per-task token tracking to the _aexecute_tasks paths (both sync and async).
4. ✅ Multiple agents with same role get combined metrics. Fixed: agent metrics are now keyed by agent_id.

All changes maintain backward compatibility and follow the same pattern used for synchronous task execution.
Resolved a race condition where concurrent async tasks from the same agent would get incorrect token attribution. The solution wraps async task execution to capture tokens_after immediately upon task completion, before other concurrent tasks can interfere.

Changes:
- Wrapped async task execution to return a (result, tokens_after) tuple
- Updated _aprocess_async_tasks to unwrap and use the captured tokens_after
- Updated type hints for pending_tasks to reflect the new signature

Note: threading-based async_execution still has a similar race condition, as it is harder to wrap threaded execution. Will track separately.
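A minimal standalone sketch of that wrapping approach, using a fake agent with a cumulative counter. All names here are illustrative, not CrewAI's actual API: the point is only that the snapshot happens inside the wrapped coroutine, at the moment the task finishes.

```python
import asyncio

# Illustrative stand-in: FakeAgent mimics an agent that maintains a
# cumulative token counter across its LLM calls.
class FakeAgent:
    def __init__(self):
        self.total_tokens = 0

    async def run_task(self, cost, delay):
        await asyncio.sleep(delay)
        self.total_tokens += cost  # an LLM call would add tokens here
        return f"used {cost}"

async def wrapped_execution(agent, cost, delay):
    # The fix: snapshot the counter inside the task, immediately upon
    # completion, and return it alongside the result.
    result = await agent.run_task(cost, delay)
    tokens_after = agent.total_tokens
    return result, tokens_after

async def main():
    agent = FakeAgent()
    fast = asyncio.create_task(wrapped_execution(agent, 100, 0.01))
    slow = asyncio.create_task(wrapped_execution(agent, 250, 0.05))
    (_, after_fast), (_, after_slow) = await asyncio.gather(fast, slow)
    return after_fast, after_slow

after_fast, after_slow = asyncio.run(main())
# Per-task deltas: 100 for the fast task, 350 - 100 = 250 for the slow one.
print(after_fast, after_slow)  # 100 350
```

Had tokens_after been read outside the tasks, both reads could have seen 350, attributing the slow task's tokens to the fast one.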
@joaomdmoura can you please look into the PR?
Capture task, exec_data, and context via default arguments to avoid Python's late-binding closure behavior. Without this fix, when multiple async tasks are created back-to-back, they would all reference values from the last loop iteration, causing wrong tasks to be executed with wrong agents and incorrect token attribution.
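The late-binding behavior and the default-argument fix can be shown in a few lines:

```python
# Python closures bind names late: every closure created in a loop sees the
# loop variable's final value when it eventually runs. Binding the value as
# a default argument freezes it at definition time, per iteration.
funcs_late = [lambda: i for i in range(3)]
funcs_bound = [lambda i=i: i for i in range(3)]

print([f() for f in funcs_late])   # [2, 2, 2] - all see the last iteration
print([f() for f in funcs_bound])  # [0, 1, 2] - captured per iteration
```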
…itation

1. Fixed manager agent using manager_role as key instead of manager_id. Now all agents (regular and manager) are keyed by agent_id in workflow_metrics.per_agent for consistency.
2. Added documentation for the threading-based async task race condition in _process_async_tasks. This is a known limitation tracked by issue crewAIInc#4168. Users should use akickoff() for accurate async task token tracking.
2 Additional Review Comments Resolved ✅

1. Sync async tasks have unfixed race condition for tokens
Status: Documented (known limitation tracked by issue #4168). Added documentation to _process_async_tasks.
2. Inconsistent per-agent dictionary keys cause lookup issues
Fixed: changed manager agent keying from manager_role to manager_id:

```python
# Before (inconsistent)
workflow_metrics.per_agent[manager_role] = manager_metrics  # keyed by role
workflow_metrics.per_agent[agent_id] = agent_metrics        # keyed by id

# After (consistent)
workflow_metrics.per_agent[manager_id] = manager_metrics  # keyed by id
workflow_metrics.per_agent[agent_id] = agent_metrics      # keyed by id
```

Now all agents (regular and manager) are consistently keyed by agent_id.
Instead of calling task.execute_async() and capturing tokens_after outside the thread, we now:

1. Create a wrapper function that executes task.execute_sync() in the thread
2. Capture tokens_after immediately after completion WITHIN the thread
3. Return a (result, tokens_after) tuple from the thread
4. Unwrap and use the captured tokens_after in _process_async_tasks

This is the same approach used for asyncio tasks and properly avoids race conditions when concurrent tasks from the same agent run in parallel. Also uses default arguments to avoid late-binding closure issues.
Threading Race Condition Properly Fixed ✅

Issue: Sync async tasks have unfixed race condition for tokens

Instead of just documenting the limitation, I implemented a proper fix using the same approach as the asyncio version.

Solution:

```python
# Before (race condition)
future = task.execute_async(...)
# ... later, outside the thread ...
tokens_after = self._get_agent_token_usage(agent)  # WRONG - other tasks may have completed

# After (fixed)
def _wrapped_sync_task_execution(...):
    result = _task.execute_sync(...)
    tokens_after = _self._get_agent_token_usage(_exec_data.agent)  # captured IN the thread
    return result, tokens_after
```

This ensures tokens are captured at the exact moment each task completes, preventing interference from other concurrent tasks.
```python
# Excerpt from the diff under review
        result = _wrapped_sync_task_execution()
        future.set_result(result)
    except Exception as e:
        future.set_exception(e)
```
Late-binding closure in _run_in_thread causes wrong task execution
The _run_in_thread function captures _wrapped_sync_task_execution and future via closure reference, not via default arguments. When multiple async tasks are created in a loop, each thread may execute with values from a later iteration due to Python's late-binding closure behavior. This causes threads to call the wrong task execution function and set results on the wrong Future object, leading to incorrect task execution, wrong results, or deadlocks when waiting for futures that never receive their results.
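A sketch of the fix the review suggests: bind the wrapped callable and its Future as default arguments so each thread keeps the objects from its own loop iteration. The names below are hypothetical stand-ins for the PR's `_run_in_thread` machinery.

```python
import concurrent.futures
import threading

futures = []
threads = []
for task_id in range(3):
    future = concurrent.futures.Future()
    futures.append(future)

    def work(_task_id=task_id):
        # Stand-in for _wrapped_sync_task_execution for task `_task_id`.
        return _task_id * 10

    # Default arguments freeze this iteration's callable and Future; a plain
    # closure would resolve `work` and `future` to the last iteration's values.
    def _run_in_thread(_work=work, _future=future):
        try:
            _future.set_result(_work())
        except Exception as e:
            _future.set_exception(e)

    t = threading.Thread(target=_run_in_thread)
    threads.append(t)
    t.start()

for t in threads:
    t.join()
print([f.result() for f in futures])  # [0, 10, 20]
```

With the late-binding version, several threads could call the same callable and set results on the same Future, leaving other futures waiting forever.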
Changes Made
This fork adds comprehensive per-agent and per-task token tracking to CrewAI, providing detailed token usage metrics for each agent and task in a workflow.
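To make the new metrics concrete, here is a minimal self-contained sketch of the three types using plain dataclasses. Field names follow the PR description, but the actual implementation presumably defines Pydantic models.

```python
from dataclasses import dataclass, field

# Sketch only: the real models live in src/crewai/types/usage_metrics.py.
@dataclass
class AgentTokenMetrics:
    agent_name: str
    total_tokens: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    requests: int = 0

@dataclass
class TaskTokenMetrics:
    task_name: str
    agent_name: str
    total_tokens: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    requests: int = 0

@dataclass
class WorkflowTokenMetrics:
    per_agent: dict = field(default_factory=dict)  # agent_id -> AgentTokenMetrics
    per_task: dict = field(default_factory=dict)   # unique task key -> TaskTokenMetrics

wf = WorkflowTokenMetrics()
wf.per_task["research:abc123"] = TaskTokenMetrics("research", "Researcher", total_tokens=420)
```

Keying `per_task` by a key that includes the task id, and `per_agent` by agent id, is what prevents the collision and same-role issues fixed in the review rounds above.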
Modified Files
src/crewai/types/usage_metrics.py: added new data models:
- AgentTokenMetrics: tracks tokens per agent (agent_name, total_tokens, prompt_tokens, completion_tokens, requests)
- TaskTokenMetrics: tracks tokens per task (task_name, agent_name, total_tokens, prompt_tokens, completion_tokens, requests)
- WorkflowTokenMetrics: aggregates all metrics with per_agent and per_task dictionaries

src/crewai/crews/crew_output.py: enhanced CrewOutput:
- token_metrics: WorkflowTokenMetrics | None field for detailed per-agent and per-task breakdown

src/crewai/tasks/task_output.py: enhanced TaskOutput:
- usage_metrics: TaskTokenMetrics | None field for per-task token usage

src/crewai/crew.py: core tracking implementation:
- calculate_usage_metrics() to build the per-agent token breakdown
- _get_agent_token_usage() helper to capture agent token state
- _attach_task_token_metrics() to calculate and attach per-task tokens
- _execute_tasks() to capture tokens before/after each task execution
- workflow_token_metrics field on the Crew class
- _create_crew_output() to attach token_metrics to the result

New Features
Per-Agent Token Tracking
Each agent's total token usage is tracked separately, showing total, prompt, and completion token counts plus the number of requests.
Per-Task Token Tracking
Each task's token usage is tracked along with the executing agent's name and the same token and request counters.
Accurate Attribution
Uses delta calculation (tokens_after - tokens_before) to accurately attribute tokens to specific tasks, even when multiple tasks are performed by the same agent.
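The delta calculation can be illustrated with plain dictionaries (illustrative numbers, not real API output):

```python
# Snapshot of the agent's cumulative counters taken just before the task...
tokens_before = {"total_tokens": 500, "prompt_tokens": 350, "completion_tokens": 150}
# ...and again immediately after it completes.
tokens_after = {"total_tokens": 800, "prompt_tokens": 550, "completion_tokens": 250}

# The task is attributed only the difference, so earlier tasks by the same
# agent do not inflate its numbers.
task_usage = {k: tokens_after[k] - tokens_before[k] for k in tokens_before}
print(task_usage)  # {'total_tokens': 300, 'prompt_tokens': 200, 'completion_tokens': 100}
```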
Usage Example
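A hypothetical sketch of consuming the new fields (result.token_metrics with its per_agent and per_task dictionaries, as described above), using SimpleNamespace stand-ins for the real CrewOutput so the snippet is self-contained:

```python
from types import SimpleNamespace

# Stand-in for the CrewOutput a real crew.kickoff() would return.
result = SimpleNamespace(
    token_metrics=SimpleNamespace(
        per_agent={"agent-1": SimpleNamespace(agent_name="Researcher", total_tokens=800)},
        per_task={"research:abc": SimpleNamespace(task_name="research", total_tokens=300)},
    )
)

# Per-agent breakdown
for agent_id, m in result.token_metrics.per_agent.items():
    print(f"{m.agent_name}: {m.total_tokens} tokens")

# Per-task breakdown
for key, m in result.token_metrics.per_task.items():
    print(f"{m.task_name}: {m.total_tokens} tokens")
```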
Test Results
All tests pass with 100% accuracy.
Backward Compatibility
All changes are backward compatible:
- result.token_usage (crew-level totals) continues to work

Benefits
Note
Introduces detailed token accounting across the workflow with deltas captured per task and aggregated per agent.
- AgentTokenMetrics, TaskTokenMetrics, and WorkflowTokenMetrics in types/usage_metrics.py
- TaskOutput with usage_metrics and CrewOutput with token_metrics
- _get_agent_token_usage and _attach_task_token_metrics, storing per-task metrics in workflow_token_metrics
- crew.py wraps tasks, collects tokens_after, and attaches metrics; adjusts async queues (pending_tasks/futures) to carry agent and token snapshots
- calculate_usage_metrics() builds the per_agent breakdown (and sets workflow totals) from per-task data and manager metrics

Written by Cursor Bugbot for commit f62a5a9. This will update automatically on new commits.