Hi guys! Today I’d like to share my fast-agent testing experience with the agents_as_tools PR #515/#458 high-parallel feature.
I ran tests in a real production environment with anywhere from 3 to ~150 agents running in parallel. This revealed several important insights and issues that need to be addressed.
1. OpenAI model limits: context length and 128 parallel tool calls
During stress tests I hit two kinds of hard limits in gpt-5-mini:
Sample 1 – context length exceeded
Error details: openai request failed for model 'gpt-5-mini' (code: context_length_exceeded)
(status=400): Error code: 400 - {'error': {'message': 'Input tokens exceed the configured limit
of 272000 tokens. Your messages resulted in 279940 tokens. Please reduce the length of the
messages.', 'type': 'invalid_request_error', 'param': 'messages', 'code':
'context_length_exceeded'}}
So the accumulated messages plus tool calls can easily blow past the 272k token limit when you run large batches of child agents in parallel.
How I plan to fix Sample 1
- Make child agents write their full results directly to external systems (DB, chats, tickets, storage) instead of returning large payloads to the orchestrator through the LLM.
- Return only a small status object to the orchestrator, for example: { ok: true/false, error, result_ref }.
- Optional: add better trimming of message history in fast-agent so long-running batches cannot accumulate unbounded context.
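The status-object pattern above can be sketched roughly as follows. This is a hypothetical illustration, not fast-agent code: `save_to_external_store` and `run_child_task` are stand-in names for whatever storage and child-execution hooks a real deployment would use.

```python
import hashlib

def save_to_external_store(payload: str) -> str:
    """Stand-in for writing the full result to a DB/chat/ticket system.

    Returns a small reference key instead of the payload itself.
    """
    return "result:" + hashlib.sha1(payload.encode()).hexdigest()[:12]

def run_child_task(task: str) -> dict:
    """Run one child task and return only a compact status object."""
    full_result = f"...large output for {task}..."  # would come from the child LLM
    try:
        ref = save_to_external_store(full_result)
        return {"ok": True, "error": None, "result_ref": ref}
    except Exception as exc:
        return {"ok": False, "error": str(exc), "result_ref": None}
```

Because the orchestrator only ever sees the small dict, its message history grows by a few dozen tokens per child instead of the child's full output, which is what keeps large batches under the 272k-token limit.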
Sample 2 – tool_calls array above max length
Error details: openai request failed for model 'gpt-5-mini' (code: array_above_max_length)
(status=400): Error code: 400 - {'error': {'message': "Invalid 'messages[4].tool_calls': array too
long. Expected an array with maximum length 128, but got an array with length 147 instead.", 'type':
'invalid_request_error', 'param': 'messages[4].tool_calls', 'code': 'array_above_max_length'}}
This hits OpenAI's hard limit of 128 parallel tool calls per message. Even with prompt constraints, the model can exceed it, so fast-agent likely needs stricter enforcement on the client side before the request is sent.
How I plan to fix Sample 2
- Add a hard cap in fast-agent so a single request cannot contain more than 128 tool calls.
- If the model generates more, fast-agent will either:
  - split the work into multiple smaller batches, or
  - trim extra tool calls before sending the request to OpenAI and report this in the debug logs.
- Optionally, add validation and a clear error message if a caller tries to schedule more than 128 tools in one run.
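The batching option above could look something like this minimal sketch (illustrative only; `split_tool_calls` is not an existing fast-agent function, and the 128 limit comes from the error message in Sample 2):

```python
# OpenAI rejects messages whose tool_calls array exceeds 128 entries
# (code: array_above_max_length), so we split before sending.
OPENAI_MAX_TOOL_CALLS = 128

def split_tool_calls(tool_calls: list, limit: int = OPENAI_MAX_TOOL_CALLS) -> list:
    """Split an oversized tool_calls list into batches the API will accept."""
    if len(tool_calls) > limit:
        # Report the overflow in debug output instead of failing the request.
        print(f"debug: {len(tool_calls)} tool calls exceed limit {limit}; splitting")
    return [tool_calls[i:i + limit] for i in range(0, len(tool_calls), limit)]
```

With the 147-call request from Sample 2, this yields two batches (128 + 19) instead of a 400 error; the trim-and-log variant would simply keep the first batch and report the dropped calls.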
2. Child agent cleanup
After tasks complete, child-agent records currently remain in the dashboard. The old cleanup logic removed too many items at once, so it was disabled. A refactor is needed so each child agent is removed individually after completing its work.
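One way to express the per-agent cleanup is to make each child responsible for removing its own row, so no bulk deletion pass is needed. A minimal sketch, with a hypothetical `Dashboard` class standing in for the real dashboard state:

```python
class Dashboard:
    """Stand-in for the dashboard's child-agent record store."""
    def __init__(self):
        self.rows = {}

    def add(self, agent_id: str):
        self.rows[agent_id] = "running"

    def remove(self, agent_id: str):
        # Remove exactly one agent's record; never a bulk sweep.
        self.rows.pop(agent_id, None)

def run_and_cleanup(dashboard: Dashboard, agent_id: str):
    dashboard.add(agent_id)
    try:
        pass  # child agent does its work here
    finally:
        # Runs even if the child fails, so finished agents never linger.
        dashboard.remove(agent_id)
```

Scoping removal to a single `agent_id` in a `finally` block avoids the old failure mode where one cleanup pass deleted too many records at once.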
3. Dashboard UI overload
More than ~20 child-agent rows overload the UI and cause blinking (especially in VS Code).
Planned improvement:
- Show up to 20 rows normally.
- Collapse everything above that into a summarized block.
- Allow expanding only when needed.
This will keep the UI clean during large parallel launches.
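The collapse rule above can be sketched as a small rendering helper (the 20-row threshold is from this issue; the function itself is hypothetical, not part of the dashboard code):

```python
MAX_VISIBLE_ROWS = 20  # threshold proposed in this issue

def render_rows(rows: list, expanded: bool = False) -> list:
    """Show up to 20 rows; collapse the rest into one summary line."""
    if expanded or len(rows) <= MAX_VISIBLE_ROWS:
        return rows
    hidden = len(rows) - MAX_VISIBLE_ROWS
    return rows[:MAX_VISIBLE_ROWS] + [f"... and {hidden} more agents (expand to view)"]
```

A launch of 150 agents would then render 21 lines instead of 150, and the full list stays one expand action away.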
Short update
A more efficient result-handling pattern is being explored: letting child agents publish full outputs directly to external systems (DB, chats, tickets) and returning only a small status payload to the orchestrator. This approach avoids context bloat and prevents hitting token limits during massive parallel execution.