
my fast-agent testing experience with the **agents_as_tools** high-parallel feature. #520

Hi guys! Today I'd like to share my fast-agent testing experience with the agents_as_tools high-parallel feature (PRs #515/#458).

I ran tests in a real production environment with anywhere from 3 to ~150 agents running in parallel. This revealed several important insights and issues that need to be addressed.

1. OpenAI model limits: context length and the 128-tool-call cap

During stress tests I hit two kinds of hard limits in gpt-5-mini:

Sample 1 – context length exceeded

Error details:

```
openai request failed for model 'gpt-5-mini' (code: context_length_exceeded)
(status=400): Error code: 400 - {'error': {'message': 'Input tokens exceed the configured limit of 272000 tokens. Your messages resulted in 279940 tokens. Please reduce the length of the messages.', 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
```

So the accumulated messages plus tool calls can easily blow past the 272k token limit when you run large batches of child agents in parallel.

How I plan to fix Sample 1

  • Make child agents write their full results directly to external systems (DB, chats, tickets, storage) instead of returning large payloads to the orchestrator through the LLM.
  • Return to the orchestrator only a small status object, for example: { ok: true/false, error, result_ref } (see the sketch after this list).
  • Optional: add better trimming of message history in fast-agent so long-running batches cannot accumulate unbounded context.
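
A minimal sketch of that status-object pattern. The save_result helper below is a stand-in for a real external store (DB, ticket system, chat); none of these names are part of the fast-agent API:

```python
import json
import os
import uuid


def save_result(payload: str) -> str:
    """Persist the full child-agent output externally, return a reference."""
    os.makedirs("results", exist_ok=True)
    ref = os.path.join("results", f"{uuid.uuid4().hex}.json")
    with open(ref, "w", encoding="utf-8") as f:
        json.dump({"output": payload}, f)
    return ref


def child_agent_result(full_output: str) -> str:
    """Hand the orchestrator only a small status object, never the payload."""
    try:
        ref = save_result(full_output)
        status = {"ok": True, "error": None, "result_ref": ref}
    except Exception as exc:  # report failure without bloating context
        status = {"ok": False, "error": str(exc), "result_ref": None}
    return json.dumps(status)
```

This way the orchestrator's context grows by a few dozen tokens per child instead of the full result payload.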

Sample 2 – tool_calls array above max length

Error details:

```
openai request failed for model 'gpt-5-mini' (code: array_above_max_length)
(status=400): Error code: 400 - {'error': {'message': "Invalid 'messages[4].tool_calls': array too long. Expected an array with maximum length 128, but got an array with length 147 instead.", 'type': 'invalid_request_error', 'param': 'messages[4].tool_calls', 'code': 'array_above_max_length'}}
```

This hits OpenAI's hard limit of 128 tool calls per request. Even with prompt constraints, the model can exceed it, so fast-agent likely needs to enforce the cap itself before sending the request, rather than relying on the prompt.

How I plan to fix Sample 2

  • Add a hard cap in fast-agent so a single request cannot contain more than 128 tool calls.
  • If the model generates more, fast-agent should either:
    • split the work into multiple smaller batches, or
    • trim the extra tool calls before sending the request to OpenAI and report this in the debug logs.
  • Optionally, add validation and a clear error message if a caller tries to schedule more than 128 tools in one run (a batching sketch follows this list).
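
Roughly what the cap-and-split could look like, assuming a plain list of tool calls (MAX_TOOL_CALLS and batch_tool_calls are illustrative names, not existing fast-agent code; 128 is the OpenAI limit from the error above):

```python
import logging

MAX_TOOL_CALLS = 128  # OpenAI's hard per-request limit
logger = logging.getLogger(__name__)


def batch_tool_calls(tool_calls: list) -> list[list]:
    """Split an oversized tool_calls array into API-safe batches of <=128."""
    if len(tool_calls) > MAX_TOOL_CALLS:
        logger.debug(
            "model produced %d tool calls; splitting into chunks of %d",
            len(tool_calls),
            MAX_TOOL_CALLS,
        )
    return [
        tool_calls[i : i + MAX_TOOL_CALLS]
        for i in range(0, len(tool_calls), MAX_TOOL_CALLS)
    ]
```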

2. Child agent cleanup

After tasks complete, child-agent records currently remain in the dashboard. The old cleanup logic removed too many items at once, so it was disabled. A refactor is needed so each child agent is removed individually after completing its work, roughly like the sketch below.
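
Something along these lines, where dashboard.add_row/remove_row stand in for whatever the real dashboard internals expose (purely illustrative, not current fast-agent code):

```python
# Hypothetical per-agent cleanup: each child removes its own dashboard row
# when it finishes, instead of a bulk sweep at the end.
async def run_child(agent, task, dashboard):
    row_id = dashboard.add_row(agent.name)  # register this child only
    try:
        return await agent.run(task)
    finally:
        dashboard.remove_row(row_id)  # always clean up, even on failure
```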

3. Dashboard UI overload

Rendering more than ~20 child-agent rows overloads the UI and causes flickering (especially in VS Code).

Planned improvement:

  • Show up to 20 rows normally.
  • Collapse everything above that into a summarized block.
  • Allow expanding only when needed.

This should keep the UI clean during large parallel launches; a tiny sketch of the collapse logic is below.
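
For illustration only (MAX_VISIBLE_ROWS and the flat row model are assumptions about the dashboard, not existing fast-agent code):

```python
MAX_VISIBLE_ROWS = 20  # rows shown before collapsing


def visible_rows(rows: list[str], expanded: bool = False) -> list[str]:
    """Show up to MAX_VISIBLE_ROWS rows; fold the rest into a summary line."""
    if expanded or len(rows) <= MAX_VISIBLE_ROWS:
        return rows
    hidden = len(rows) - MAX_VISIBLE_ROWS
    return rows[:MAX_VISIBLE_ROWS] + [f"... and {hidden} more child agents"]
```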


Short update

A more efficient result-handling pattern is being explored: letting child agents publish full outputs directly to external systems (DB, chats, tickets) and returning only a small status payload to the orchestrator. This approach avoids context bloat and prevents hitting token limits during massive parallel execution.
