Description
What happened?
Summary
GraphFlow workflows become unrecoverable when interrupted during agent transitions, leading to a corrupted state where no agents are ready to execute despite having remaining work.
Environment
- Framework: autogen-agentchat with GraphFlow
- Python: 3.11
- Issue Type: State persistence & resume functionality
Problem Description
When a GraphFlow workflow is interrupted (e.g., via KeyboardInterrupt) during the transition between agents, the saved state becomes corrupted.
On resume, the workflow terminates immediately with:
Digraph execution is complete
even though agents still have remaining work.
Steps to Reproduce
- Create a GraphFlow with multiple agents in sequence.
- Start the workflow and let the first agent complete successfully.
- Interrupt the process (Ctrl+C) during the transition to the next agent.
- Attempt to resume using:
await team.load_state(saved_state)
team.run_stream(task="continue")
Expected Behavior
- Workflow should resume from the next agent in the sequence and continue execution seamlessly.
Actual Behavior
- Workflow immediately terminates with "Digraph execution is complete".
- All agents receive the "continue" message but none execute.
- The ready queue is empty despite remaining work.
State File Analysis
The corrupted state shows:
{
  "GraphManager": {
    "remaining": {
      "next_agent": { "next_agent": 1 },
      // ... other agents with work remaining
    },
    "enqueued_any": {
      "next_agent": { "next_agent": false },
      // ... all agents show false
    },
    "ready": []
  }
}
- First agent completed – properly recorded in message history
- Next agent not enqueued – transition interrupted before coordination
- Workflow coordination lost – no agents ready despite remaining work
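The inconsistency described above can be detected mechanically before resuming. The following is a minimal sketch, not part of the library: it assumes the saved "GraphManager" entry has the layout shown in the dump (a "remaining" mapping of agent name to counters and a "ready" list), and the helper name find_stalled_agents is hypothetical.

```python
from typing import Any


def find_stalled_agents(manager_state: dict[str, Any]) -> list[str]:
    """Return agents that still have pending activations while nothing is ready.

    Assumes the layout shown in the state dump: ``remaining`` maps an agent
    name to a dict of counters, and ``ready`` is a list of agent names.
    """
    ready = manager_state.get("ready", [])
    if ready:
        # Something can still run, so the saved state is not stalled.
        return []
    remaining = manager_state.get("remaining", {})
    return sorted(
        agent
        for agent, counters in remaining.items()
        if any(count > 0 for count in counters.values())
    )


# Hypothetical state mirroring the corrupted dump from this report.
corrupted = {
    "remaining": {"next_agent": {"next_agent": 1}},
    "enqueued_any": {"next_agent": {"next_agent": False}},
    "ready": [],
}

stalled = find_stalled_agents(corrupted)
print(stalled)
```

A non-empty result on a saved state means a resume would likely hit the "Digraph execution is complete" path even though work remains.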
Root Cause
The GraphFlow coordination mechanism is interrupted before it can enqueue the next agent, leaving the system in an inconsistent state:
- Remaining work exists
- No agents are enqueued
- The workflow appears "complete" but is actually stuck
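To illustrate how such a window can arise, here is a toy model of a two-step scheduler. It is a simplified sketch under my own assumptions, not the actual GraphFlow implementation: the class ToyGraphManager, its methods, and the agent names are all hypothetical. The point is only that if an agent is removed from the ready queue when selected, and its successors are enqueued in a separate later step, a snapshot taken between the two steps has remaining work and an empty ready queue.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyGraphManager:
    """Simplified stand-in for a graph scheduler; not the library's class."""

    edges: dict[str, list[str]]
    remaining: dict[str, int]
    ready: deque[str] = field(default_factory=deque)

    def select_next(self) -> Optional[str]:
        # Step 1: the agent leaves the ready queue as soon as it is selected.
        return self.ready.popleft() if self.ready else None

    def on_completed(self, agent: str) -> None:
        # Step 2: only now are the successors counted down and enqueued.
        for child in self.edges.get(agent, []):
            self.remaining[child] -= 1
            if self.remaining[child] == 0:
                self.ready.append(child)

    def snapshot(self) -> dict[str, object]:
        # Copy the bookkeeping so later mutations do not alter the snapshot.
        return {"remaining": dict(self.remaining), "ready": list(self.ready)}


manager = ToyGraphManager(
    edges={"first_agent": ["next_agent"]},
    remaining={"next_agent": 1},
    ready=deque(["first_agent"]),
)

running = manager.select_next()   # first_agent is selected and does its work
interrupted = manager.snapshot()  # Ctrl+C lands here, before on_completed()

manager.on_completed("first_agent")  # what an uninterrupted run would do next
healthy = manager.snapshot()

print(interrupted)
print(healthy)
```

In this model the interrupted snapshot matches the shape of the corrupted state file, while the healthy snapshot has next_agent in the ready queue.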
Which packages was the bug in?
Python Core (autogen-core)
AutoGen library version.
Python dev (main branch)
Other library version.
No response
Model used
No response
Model provider
None
Other model provider
No response
Python version
3.11
.NET version
None
Operating system
macOS