GraphFlow State Persistence Bug: Workflow Gets Stuck After Interruption During Agent Transitions #7043

### What happened?

---

## Summary

GraphFlow workflows become unrecoverable when interrupted during an agent transition. The saved state is left corrupted, with no agent ready to execute even though work remains.

---

## Environment

* **Framework**: `autogen-agentchat` with GraphFlow
* **Python**: 3.11
* **Issue Type**: State persistence & resume functionality

---

## Problem Description

When a GraphFlow workflow is interrupted (e.g., via `KeyboardInterrupt`) during the transition between agents, the saved state becomes corrupted.

On resume, the workflow terminates immediately with the following message, even though agents still have work remaining:

```
Digraph execution is complete
```

---

## Steps to Reproduce

1. Create a GraphFlow with multiple agents in sequence.
2. Start the workflow and let the first agent complete successfully.
3. Interrupt the process (`Ctrl+C`) during the transition to the next agent.
4. Attempt to resume from the saved state:

   ```python
   await team.load_state(saved_state)
   async for message in team.run_stream(task="continue"):
       print(message)
   ```

---

## Expected Behavior

* Workflow should resume from the next agent in the sequence and continue execution seamlessly.

---

## Actual Behavior

* Workflow immediately terminates with `"Digraph execution is complete"`.
* All agents receive the `"continue"` message but none execute.
* The ready queue is empty despite remaining work.
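A toy model helps explain why the run reports completion instead of raising an error. Assuming, as the symptoms suggest, that the manager picks the next speaker only from the ready queue, an empty queue is indistinguishable from a finished graph. `select_next_speaker` and `resume` below are hypothetical helpers for illustration, not the GraphFlow API.

```python
from typing import Any, Dict, List, Optional

# Message text taken from this issue; the surrounding logic is a hypothetical model.
STOP_MESSAGE = "Digraph execution is complete"


def select_next_speaker(state: Dict[str, Any]) -> Optional[str]:
    """Pick the next agent purely from the ready queue (toy model)."""
    ready: List[str] = state.get("ready", [])
    if not ready:
        # An empty queue is read as "graph finished"; `remaining` is never consulted.
        return None
    return ready[0]


def resume(state: Dict[str, Any]) -> str:
    speaker = select_next_speaker(state)
    return STOP_MESSAGE if speaker is None else f"next speaker: {speaker}"


corrupted = {"remaining": {"next_agent": {"next_agent": 1}}, "ready": []}
print(resume(corrupted))  # prints "Digraph execution is complete"
```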

---

## State File Analysis

The corrupted state shows:

```jsonc
{
  "GraphManager": {
    "remaining": {
      "next_agent": { "next_agent": 1 },
      // ... other agents with work remaining
    },
    "enqueued_any": {
      "next_agent": { "next_agent": false },
      // ... all agents show false
    },
    "ready": []
  }
}
```

* The first agent completed, and its output is recorded in the message history.
* The next agent was never enqueued, because the transition was interrupted before coordination finished.
* Workflow coordination is lost: no agent is ready even though work remains.

---

## Root Cause

The interruption lands after the first agent's completion is recorded but before the GraphFlow coordination mechanism enqueues the next agent, leaving the saved state inconsistent:

* Remaining work exists
* No agents are enqueued
* The workflow appears "complete" but is actually stuck
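To make the interruption window concrete, here is a toy scheduler that keeps `remaining` and `ready` in the same shape as the state dump above. `ToyGraphManager`, its methods, and the single-counter-per-target simplification are hypothetical and do not reflect GraphFlow's actual implementation. `finish_naive` mutates the live state in two steps, so an interrupt between them reproduces the stuck snapshot; `finish_atomic` sketches one possible mitigation, building the next state on a copy and publishing it with a single attribute assignment.

```python
import copy
from typing import Any, Callable, Dict, List


class ToyGraphManager:
    """Hypothetical scheduler; field names mirror the saved state in this issue."""

    def __init__(self, edges: Dict[str, List[str]], start: str) -> None:
        self.edges = edges
        remaining: Dict[str, Dict[str, int]] = {}
        for children in edges.values():
            for child in children:
                # One counter per target: how many parents must still finish.
                counts = remaining.setdefault(child, {child: 0})
                counts[child] += 1
        self.state: Dict[str, Any] = {"remaining": remaining, "ready": [start]}

    def snapshot(self) -> Dict[str, Any]:
        """What a state save would capture at this instant (toy model)."""
        return copy.deepcopy(self.state)

    def _advance(self, state: Dict[str, Any], hook: Callable[[], None]) -> None:
        done = state["ready"].pop(0)  # step 1: finished agent leaves the queue
        hook()  # interruption window between the two steps
        for child in self.edges.get(done, []):  # step 2: release successors
            state["remaining"][child][child] -= 1
            if state["remaining"][child][child] == 0:
                state["ready"].append(child)

    def finish_naive(self, hook: Callable[[], None]) -> None:
        self._advance(self.state, hook)  # mutates the live state in place

    def finish_atomic(self, hook: Callable[[], None]) -> None:
        draft = copy.deepcopy(self.state)
        self._advance(draft, hook)  # an interrupt here leaves self.state untouched
        self.state = draft  # one attribute store publishes the whole transition


def interrupted_snapshot(method: str) -> Dict[str, Any]:
    mgr = ToyGraphManager({"first_agent": ["next_agent"]}, start="first_agent")

    def ctrl_c() -> None:
        raise KeyboardInterrupt

    try:
        getattr(mgr, method)(ctrl_c)
    except KeyboardInterrupt:
        pass
    return mgr.snapshot()


print(interrupted_snapshot("finish_naive"))
# {'remaining': {'next_agent': {'next_agent': 1}}, 'ready': []}
print(interrupted_snapshot("finish_atomic"))
# {'remaining': {'next_agent': {'next_agent': 1}}, 'ready': ['first_agent']}
```

With the mitigation, an interrupted transition leaves the pre-transition snapshot, so a resume in this model would re-run the first agent instead of hanging. Avoiding that re-run as well would require persisting the agent's result and the successor enqueue together.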

---


### Which packages was the bug in?

Python Core (autogen-core)

### AutoGen library version.

Python dev (main branch)

### Other library version.

_No response_

### Model used

_No response_

### Model provider

None

### Other model provider

_No response_

### Python version

3.11

### .NET version

None

### Operating system

macOS
