Re-evaluating Internal Message Handling in SocietyOfMindAgent (v0.4+) #6123

### What happened?


<details>
<summary>Old message: first report of SocietyOfMindAgent leaking inner monologues to the outer team</summary>

**Describe the bug**
When using a `SocietyOfMindAgent` inside a `GroupChat`, messages from the inner team (e.g., inner agents like `agent1`, `agent2`) are exposed to the outer `GroupChat` stream, rather than being contained within the internal reasoning. This leads to unexpected messages being surfaced to outer-level agents and logic, breaking isolation and causing potential routing/termination issues.

**To Reproduce**
Run the following minimal example:

```python
import asyncio
from autogen_agentchat.agents import AssistantAgent, SocietyOfMindAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import MaxMessageTermination
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_agentchat.ui import Console

async def main():
    model_client = OpenAIChatCompletionClient(model="gpt-4o")

    agent1 = AssistantAgent("agent1", model_client=model_client, system_message="You are a writer.")
    agent2 = AssistantAgent("agent2", model_client=model_client, system_message="You are a critic.")

    inner_team = RoundRobinGroupChat(
        participants=[agent1, agent2],
        termination_condition=MaxMessageTermination(2)
    )

    society_agent = SocietyOfMindAgent("society", team=inner_team, model_client=model_client)

    outer_agent = AssistantAgent("translator", model_client=model_client, system_message="Translate to Spanish.")

    team = RoundRobinGroupChat(
        participants=[society_agent, outer_agent],
        termination_condition=MaxMessageTermination(2)
    )

    await Console(team.run_stream(task="Write a short story."))

asyncio.run(main())
```

💥 Expected Console output, with all intermediate messages from the inner team shown:
```
user: Write a short story.
agent1: Once upon a time...
agent2: Needs more drama.
society: Here's the revised version.
translator: Aquí está la historia.
```
💥 Actual Console output (the outer team's termination does not work as expected):
```
user: Write a short story.
agent1: Once upon a time... 
society: Here's the revised version.
```

The behavior should instead be:
```
user: Write a short story.
>>> (inner, may be hidden) agent1: Once upon a time...
>>> (inner, may be hidden) agent2: Needs more drama.
society: Here's the revised version.
translator: Aquí está la historia.
```

**Expected behavior**

The `SocietyOfMindAgent` should:

- ✅ Internally run its team and return a single `Response`
- ✅ Prevent inner messages from leaking into the outer `GroupChat` stream
- ⚙️ Optionally log intermediate messages to `Console`, but **not** expose them as `ChatMessage`s to the rest of the system
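
To make the isolation requirement concrete, here is a minimal sketch in plain Python. It does not use the AutoGen API: `Message`, `outer_thread_after_turn`, and `max_message_reached` are hypothetical names, and the message-count check is a simplified stand-in for a termination condition, not AutoGen's actual logic. It only illustrates why inner messages that land in the outer thread can end the outer team earlier than intended.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    source: str
    content: str


def outer_thread_after_turn(
    task: Message, inner: List[Message], final: Message, leak_inner: bool
) -> List[Message]:
    """Model the outer team's thread after the society agent's first turn."""
    thread = [task]
    if leak_inner:
        # Leaky variant: inner-team messages are appended to the outer thread.
        thread.extend(inner)
    thread.append(final)
    return thread


def max_message_reached(thread: List[Message], max_messages: int) -> bool:
    """Simplified message-count termination check (assumption, not AutoGen code)."""
    return len(thread) >= max_messages


task = Message("user", "Write a short story.")
inner = [
    Message("agent1", "Once upon a time..."),
    Message("agent2", "Needs more drama."),
]
final = Message("society", "Here's the revised version.")

leaky = outer_thread_after_turn(task, inner, final, leak_inner=True)
isolated = outer_thread_after_turn(task, inner, final, leak_inner=False)
```

With a limit of three messages in this model, the leaky thread already holds four messages after the society agent's turn, so the check fires before the translator can speak. The isolated thread holds two, so the outer team would continue.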

---

**Additional context**

This behavior breaks **message encapsulation**, which is particularly problematic when using nested teams.

These changes would make `SocietyOfMindAgent` much more robust and suitable for **nested orchestration** scenarios.

---

**Question**

Is this behavior (inner messages leaking into the outer `GroupChat`) intended?  
Or is this something that might be considered a design oversight or bug?

Happy to propose a patch once the expected behavior is clarified. 🙌
</details>


I was able to investigate the core behavior of `SocietyOfMindAgent` more deeply. Since the original design intentions were not fully clear from the documentation alone, I compared the current implementation to the earlier version from v0.2.

Through this comparison, I confirmed that the current behavior introduces four **functional regressions** that did not exist before. Additionally, I identified four more **architectural concerns** introduced in recent versions.

I've summarized them in the table below for review. I would really appreciate any feedback or validation on this analysis.

---

🔍 SocietyOfMindAgent: Design Issues and Historical Comparison (v0.2 vs v0.4+)

### ✅ P1–P4 Regression Issue Table (Updated with Fixes in PR #6142)

| ID  | Description | Current v0.4+ Issue | Resolution in PR #6142 | Was it a problem in v0.2? | Notes |
|-----|-------------|----------------------|--------------------------|----------------------------|-------|
| **P1** | `inner_messages` leaks into outer team termination evaluation | `Response.inner_messages` is appended to the outer team's `_message_thread`, affecting termination conditions. Violates encapsulation. | ✅ `inner_messages` is excluded from `_message_thread`, avoiding contamination of outer termination logic. | ❌ No | Structural boundary is now enforced |
| **P2** | Inner team does not execute when outer message history is empty | In chained executions, if no new outer message exists, no task is created and the inner team is skipped entirely | ✅ Detects absence of new outer message and reuses the previous task, passing it via a handoff message. This ensures the inner team always receives a valid task to execute | ❌ No | The issue was silent task omission, not summary failure. Summary succeeds as a downstream effect |
| **P3** | Summary LLM prompt is built from external input only | Prompt is constructed using external message history, ignoring internal reasoning | ✅ Prompt construction now uses `final_response.inner_messages`, restoring internal reasoning as the source of summarization | ❌ No | Matches v0.2 internal monologue behavior |
| **P4** | External input is included in summary prompt (possibly incorrectly) | Outer messages are used in the final LLM summarization prompt | ✅ Resolved via the same fix as P3; outer messages are no longer used for summary | ❌ No | Redundant with P3, now fully addressed |
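
Rows P2 and P3 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code from PR #6142: `Message`, `select_task`, and `build_summary_prompt` are hypothetical names, and the handoff message mentioned in P2 is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Message:
    source: str
    content: str


def select_task(
    new_outer: Sequence[Message], previous_task: Optional[Sequence[Message]]
) -> List[Message]:
    """P2: fall back to the previous task when no new outer message arrived."""
    if new_outer:
        return list(new_outer)
    if previous_task:
        return list(previous_task)
    raise ValueError("no task available for the inner team")


def build_summary_prompt(inner_messages: Sequence[Message], instruction: str) -> str:
    """P3: build the summary prompt from inner messages only, then the instruction."""
    transcript = [f"{m.source}: {m.content}" for m in inner_messages]
    return "\n".join([*transcript, instruction])


previous = [Message("user", "Write a short story.")]
reused = select_task([], previous)

prompt = build_summary_prompt(
    [Message("agent1", "Once upon a time..."), Message("agent2", "Needs more drama.")],
    "Summarize the discussion above.",
)
```

Raising when neither a new nor a previous task exists is a design choice of this sketch; it makes the silent task omission described in P2 visible instead of skipping the inner team.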



| ID  | Description | Current v0.4+ Issue | Suggested Fix | Was it a problem in v0.2? | Notes |
|-----|-------------|----------------------|---------------|----------------------------|-------|
| **E1** | Fragile `count <= len(task)` logic in stream parsing | Skips a fixed number of messages assuming they are tasks. Breaks with team structure changes. | Use explicit criteria like `source == "user"` to filter task messages | ❌ No | v0.2 had no streaming/yield logic |
| **E2** | Streaming chunks (e.g. `ModelClientStreamingChunkEvent`) ambiguity | Some events are streamed but not stored; unclear if this is intentional | Add comments to clarify intent. Maintain current behavior. | ❌ No | v0.2 had no streaming structure. Keep current but document clearly. |
| **E3** | Ambiguous task/message boundary | Outer tasks and inner messages are mixed conceptually | Clarify roles using message types or consistent tagging (e.g. `source`) | ❌ No | v0.2 handled outer input as "User" consistently. Just verify that continues to be true. |
| **E4** | `reset()` may not run if exception occurs | If `run_stream()` fails mid-execution, `reset()` is skipped → potential team state corruption | Wrap `reset()` inside a `finally` block for guaranteed cleanup | ⚠️ Partially | Same logic in v0.2; lacks `finally`, so it's not always guaranteed either |
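
The difference between the count-based skip in E1 and an explicit source filter can be sketched as below. This is plain Python with hypothetical names (`Message`, `drop_task_by_count`, `drop_task_by_source`); it models the idea only and assumes, as noted in E3, that outer input is consistently tagged with the source `"user"`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    source: str
    content: str


def drop_task_by_count(stream: List[Message], task_len: int) -> List[Message]:
    """Fragile: assumes the first `task_len` streamed messages are the task."""
    return stream[task_len:]


def drop_task_by_source(stream: List[Message], task_source: str = "user") -> List[Message]:
    """Explicit: drop only messages whose source marks them as task input."""
    return [m for m in stream if m.source != task_source]


# A stream in which the task is NOT echoed back first (one way the team
# structure could change): the count-based skip drops a real inner message.
stream = [
    Message("agent1", "Once upon a time..."),
    Message("agent2", "Needs more drama."),
]

by_count = drop_task_by_count(stream, task_len=1)
by_source = drop_task_by_source(stream)
```

In this example the count-based variant silently loses `agent1`'s message, while the source-based variant keeps both inner messages.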

---
Possible error; more information is needed:

| ID  | Description | Current v0.4+ Issue | Suggested Fix | Was it a problem in v0.2? | Notes |
|-----|-------------|----------------------|---------------|----------------------------|-------|
| **P5** | `reset()` on inner team affects outer team state | `SocietyOfMindAgent` calls `await self._team.reset()`, which resets shared team instances, unintentionally clearing the outer team's state | Ensure the inner team is a separate instance (e.g., a deep copy), or isolate `reset()` behavior to avoid cross-team interference | Not checked | Not verified against v0.2, but it appears to be an error |

>>> On reflection, this may be fine: the model context retains its own history, so this may be the correct behavior.
---
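
For E4 in the table above, the suggested `finally` wrapping can be sketched as follows. `FakeTeam` and `run_inner` are hypothetical stand-ins written for this illustration; they only mimic the shape of an async `run_stream()` / `reset()` pair and are not AutoGen classes.

```python
import asyncio
from typing import AsyncIterator, List


class FakeTeam:
    """Hypothetical stand-in for an inner team that fails mid-stream."""

    def __init__(self) -> None:
        self.reset_calls = 0

    async def run_stream(self, task: str) -> AsyncIterator[str]:
        yield f"started: {task}"
        raise RuntimeError("inner team failed")

    async def reset(self) -> None:
        self.reset_calls += 1


async def run_inner(team: FakeTeam, task: str) -> List[str]:
    """Consume the stream and reset the team even if the stream raises."""
    collected: List[str] = []
    try:
        async for event in team.run_stream(task):
            collected.append(event)
    finally:
        # Runs on success and on failure, so the team is never left dirty.
        await team.reset()
    return collected


async def demo() -> int:
    team = FakeTeam()
    try:
        await run_inner(team, "Write a short story.")
    except RuntimeError:
        pass  # The error still propagates to the caller; only cleanup is guaranteed.
    return team.reset_calls


reset_calls = asyncio.run(demo())
```

The exception is not swallowed by `run_inner`; the `finally` block only guarantees that `reset()` runs before the error reaches the caller.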

If there are no objections to my conclusions, I would like to open a **DRAFT PR** to begin addressing these issues.

Since this agent is critical for a production use case I'm working on, I’m highly motivated to contribute toward improving its reliability.

Looking forward to your feedback—thank you!


### Which packages was the bug in?

Python AgentChat (autogen-agentchat>=0.4.0)

### AutoGen library version.

Python dev (main branch)

### Other library version.

_No response_

### Model used

_No response_

### Model provider

None

### Other model provider

_No response_

### Python version

None

### .NET version

None

### Operating system

None
