Description
An issue to foster discussion on enabling graph based execution of team steps (workflow) in AutoGen AgentChat
The current API in AgentChat makes a lot of progress in terms of enabling chat-based execution of steps within an autonomous multi-agent application. This is based mostly on the BaseGroupChat team class.
However, it still does not make it easy to craft nodes or chains where the developer has clear control over the exact execution flow. What this issue describes could be seen as a midpoint between what a simple chain approach like LangChain can do and what a fully autonomous group chat (BaseGroupChat, SelectorGroupChat, etc., which offer limited control) can do.
Note: This type of behavior (and in fact anything) can be expressed/implemented using the low-level AutoGen Core API, but with a lot of code. Part of the goal of this issue is to inform the design of a higher-level API that enables this specific behavior with a better developer experience.
What is a Graph/Chain Execution Flow Pattern?
In its simplest form, a graph begins with a set of nodes and edges (similar to LangChain). A node is an independent processing unit that takes an input and provides one output. An edge defines transitions between nodes. For example, an edge between node A and B means that the output from node A goes to node B.
Important behaviors here:
- A node has only one input and one output.
  - It can have its own instructions (aka system message)
  - It may include internal branching and looping logic (e.g., it may actually be a team or a single agent in a RoundRobinChat that iterates until some termination condition)
  - It must have a single output - all of its work must be summarized into a single output
    - This is perhaps restrictive, but the output message could be a structure that can have all kinds of things in it: text, media or links to media, metadata, error states, intermediate artifacts. It would probably work as long as it is typed well. Having states in the output can enable conditional edges (or gates).
- A node must be independent - i.e., it must only be aware of its input and use that to produce its output. This way its context is free of pollution that can arise as other agents work (e.g., Agent B does not need to know that Agent A tried to write some code 10 times until it worked; it only needs the output of Agent A, either pass or fail).
  - In some cases, if we really need it, nodes can have the ability to access a "global_context" variable
- It should be easy to define nodes and edges as a DAG; e.g., a simple linear chain is one where the output from one agent goes to the next: A->B->C->D...
  - The graph can be validated before a run (e.g., verify it's a DAG at minimum)
  - Bonus:
    - This should be extensible to any type of DAG
    - Ability to specify parallelizable edges
    - Support for task runners that can take a node and execute it
- There should be the concept of entry and exit points
  - E.g., some nodes can be marked as start or end nodes
    - Only a start node can be an entry point (i.e., first to run)
    - Once an end node is reached, the task is done
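To make the behaviors above concrete, here is a minimal sketch of what a graph definition with pre-run validation might look like. Everything in it (`NodeOutput`, `Node`, `Graph`, `validate`) is a hypothetical name made up for illustration, not an existing AgentChat API; the only library call used is `graphlib.TopologicalSorter` from the Python standard library, which raises `CycleError` when the edges do not form a DAG.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter
from typing import Any, Callable


@dataclass
class NodeOutput:
    """The single, typed output of a node (hypothetical structure)."""
    content: str
    ok: bool = True  # a state that a conditional edge could inspect
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Node:
    """An independent unit: one input in, one output out (hypothetical)."""
    name: str
    run: Callable[[NodeOutput], NodeOutput]
    is_start: bool = False
    is_end: bool = False


@dataclass
class Graph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[tuple[str, str]] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.append((src, dst))

    def validate(self) -> list[str]:
        """Return a topological order, or raise ValueError if the graph is invalid."""
        for src, dst in self.edges:
            if src not in self.nodes or dst not in self.nodes:
                raise ValueError(f"edge {src}->{dst} references an unknown node")
        if not any(n.is_start for n in self.nodes.values()):
            raise ValueError("graph needs at least one start node")
        if not any(n.is_end for n in self.nodes.values()):
            raise ValueError("graph needs at least one end node")
        # TopologicalSorter expects a mapping of node -> set of predecessors.
        predecessors: dict[str, set[str]] = {name: set() for name in self.nodes}
        for src, dst in self.edges:
            predecessors[dst].add(src)
        try:
            return list(TopologicalSorter(predecessors).static_order())
        except CycleError as exc:
            raise ValueError(f"graph is not a DAG: {exc.args[1]}") from exc


# Usage: the linear chain A->B->C->D with identity nodes as placeholders.
graph = Graph()
for name in "ABCD":
    graph.add_node(
        Node(name, run=lambda msg: msg, is_start=(name == "A"), is_end=(name == "D"))
    )
for src, dst in [("A", "B"), ("B", "C"), ("C", "D")]:
    graph.add_edge(src, dst)
order = graph.validate()
```

Keeping `ok` and `metadata` on the output type is one way the "states in output" idea could feed conditional edges; whether a real design would want this shape is open for discussion.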
Why is this a good idea?
Because it is such a common and intuitive thing for a framework to do: expressing the solution to a problem as a set of steps. The most common version is a simple sequence of independent nodes.
A Chain/Graph setup lets us go beyond a simple set of steps by allowing dynamic behavior within each step (e.g., group chats inside a node) while still controlling information flow at a high level.
Other benefits:
- The independence principle is valuable - isolating agent contexts can lead to cleaner, more predictable execution and easier debugging
- The single input/output constraint per node creates clarity and composability
- The DAG structure enables validation and potentially parallel execution paths
When is a flow based approach the wrong approach?
There are scenarios where a flow-based approach is the wrong fit.
For example:
- All nodes must be aware of what the others have done, e.g., a multi-agent debate with one agent arguing for, one arguing against, and an observer/summarizer.
- The flow is so complex that it is better driven by LLM logic than by representing transitions explicitly as a graph.
Simple example
Data visualization
- Node A (LLM call): uses an LLM to generate code for a visualization (plot a graph of ..)
- Node B (Linter): runs a linter and outputs true/false
- Node C (Code execution and final result)
Graph looks like
- A -> B
- B -> C
- B --(condition: B output is false, max 3 tries) --> A
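When this graph is run, A generates the code and B lints it. If the lint passes, C executes the code and returns the final result; if it fails, control goes back to A, for up to 3 tries.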
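The three nodes and the conditional edge above can be sketched in plain Python as follows. This is only an illustration under loud assumptions: `Message`, `node_a`, `node_b`, `node_c`, and `run_visualization` are hypothetical names; the "LLM" is a stub that returns broken code first and fixed code on retry; and the "linter" is just the built-in `compile()` used as a syntax check.

```python
import contextlib
import io
from dataclasses import dataclass

MAX_TRIES = 3  # cap on the conditional edge B -> A


@dataclass
class Message:
    """Single typed message passed along every edge (hypothetical)."""
    content: str
    ok: bool = True


def node_a(msg: Message) -> Message:
    """Stub for the LLM call: the first draft has a syntax error, the retry fixes it."""
    if msg.ok:  # first visit: the input is the task description
        return Message(content="print('bar chart'")  # missing closing parenthesis
    return Message(content="print('bar chart')")  # retry after a failed lint


def node_b(msg: Message) -> Message:
    """Stand-in linter: a syntax check via compile(); outputs pass/fail in `ok`."""
    try:
        compile(msg.content, "<generated>", "exec")
        return Message(content=msg.content, ok=True)
    except SyntaxError:
        return Message(content=msg.content, ok=False)


def node_c(msg: Message) -> Message:
    """Executes the generated code and returns whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(msg.content, {})
    return Message(content=buffer.getvalue())


def run_visualization(task: str) -> Message:
    msg = Message(content=task)
    for _ in range(MAX_TRIES):
        msg = node_b(node_a(msg))  # edge A -> B
        if msg.ok:
            return node_c(msg)  # edge B -> C
        # Lint failed: follow the conditional edge B -> A and try again.
    return msg  # still failing after MAX_TRIES; surface the failed state


result = run_visualization("plot a graph of ..")
```

Note that A only ever sees the single message handed to it (the task, or B's failed output); it has no access to a shared chat history.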
How is this different from current AgentChat?
AgentChat today is based on a group chat mechanism. Every agent sees every message sent (corrections welcome here). This violates the independence principle above.
What are ways forward?
SwarmGroupChat might come close to the behavior listed above but has the independence issue.
Some rough ideas:
- New Orchestrator that is only used to schedule and run a node
  - Starts with the entry point
  - Runs the entry point and gets the output
  - Runs the next node in the graph and passes on the output from the previous node
  - Ends once an end node has been reached
- A node may take in an agent or a group chat as args, but still respect the single output requirement
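As a rough sketch of the orchestrator idea, the loop below starts at the entry point, hands each node only the previous node's output, and stops at an end node. All names (`run_graph`, `NodeFn`, `traced`, `review_team`) are hypothetical, nodes are plain string-to-string functions standing in for agents or teams, and each node is assumed to have at most one outgoing edge.

```python
from typing import Callable, Dict, List, Set, Tuple

NodeFn = Callable[[str], str]  # hypothetical node signature: one input, one output


def run_graph(
    nodes: Dict[str, NodeFn],
    edges: Dict[str, str],  # assumption: at most one successor per node
    entry: str,
    end_nodes: Set[str],
    task: str,
    max_steps: int = 50,
) -> str:
    if entry not in nodes:
        raise ValueError(f"unknown entry node {entry!r}")
    current, payload = entry, task
    for _ in range(max_steps):
        payload = nodes[current](payload)  # the node sees only its own input
        if current in end_nodes:
            return payload
        if current not in edges:
            raise ValueError(f"node {current!r} is not an end node and has no outgoing edge")
        current = edges[current]
    raise RuntimeError("max_steps exceeded before reaching an end node")


trace: List[Tuple[str, str]] = []  # records what each node actually received


def traced(name: str, fn: NodeFn) -> NodeFn:
    def node(text: str) -> str:
        trace.append((name, text))
        return fn(text)
    return node


def review_team(text: str) -> str:
    """Stand-in for a team inside a node: iterates internally, emits one output."""
    notes = []
    for round_no in range(1, 4):
        notes.append(f"round {round_no}")  # internal chatter never leaves the node
    return f"{text} [approved after {len(notes)} rounds]"


nodes = {
    "draft": traced("draft", lambda t: f"draft: {t}"),
    "review": traced("review", review_team),
    "publish": traced("publish", lambda t: t.upper()),
}
edges = {"draft": "review", "review": "publish"}
result = run_graph(nodes, edges, entry="draft", end_nodes={"publish"}, task="quarterly report")
```

Here `publish` receives only the one-line output of `review`, not its three internal rounds, which is the independence property described earlier.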
This issue is meant as a discussion to collate ideas into some future implementation.
Feedback and discussion welcome!
@husseinmozannar , @gagb , @afourney , @ekzhu , @jackgerrits