
handle context size overflow in AssistantAgent #9

Closed
1 task
Tracked by #1032
sonichi opened this issue Aug 1, 2023 · 9 comments
Labels
enhancement New feature or request

Comments

@sonichi (Contributor) commented Aug 1, 2023

microsoft/FLAML#1098, microsoft/FLAML#1153, and microsoft/FLAML#1158 each address this in a specialized way. Can we integrate these ideas into a generic solution and make AssistantAgent able to overcome this limitation out of the box?


@sonichi added the enhancement label on Aug 1, 2023
@yiranwu0 (Collaborator) commented Aug 2, 2023

I will handle this problem in microsoft/FLAML#1153. The problem should be in generate_reply, when it returns extra-long messages. My current plan includes the following functionality:

  1. Use tiktoken for a more accurate token count, and add a static function that checks the tokens left given the model and the previous messages.
  2. Allow the user to pass in a predefined output limit.
  3. When the generated output (for example, from code execution) exceeds the maximum tokens allowed or the user-predefined limit, return a "result too long" error.

@thinkall has implemented the tiktoken count in microsoft/FLAML#1158. Should I try to fix this concurrently?
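For illustration, here is a minimal sketch of the kind of token-budget helper described in item 1. It assumes the tiktoken package; the context-window sizes and the per-message overhead are assumptions, not authoritative values.

```python
# Sketch of a token-budget helper (item 1 above).
# The context-window table and the +4 per-message overhead are assumptions.
import tiktoken

ASSUMED_CONTEXT_WINDOW = {
    "gpt-3.5-turbo": 4096,
    "gpt-4": 8192,
}

def count_tokens(messages, model="gpt-3.5-turbo"):
    """Roughly count the tokens in a list of chat messages."""
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        enc = tiktoken.get_encoding("cl100k_base")
    # The exact per-message overhead is model-specific; 4 is an approximation.
    return sum(len(enc.encode(m.get("content") or "")) + 4 for m in messages)

def tokens_left(messages, model="gpt-3.5-turbo"):
    """Static-style check of how many tokens remain for the next reply."""
    limit = ASSUMED_CONTEXT_WINDOW.get(model, 4096)
    return limit - count_tokens(messages, model)
```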

@sonichi (Contributor, Author) commented Aug 2, 2023

> I will handle this problem in microsoft/FLAML#1153. The problem should be in generate_reply, when it returns extra-long messages. My current plan includes the following functionality:
>
>   1. Use tiktoken for a more accurate token count, and add a static function that checks the tokens left given the model and the previous messages.
>   2. Allow the user to pass in a predefined output limit.
>   3. When the generated output (for example, from code execution) exceeds the maximum tokens allowed or the user-predefined limit, return a "result too long" error.
>
> @thinkall has implemented the tiktoken count in microsoft/FLAML#1158. Should I try to fix this concurrently?

Your proposal can solve part of the problem. It does the check on the sender's side in case the receiver requests a length limit.
There can be other alternatives:

  1. The receiver requests that when the message is longer than the threshold, the sender sends only part of the message, and the two have a protocol to deal with the remaining part. "Continual learning via LearningAgent and TeachingAgent" (FLAML#1098) and "Add RetrieveChat" (FLAML#1158) are examples of this.
  2. The receiver doesn't request a check on the sender's side and instead performs compression on its own side. For example, it can employ the agents in FLAML#1098 to do so. Even when a check on the sender is requested, some compression can still be done on the receiver's side to make room for future messages.

It'll be good to figure out what we want to support and have a comprehensive design.
Could you discuss with @thinkall and @LeoLjl? You are in the same time zone. Once you have a proposal, @qingyun-wu and I can go over it.
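For illustration, a rough sketch of alternative 1 above: the sender splits an over-long message and follows a simple "part i of n" protocol for the remainder. The names split_message, send_in_parts, and the MAX_REPLY_TOKENS threshold are hypothetical, not part of any existing API.

```python
# Hypothetical sketch of alternative 1: the sender chunks an over-long
# message; the receiver can ask for the next part under an agreed protocol.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
MAX_REPLY_TOKENS = 1000  # assumed threshold requested by the receiver

def split_message(text: str, limit: int = MAX_REPLY_TOKENS):
    """Split text into chunks of at most `limit` tokens."""
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + limit]) for i in range(0, len(tokens), limit)]

def send_in_parts(text: str):
    """Yield messages labeled 'part i/n'; the receiver replies with
    something like 'CONTINUE' to request the next part."""
    parts = split_message(text)
    for i, part in enumerate(parts, start=1):
        yield {"role": "user", "content": f"[part {i}/{len(parts)}]\n{part}"}
```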

@yiranwu0 (Collaborator) commented Aug 2, 2023

Sure, I will discuss it with @thinkall and @LeoLjl.

I just updated microsoft/FLAML#1153 to allow the user to set a predefined token limit for outputs from code or function calls; I think this is a different task from handling token_limit in oai_reply.

@yiranwu0 (Collaborator) commented Aug 5, 2023

@sonichi @qingyun-wu Here is my proposed plan:

On AssistantAgent:
Add a parameter on_token_limit taking one of ["Terminate", "Compress"]. We would check whether the token limit is reached before oai.create is called: if set to "Terminate", we would terminate the conversation; if set to "Compress", we would use a compression agent to compress previous messages and prepare for future conversations (we could also set a threshold, such as 80% of the max tokens, to start an async agent). I read that OpenAI summarizes previous messages when the conversation gets too long.
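A hedged sketch of how the on_token_limit check before oai.create could branch. The helpers below (rough_token_count, naive_compress) are naive stand-ins for illustration only, not existing AutoGen or OpenAI APIs, and the 4096-token window is an assumption.

```python
# Illustrative sketch only: branching on an on_token_limit option
# before calling the LLM.
MAX_TOKENS = 4096            # assumed context window
COMPRESS_THRESHOLD = 0.8     # e.g. start compressing at 80% of the window

def rough_token_count(messages):
    # Crude approximation: ~4 characters per token.
    return sum(len(m.get("content") or "") // 4 for m in messages)

def naive_compress(messages):
    # Stand-in for a compression agent: keep the first 200 characters
    # of each older message as a fake "summary".
    return " ".join((m.get("content") or "")[:200] for m in messages)

def prepare_messages(messages, on_token_limit="Compress"):
    if rough_token_count(messages) < COMPRESS_THRESHOLD * MAX_TOKENS:
        return messages                       # enough room, no action needed
    if on_token_limit == "Terminate":
        raise RuntimeError("Token limit reached; terminating the conversation.")
    # "Compress": summarize everything except the system message and the
    # two most recent turns, then rebuild the history around the summary.
    summary = naive_compress(messages[1:-2])
    return [messages[0],
            {"role": "system", "content": f"Summary of earlier turns: {summary}"}] + messages[-2:]
```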

On UserProxyAgent (I already added this in microsoft/FLAML#1153):
Allow the user to specify auto_reply_token_limit, defaulting to -1 (no limit). When auto_reply_token_limit > 0 and the token count from the auto reply (code execution or function call) exceeds the limit, the output will be replaced with an error message. This lets users prevent unexpected cases where the output from code execution or function calls overflows the context.
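A minimal sketch of the auto_reply_token_limit behavior described above; the error-message text and the character-based token approximation are assumptions.

```python
# Sketch of the auto_reply_token_limit idea: if the output of code
# execution or a function call exceeds the limit, replace it with an
# error message instead of sending the over-long text back.
auto_reply_token_limit = -1  # default: no limit

def guard_auto_reply(output: str, token_limit: int = auto_reply_token_limit) -> str:
    if token_limit <= 0:
        return output
    approx_tokens = len(output) // 4  # crude approximation, ~4 chars/token
    if approx_tokens > token_limit:
        return (f"Error: the output (~{approx_tokens} tokens) exceeds the "
                f"auto_reply_token_limit of {token_limit}.")
    return output
```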

From the two changes above, all three generate_reply cases are addressed: oai_reply, code execution, and function calls.
I am thinking of general tasks like problem solving. @BeibinLi likes the "compression" and "terminate" approaches.

For tasks that involve databases and consume a large number of tokens, such as answering questions about a long text or searching for data in a database, I think we need a special design targeting those applications.

@sonichi (Contributor, Author) commented Aug 5, 2023

The proposal is a good start. I like that the design covers two options: dealing with the token limit after/before a reply is made.
I think we can generalize this design:

  1. For each auto reply method, we add an optional argument token_limit to let the method know the token limit for each reply. Allow it to be either a user-specified constant or an auto-decided number. The method is responsible for handling that constraint. This includes the retrieval-based auto reply, such as the one in RetrieveChat.
  2. For oai_reply, we catch the token-limit error and return (False, None) when the error happens. That passes up the chance to finalize the reply and lets the next registered method decide the reply. Then we can register the compressor method to be processed after oai_reply yields.
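A rough sketch of the fallthrough behavior described in item 2: each registered reply method returns (final, reply), and (False, None) passes control to the next one. The names below (call_llm, ContextLengthError, compress_messages, REPLY_METHODS) are hypothetical stand-ins, not the AutoGen API.

```python
# Hypothetical illustration of item 2: registered reply methods are tried
# in order; returning (False, None) lets the next method decide the reply.
class ContextLengthError(Exception):
    """Stands in for the provider's 'context length exceeded' error."""

def call_llm(messages):
    # Stand-in for oai.create: pretend the call fails once the history
    # grows past an arbitrary size.
    if sum(len(m["content"]) for m in messages) > 16000:
        raise ContextLengthError()
    return "ok"

def compress_messages(messages):
    # Naive stand-in for a compression agent: truncate each message.
    return [{"role": m["role"], "content": m["content"][:500]} for m in messages]

def oai_reply(messages):
    try:
        return True, call_llm(messages)
    except ContextLengthError:
        return False, None        # fall through to the next registered method

def compress_then_reply(messages):
    return True, call_llm(compress_messages(messages))

REPLY_METHODS = [oai_reply, compress_then_reply]   # registration order matters

def generate_reply(messages):
    for method in REPLY_METHODS:
        final, reply = method(messages)
        if final:
            return reply
    return None
```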

@yiranwu0 (Collaborator) commented Aug 8, 2023

On second thought, I don’t think we need to pass a token_limit argument. Currently, for function and code execution, I use a class variable, auto_reply_token_limit, to customize the behavior when the limit is reached. When a new agent class overrides a reply method, it can employ this variable or just create a new class variable.
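For example, a subclass could simply override the class variable; the classes here are toy illustrations, not the actual agent classes.

```python
# Toy sketch: customizing the limit via a class variable rather than a
# per-call token_limit argument.
class UserProxyAgent:
    auto_reply_token_limit = -1  # no limit by default

class MyDatabaseAgent(UserProxyAgent):
    auto_reply_token_limit = 500  # this agent caps its auto-reply outputs
```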

@sonichi (Contributor, Author) commented Aug 8, 2023

Should the sender tell the receiver the token limit? "token_limit" and ways to handle token_limit should be separated. "token_limit" is a number that should be sent by the sender. Maybe we can make that a field in the message. The way to handle token_limit is decided in the auto reply method.
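For example, the sender could attach the number as a field of the message itself, and the receiver's auto-reply method would decide how to honor it; the field layout and the honor_token_limit helper below are only a suggestion.

```python
# Illustrative only: the sender advertises its token limit in the message.
message = {
    "role": "user",
    "content": "Please run the script and report the results.",
    "token_limit": 1000,  # the sender asks for replies of at most ~1000 tokens
}

def honor_token_limit(msg, reply: str) -> str:
    limit = msg.get("token_limit")  # None means the sender imposed no limit
    if limit is not None and len(reply) // 4 > limit:  # ~4 chars per token
        return reply[: limit * 4] + "\n[truncated to respect token_limit]"
    return reply
```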

@yiranwu0 (Collaborator) commented Aug 10, 2023

I have a few questions when looking at the code:

  1. In the receive function, generate_reply is called without passing in messages: self.generate_reply(sender=sender), so messages will be None. When a registered method such as generate_oai_reply is called, messages will be None and it takes the pre-stored messages:
        if messages is None:
            messages = self._oai_messages[sender]

It seems that this messages argument is not used. When would it be used?
One possible usage: when generate_reply is called individually.

  2. Would the context argument passed to register_auto_reply be more appropriately renamed to reply_config?
    In oai_reply it is converted to llm_config, and in code execution it is converted to code_execution_config; in other reply methods it is not used. It also seems that "context" can be a field in a message from oai, and "content" is a field in a message.

@sonichi (Contributor, Author) commented Aug 10, 2023

> I have a few questions when looking at the code:
>
>   1. In the receive function, generate_reply is called without passing in messages: self.generate_reply(sender=sender), so messages will be None. When a registered method such as generate_oai_reply is called, messages will be None and it takes the pre-stored messages:
>
>         if messages is None:
>             messages = self._oai_messages[sender]
>
> It seems that this messages argument is not used. When would it be used? One possible usage: when generate_reply is called individually.
>
>   2. Would the context argument passed to register_auto_reply be more appropriately renamed to reply_config?
>     In oai_reply it is converted to llm_config, and in code execution it is converted to code_execution_config; in other reply methods it is not used. It also seems that "context" can be a field in a message from oai, and "content" is a field in a message.

Good questions. Regarding 1, yes, messages will be used when generate_reply is called individually. We can revise the call in the receive function to make it pass messages, to avoid this confusion.
Regarding 2, we can rename it to config if we want to avoid the confusion. One thing to note is that this variable could be updated in the reply function to maintain some state. I wanted to use it in other methods too but haven't done the refactoring. @ekzhu is it OK to rename context to config in generate_reply()?

@sonichi transferred this issue from microsoft/FLAML on Sep 23, 2023