
[Core] Roadmap for handling context overflow #156

Closed
5 of 7 tasks
yiranwu0 opened this issue Oct 8, 2023 · 27 comments
Labels
enhancement (New feature or request) · long context handling (Compression to handle long context) · roadmap (Issues related to roadmap of AutoGen)

Comments

@yiranwu0
Collaborator

yiranwu0 commented Oct 8, 2023

Any help is appreciated!

This task demands a considerable amount of effort. If you have insights, suggestions, or can contribute in any way, your help would be immensely valued.

Problem Description

(Continued from #9) Current LLMs have a limited context size / token limit (gpt-3.5-turbo: 4096, gpt-4: 8192, etc.). Although the current max_token limit from OpenAI is sufficient for many tasks, the limit will eventually be exceeded as a conversation keeps running. autogen.Completion will then raise an InvalidRequestError indicating that the context size is exceeded, since autogen doesn't have a way to handle long contexts.
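As a quick illustration of when the overflow happens, here is a sketch that estimates prompt tokens before a call using the tiktoken package; the per-message overhead and the 4096 limit are rough assumptions:

# Sketch: estimate prompt tokens before calling the model (assumes tiktoken is installed).
import tiktoken

MODEL = "gpt-3.5-turbo"
TOKEN_LIMIT = 4096  # assumed limit for this model

def count_message_tokens(messages, model=MODEL):
    """Rough token count for a list of {"role", "content"} messages."""
    enc = tiktoken.encoding_for_model(model)
    # Per-message overhead is approximate; exact accounting differs by model version.
    return sum(len(enc.encode(m["content"])) + 4 for m in messages)

messages = [{"role": "user", "content": "Write a snake game in Python."}]
if count_message_tokens(messages) > TOKEN_LIMIT:
    print("Context would overflow; compress, retrieve, or truncate first.")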

Potential Methods

  1. Compression: we can utilize LLMs to compress previous messages to reduce context size.
  2. Retrieve related history messages: we can retrieve the most relevant messages based on the latest message.
  3. Truncation: a simple approach is to keep the most recent k messages and drop all earlier ones. We can also implement more targeted truncation mechanisms, such as removing failed code executions (see the sketch after this list).
  4. A mixture of the methods above.
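A minimal sketch of option 3, assuming messages are plain role/content dicts and that failed executions can be spotted from their content; the helper and the heuristic are illustrative, not AutoGen API:

# Illustrative sketch of simple truncation: keep the last k messages and drop
# messages that report failed code executions (heuristic, not AutoGen API).
def truncate_history(messages, keep_last_k=10):
    """messages: chronological list of {"role": ..., "content": ...} dicts."""
    kept = [
        m for m in messages
        if "execution failed" not in m["content"].lower()
    ]
    system = [m for m in kept if m["role"] == "system"]
    rest = [m for m in kept if m["role"] != "system"]
    return system + rest[-keep_last_k:]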

Some References

Compression & Truncation

  (links to related issues and PRs labeled "long context handling" / "enhancement")

Retrieval

@yiranwu0 yiranwu0 self-assigned this Oct 8, 2023
@yiranwu0 yiranwu0 added long context handling Compression to handle long context enhancement New feature or request labels Oct 8, 2023
@sonichi sonichi added the roadmap Issues related to roadmap of AutoGen label Oct 8, 2023
@OrderAndCh4oS

Probably a slight tangent, but I'm finding error stack traces to be a major culprit of context overflow.

@kazunator

Would using something like Llama Index help?

@sonichi sonichi mentioned this issue Oct 12, 2023
@kazunator

could this work? https://memgpt.ai/

@juanmacuevas
Collaborator

juanmacuevas commented Oct 15, 2023

MemWalker processes long context into a tree of summaries.
Could this approach be applied to autogen? Link to the paper

@sonichi
Contributor

sonichi commented Oct 16, 2023

@Hacker0912 Are you interested in this topic?

@qidanrui

qidanrui commented Oct 20, 2023

AutoGen is a great project! I'm very interested in how you solve context overflow.
When I was using AutoGen, I sometimes ran into this situation:
[screenshot of the context-length error]
Do you have any possible solution now? Thanks! @kevin666aa

@yiranwu0
Collaborator Author

yiranwu0 commented Oct 20, 2023

AutoGen is a great project! I'm very interested in how you solve context overflow. When I was using AutoGen, I sometimes ran into this situation: [screenshot of the context-length error] Do you have any possible solution now? Thanks! @kevin666aa

@qidanrui Here is an experimental PR for compression: #131. It would be great if you could check it out and test it!

Just found a potentially good solution for compression and I will look into it: https://arxiv.org/abs/2310.06839
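For context, that paper is LongLLMLingua, which has an accompanying llmlingua package. Below is a hedged sketch of compressing prior messages with it; the constructor defaults and return keys are assumptions taken from the project's README and may have changed:

# Hedged sketch: compress long history with the llmlingua package
# (API assumed from the LLMLingua project; verify against its repository).
from llmlingua import PromptCompressor

long_history_chunks = [
    "User: Please write a snake game in Python...",
    "Assistant: Here is a first attempt...",
    # ... many more prior messages ...
]

compressor = PromptCompressor()  # downloads a small LM used for compression
result = compressor.compress_prompt(
    long_history_chunks,
    question="What should the agent do next?",
    target_token=500,  # desired size of the compressed context
)
print(result["compressed_prompt"])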

@qidanrui

Thanks for sharing! @kevin666aa I'm so interested in the AutoGen project!
I have two other, more general questions: 1. What is the difference between AutoGen and ChatArena? 2. Can we customize our own tools/agents beyond Python code writing, like we can in LangChain?

@aaronstevenson408

aaronstevenson408 commented Oct 21, 2023

I'm also having this issue when using a Mistral model, with textgen webui as the API host:

openai.error.InvalidRequestError: This model maximum context length is 2048 tokens. However, your messages resulted in over 2165 tokens.

@MrXandbadas

MrXandbadas commented Oct 21, 2023

could this work? https://memgpt.ai/

cpacker/MemGPT#65 (comment)

Edit:
It's been done! Fix your context issues by including a new bot. For the most up-to-date comment click the link, but to save time, here it is:
Please note this implementation replaces the coder from an example found in the examples folder.

import os
import autogen

config_list = [
    {
        'model': 'gpt-4',
        'api_key': os.getenv('OPENAI_API_KEY'),
    },
]

MEMGPT = True  # set to False to use a plain AutoGen coder instead of the MemGPT-backed one
if not MEMGPT:
    llm_config = {"config_list": config_list, "seed": 42}
    user_proxy = autogen.UserProxyAgent(
       name="User_proxy",
       system_message="A human admin.",
       code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
       human_input_mode="TERMINATE"
    )
    coder = autogen.AssistantAgent(
        name="Coder",
        llm_config=llm_config,
    )
    pm = autogen.AssistantAgent(
        name="Product_manager",
        system_message="Creative in software product ideas.",
        llm_config=llm_config,
    )

else:
    import memgpt.autogen.memgpt_agent as memgpt_autogen
    import memgpt.autogen.interface as autogen_interface 
    import memgpt.agent as agent
    import memgpt.system as system
    import memgpt.utils as utils
    import memgpt.presets as presets
    import memgpt.constants as constants
    import memgpt.personas.personas as personas
    import memgpt.humans.humans as humans
    from memgpt.persistence_manager import InMemoryStateManager, InMemoryStateManagerWithPreloadedArchivalMemory, InMemoryStateManagerWithFaiss
    
    llm_config = {"config_list": config_list, "seed": 42}
    user_proxy = autogen.UserProxyAgent(
       name="User_proxy",
       system_message="A human admin.",
       code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
    )

    interface = autogen_interface.AutoGenInterface()
    persistence_manager = InMemoryStateManager()
    memgpt_agent = presets.use_preset(presets.DEFAULT, 'gpt-4', personas.get_persona_text(personas.DEFAULT), humans.get_human_text(humans.DEFAULT), interface, persistence_manager)

    # MemGPT coder
    coder = memgpt_autogen.MemGPTAgent(
        name="MemGPT_coder",
        agent=memgpt_agent,
    )

    # non-MemGPT PM
    pm = autogen.AssistantAgent(
        name="Product_manager",
        system_message="Creative in software product ideas.",
        llm_config=llm_config,
    )

groupchat = autogen.GroupChat(agents=[user_proxy, coder, pm], messages=[], max_round=12)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

user_proxy.initiate_chat(manager, message="First send the message 'Let's go Mario!'")

@PriNova

PriNova commented Oct 21, 2023

I suggest starting with the simplest implementation, without external dependencies, and building on it later.
This means using truncation, so that the history is a sliding window whose size users can set if custom models are used.

Later on, the user could switch between different implementations of context-window handling: truncation, a vector-embedding DB, or MemGPT (a hedged sketch of that idea follows below).
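As an illustration of that switching idea, a minimal sketch in plain Python; the names and structure are hypothetical, not AutoGen API:

# Hypothetical sketch: let the user pick a context-handling strategy per agent.
from typing import Callable, Dict, List

Strategy = Callable[[List[Dict]], List[Dict]]

def sliding_window(k: int) -> Strategy:
    """Keep system messages plus the last k other messages."""
    def apply(messages: List[Dict]) -> List[Dict]:
        system = [m for m in messages if m["role"] == "system"]
        rest = [m for m in messages if m["role"] != "system"]
        return system + rest[-k:]
    return apply

STRATEGIES: Dict[str, Strategy] = {
    "truncate": sliding_window(k=10),
    # "retrieve": ...  # embedding-based retrieval of relevant history
    # "memgpt": ...    # delegate memory management to a MemGPT agent
}

def prepare_context(messages: List[Dict], strategy: str = "truncate") -> List[Dict]:
    return STRATEGIES[strategy](messages)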

@rickyloynd-microsoft rickyloynd-microsoft self-assigned this Oct 21, 2023
@yiranwu0
Collaborator Author

Hello, @MrXandbadas Thanks for the heads up! It's great to have a MemGPT agent in AutoGen. It will take considerable effort to add MemGPT, but it seems we can work with people from MemGPT to make it happen. I guess the first step would be to make MemGPT a built-in agent in AutoGen. Then we can think about how users can switch between different options for handling context overflow.

@PriNova Thanks for the advice! Your suggestion of truncating history is brought up in #195.
I am making a PR in #131 to add a compression agent that users can choose.
There are two parts to this PR:

  1. An implementation where the user can switch between two modes triggered at a pre-set token limit:
    "TERMINATE": terminate the process before RateLimitError is raised; "COMPRESS": call a compression agent to compress previous messages.
  2. A compression agent that performs the compression. (A hedged configuration sketch follows at the end of this comment.)

From the first part, it would be easy to add different ways to handle context later.
The compression agent is experimental but serves as an option for now.

Please take a look at it if you are interested! @PriNova @MrXandbadas
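For later readers: the experimental contrib agent that came out of this line of work exposes these two modes through a compress_config dict. A hedged sketch follows; the argument names reflect the experimental autogen.agentchat.contrib API as I understand it and may differ from PR #131/#443 or have been removed since:

# Hedged sketch of the experimental compression-enabled assistant
# (autogen.agentchat.contrib.compressible_agent; verify against the current codebase).
import os
from autogen.agentchat.contrib.compressible_agent import CompressibleAgent

config_list = [{"model": "gpt-4", "api_key": os.getenv("OPENAI_API_KEY")}]

assistant = CompressibleAgent(
    name="assistant",
    llm_config={"config_list": config_list},
    compress_config={
        "mode": "COMPRESS",    # or "TERMINATE": stop before the context overflows
        "trigger_count": 600,  # token count at which compression kicks in (assumed key)
        "leave_last_n": 2,     # keep the most recent messages uncompressed (assumed key)
    },
)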

@sonichi
Contributor

sonichi commented Oct 22, 2023

Sounds good, except "the first step would be to make memgpt a built-in agent in AutoGen." Before we make it a built-in agent, it'll be helpful to demonstrate one good use case of a memgpt-based agent in autogen.

@JonMike12341234

I must be totally misunderstanding the request for "one good use case". Would this approach not give AutoGen Agents near limitless long term memory storage?

@sonichi
Contributor

sonichi commented Oct 22, 2023

Just suggesting doing it step by step. Test-driven development.

@MrXandbadas

Just suggesting doing it step by step. Test-driven development.

I completely agree. Testing it in a more robust setup would give more fruitful insight as to whether it will be beneficial for the system or just a hindrance to the continual prompt generation that allows these agents to function so flawlessly.

@ddwinhzy

I think it's gonna be a great start.

@SDcodehub
Collaborator

I am interested in this topic; is anyone working on this? I can help and work together with others.

@yiranwu0
Collaborator Author

yiranwu0 commented Nov 1, 2023

I am interested in this topic; is anyone working on this? I can help and work together with others.

Hello @SDcodehub, thanks for your interest! Currently I am working on adding a compressible agent in #443. It could serve as an interface for different types of compression and truncation.
I think it is a good starting point for understanding what we are doing.
I am starting a draft for compressible groupchat in #497.
The next step will be to allow async management of history.

On the other hand, it is also possible to utilize an existing framework like MemGPT, which is actively supporting autogen agents. As @sonichi pointed out, we need to "demonstrate one good use case of memgpt-based agent in autogen". It is not hard to add a MemGPT agent, but modifying it to serve as a group memory requires more effort and thinking.

@GregorD1A1
Collaborator

Hey guys!

In my opinion, the main problems with autogen are the lack of visibility into the exact prompt an agent receives as context and the lack of ability to manage agents' contexts. The main reason is not even context-window overflow: LLMs work much worse when given an overly long context full of useless noise, and it also generates unnecessary costs.

Maybe you remember that the lack of input-prompt visibility was a big problem with LangChain until they built LangSmith. With autogen we have the same problem again. In my opinion, we can't talk about any serious AI development if we can't see and edit the input prompts of LLMs.

What do you think? Will features for editing context (such as summarizing or removing old messages) be available in the near future? Or do solutions already exist that I don't know about?

Cheers!

@ekzhu ekzhu changed the title Roadmap for handling context overflow [Core] Roadmap for handling context overflow Dec 29, 2023
@sonichi
Contributor

sonichi commented Jan 1, 2024

I'm quite excited about #1091 by @rickyloynd-microsoft. It makes teachability a composable capability for any conversable agent. More generally, the same mechanism may be used to solve other longstanding issues like long context handling and to define other interesting capabilities. I like the extensibility and the composability of this approach. Reviews are welcome.
cc @kevin666aa
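For reference, here is a sketch of how a capability built on this mechanism is attached to an agent, using the contrib Teachability capability; the constructor arguments shown are assumptions and may differ across versions:

# Hedged sketch: attaching a composable capability (Teachability) to an agent.
# Constructor arguments are assumptions; check the contrib docs for the current API.
import os
from autogen import AssistantAgent
from autogen.agentchat.contrib.capabilities.teachability import Teachability

config_list = [{"model": "gpt-4", "api_key": os.getenv("OPENAI_API_KEY")}]
assistant = AssistantAgent(name="assistant", llm_config={"config_list": config_list})

teachability = Teachability(
    reset_db=False,                         # keep previously learned memos
    path_to_db_dir="./tmp/teachability_db"  # where the memo store lives
)
teachability.add_to_agent(assistant)  # the capability augments the agent in place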

@sgjohnson1981

sgjohnson1981 commented Jan 29, 2024

I'm quite excited about #1091 by @rickyloynd-microsoft. It makes teachability a composable capability for any conversable agent. More generally, the same mechanism may be used to solve other longstanding issues like long context handling and to define other interesting capabilities. I like the extensibility and the composability of this approach. Reviews are welcome. cc @kevin666aa

@sonichi
So does @rickyloynd-microsoft 's PR add a feature that eliminates the need for MemGPT or other context length solutions? I can't tell from "...mechanism may be used..." Is this mechanism used now for context length, or will that be implemented later?

@rickyloynd-microsoft
Contributor

I'm quite excited about #1091 by @rickyloynd-microsoft. It makes teachability a composable capability for any conversable agent. More generally, the same mechanism may be used to solve other longstanding issues like long context handling and to define other interesting capabilities. I like the extensibility and the composability of this approach. Reviews are welcome. cc @kevin666aa

@sonichi So does @rickyloynd-microsoft 's PR add a feature that eliminates the need for MemGPT or other context length solutions? I can't tell from "...mechanism may be used..." Is this mechanism used now for context length, or will that be implemented later?

Teachability is just one capability added through this new mechanism, and teachability is not designed to compress context or memorize general things like MemGPT. But other capabilities (like MemGPT or other ways of handling long context) could be added through this general capability-addition mechanism.

@ekzhu
Collaborator

ekzhu commented Mar 13, 2024

Shall we close this issue, as several recent PRs related to long context handling have been merged? @kevin666aa

@yiranwu0
Collaborator Author

Yes, thanks!

@JingPush

Hi, I'm working on a conversable agent flow with autogen, and I really want to know the status of handling context-window length and truncating chat history.

I read the above conversation and have a few questions:

  1. Is this work already completed and released?
  2. Are there any docs or a paper from autogen explaining how this is handled specifically?

It would be really helpful if you could answer these questions!

@ekzhu
Collaborator

ekzhu commented Apr 25, 2024

@JingPush currently we use this: https://microsoft.github.io/autogen/docs/topics/long_contexts
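For convenience, a short sketch of the approach described on that page: a TransformMessages capability with message-limiting transforms (the limits shown are illustrative):

# Sketch of the documented long-context handling: a TransformMessages capability
# that limits history length and token count before each LLM call.
import os
from autogen import AssistantAgent
from autogen.agentchat.contrib.capabilities import transform_messages, transforms

config_list = [{"model": "gpt-4", "api_key": os.getenv("OPENAI_API_KEY")}]
assistant = AssistantAgent(name="assistant", llm_config={"config_list": config_list})

context_handling = transform_messages.TransformMessages(
    transforms=[
        transforms.MessageHistoryLimiter(max_messages=10),  # keep last 10 messages
        transforms.MessageTokenLimiter(max_tokens=1000),    # cap total tokens
    ]
)
context_handling.add_to_agent(assistant)  # applied automatically on each reply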
