MemGPT code is AAA+ unfortunately I cannot get it to work (no matter which LLM I try I cannot get it to work reliably) #1776

Open
distributev opened this issue Sep 23, 2024 · 5 comments

@distributev

Hi MemGPT Team,

Thank you for such a high-quality codebase. I'm pretty confident that, as LLMs improve, MemGPT will become the "standard" among products of its kind.

I would recommend that the MemGPT team put up a webpage similar to

https://aider.chat/docs/leaderboards/

where people could immediately see (and with high confidence) what kind of quality to expect from each of the available LLMs.

I want to ask the community whether you were able to get MemGPT working in a "day to day" kind of way and, if yes, which LLMs you are using it with and for which kind of scenario. (How long is your "persona" prompt? Do you have instructions for the LLM to follow in your persona? How many? Are you using custom tools and, if so, how many and how complex?) If you are using it "day to day", does MemGPT/LLM work flawlessly, or are you used to seeing "stacktraces", so that when you get a stacktrace you just click "run again" and it works the next time?
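To make the "click run again" workflow concrete, here is a minimal sketch of an automatic retry wrapper. It is not part of MemGPT: `call_with_retries` and its parameters are hypothetical names, and `step` stands for whatever call you use to send a message to your agent.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(step: Callable[[], T], attempts: int = 3, base_delay: float = 0.0) -> T:
    """Run `step`, retrying on any exception, up to `attempts` times in total.

    Hypothetical helper (not a MemGPT API). `step` wraps one agent request.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return step()
        except Exception as exc:  # broad on purpose: any stacktrace means "run again"
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff between tries; 0.0 disables waiting.
                time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

A wrapper like this only hides intermittent failures. If the model fails on every attempt, it just delays the same stacktrace.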

I'm very curious to understand how (and if) people are using MemGPT.

I want to say that, from my own AI projects, I understood (before MemGPT) that it is incredibly difficult to make LLMs follow instructions, no matter how well crafted and clear the prompt instructions are. It becomes even more difficult as the number of instructions increases, and harder still when you combine this with the "function calling" ability of LLMs: as with instructions, the more functions you add, the more confused the LLM becomes. So, with the current ability of LLMs, it is a very difficult problem to get anything more than "hello world" working. Even when the LLM follows the instructions the first time, for the next two requests it will not, and you will get a stacktrace (for anything more than "hello world").

Because of that, I'm pretty sure that what I describe below happens because of what LLMs are (not) capable of today, and not because of MemGPT (whose source code, as I already said, is very well crafted).

I tried MemGPT twice. The first time, 6 months ago, I gave up because about 90% of requests produced stacktraces and only 10% worked. Over the past few days I tried MemGPT again, and this time I also got familiar with the codebase. The situation is the same as 6 months ago.
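For anyone who wants to put a number on "90% stacktraces", here is a rough sketch of how a success rate per model could be tallied, which is also the kind of figure a leaderboard page would show. `measure_success_rate` is a hypothetical helper, and `send` is assumed to be any function that sends one prompt to your agent and raises on failure.

```python
from typing import Callable, Iterable


def measure_success_rate(send: Callable[[str], object], prompts: Iterable[str]) -> dict:
    """Send each prompt once and count how many complete without an exception.

    Hypothetical helper: `send` wraps whatever client call you use.
    """
    ok = 0
    failures = []
    for prompt in prompts:
        try:
            send(prompt)
            ok += 1
        except Exception as exc:
            # Keep only the exception type so the report stays short.
            failures.append((prompt, type(exc).__name__))
    total = ok + len(failures)
    return {
        "total": total,
        "ok": ok,
        "rate": ok / total if total else 0.0,
        "failures": failures,
    }
```

Running the same prompt list against each model gives directly comparable numbers, as long as the persona, tools and prompts stay fixed between runs.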

With Anthropic's Claude Sonnet I could not get anything working.

With OpenAI's gpt-4-1106-preview (which is advertised as 'featuring improved instruction following, JSON mode, reproducible outputs') I am able (from time to time) to get some requests processed, but only when I start with --first --no-verify. Even so, subsequent requests start to fail and cannot recover. I also tried other OpenAI models and could not get any of them to work.

Here is how I create my agent.

import os
import sys

# Determine the base directory two levels up from the current script
base_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..'))

# Add the path to the memgpt package
memgpt_path = os.path.join(base_dir, 'bkend-core-python', 'MemGPT')
sys.path.append(memgpt_path)

from memgpt import create_client
from memgpt.memory import ChatMemory
from memgpt.data_types import LLMConfig

from memgpt.functions.function_sets.extras import workspace_list_files, read_file_content, write_file_content, ctrl_c, ctrl_v

def main():
    # Create a `LocalClient`
    client = create_client()

    agent_name = "Ada"

    # Check if the agent already exists
    existing_agents = client.list_agents()
    ada_agent = next((agent for agent in existing_agents if agent.name == agent_name), None)

    if ada_agent is None:
        # Load the persona and human text from files on disk.
        script_dir = os.path.dirname(os.path.abspath(__file__))
        parent_script_dir = os.path.dirname(script_dir)
        persona_file_path = os.path.join(script_dir, "persona_ada.txt")
        human_file_path = os.path.join(parent_script_dir, "humans", "john.txt")

        with open(persona_file_path, 'r', encoding='utf-8') as file:
            persona_content = file.read().strip()

        with open(human_file_path, 'r', encoding='utf-8') as file:
            human_content = file.read().strip()

        persona_content = persona_content.replace("{workspace_folder_path}", os.path.join(script_dir, "workspace"))
        #print(f"human_content: {human_content}\npersona_content {persona_content}")

        # Create custom memory with the persona and human
        memory_ada = ChatMemory(persona=persona_content, human=human_content)

        # Create custom tools

        # recreate tools from scratch
        client.delete_tool('ada_simple_tool')
        client.delete_tool('ada_workspace_list_files')
        client.delete_tool('workspace_list_files')
        client.delete_tool('read_from_text_file')
        client.delete_tool('read_file_content')
        client.delete_tool('write_file_content')
        client.delete_tool('ctrl_c')
        client.delete_tool('ctrl_v')
               
        client.create_tool(workspace_list_files, name="workspace_list_files")
        client.create_tool(read_file_content, name="read_file_content")
        client.create_tool(write_file_content, name="write_file_content")
        client.create_tool(ctrl_c, name="ctrl_c")
        client.create_tool(ctrl_v, name="ctrl_v")

        # Define the LLM configuration (currently set to Anthropic's claude-3-sonnet-20240229;
        # the commented-out lines are the OpenAI alternatives I tried)
        llm_config = LLMConfig(
            #o1-preview (rate limited not usable) 
            #o1-mini (rate limited not usable) 
            #gpt-4o-2024-08-06 (TOP model #2)
            #gpt-4-1106-preview (TOP model #1)
            #claude-3-sonnet-20240229
            model="claude-3-sonnet-20240229",
            #model_endpoint="https://api.openai.com/v1",
            model_endpoint="https://api.anthropic.com/v1",
            #model_endpoint_type="openai",
            model_endpoint_type="anthropic",
            #context_window=16384, #openai
            context_window=200000 #anthropic
        )

        # Create an agent
        ada_agent = client.create_agent(
            llm_config=llm_config,
            name=agent_name,
            memory=memory_ada,
            system_prompt="memgpt_chat",
            tools=['workspace_list_files'],
            #include_base_tools=False,
        )

        print(f"Created (agent) {ada_agent.name} with ID {str(ada_agent.id)}")
    else:
        print(f"Nothing to do because (agent) {ada_agent.name} already exists.")

    # Retrieve and print memory information
    memory_info = client.get_agent_memory(agent_id=ada_agent.id)
    print(f"Agent Persona: {memory_info.core_memory.persona}")
    print(f"Agent Human: {memory_info.core_memory.human}")

if __name__ == "__main__":
    main()

There is no point in attaching long stacktraces here. I'm pretty confident the setup/configuration is done correctly.

@sarahwooders
Collaborator

Can you try again with 0.5.0? There should be a lot of bugfixes to the configuration for LLMs and embedding models now. Please re-open if you are still having issues.

@shivamatfigr

I have been facing similar issues with Claude Sonnet. The "decision making" part of managing memory and tools seems to be quite unusable, in the sense that it cannot decide when and which functions to call. Once I tell it to update and fetch from core/archive memory it does it, but having to do that makes it less practical for most use cases.

It would be great if you guys could share some benchmarks and also best practices for making it work.

@distributev have you given it a try more recently?

@distributev
Author

@shivamatfigr From the MemGPT perspective I got pretty good results with gpt4o-mini, which also has a very good price. As a basic assistant it works well. It still gives some stacktraces here and there related to memory updating, but it is for sure usable (unlike the other LLMs, which I could not get to work at all).

gpt4o-mini comes 5th on this leaderboard; note the pricing as well (compared with the other LLMs at the top, which I could not get to work with MemGPT anyway).

https://gorilla.cs.berkeley.edu/leaderboard.html

This leaderboard makes sense for MemGPT because it tests how good LLMs are at "function calling", which is what MemGPT needs the most.
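To illustrate what "good at function calling" means in practice, here is a small sketch of the checks a function call from the model has to pass before it can be dispatched. This is not MemGPT's actual code. It assumes the call arrives as JSON with `name` and `arguments` keys, and the `tools` mapping and its parameter names in the usage note are made up for the example.

```python
import json


def validate_function_call(raw: str, tools: dict) -> tuple:
    """Return (name, arguments) if `raw` is a usable function call, else raise ValueError.

    Hypothetical helper. `tools` maps each tool name to its list of required
    parameter names.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("function call must be a JSON object")

    name = call.get("name")
    if name not in tools:
        raise ValueError(f"unknown tool: {name!r}")

    args = call.get("arguments", {})
    if isinstance(args, str):
        # Some APIs return the arguments as a JSON-encoded string.
        # json.JSONDecodeError is a subclass of ValueError, so it propagates as one.
        args = json.loads(args)
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    missing = [param for param in tools[name] if param not in args]
    if missing:
        raise ValueError(f"missing required arguments for {name}: {missing}")
    return name, args
```

For example, with `tools = {"read_file_content": ["path"]}` (the parameter name is an assumption), a call that omits `path` or names a tool that was never registered is rejected with a short message instead of a long stacktrace.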

As a basic assistant to keep your TODOs gpt4o-mini would work.

The limitation is that ... it is gpt4o-mini, so you cannot do much more than "keeping your TODOs".

I tried to overcome this by giving gpt4o-mini tools to use when its limits are reached. One tool I gave it was "another smarter LLM to call using a command line interface". It is interesting to play with, but the system becomes too complex and error-prone, and it is a rabbit hole.

gpt4o-mini does not realize by itself "I'm too stupid for this, let's use the LLM CLI tool to ask the other, smarter LLM". I need to tell it explicitly "now use your LLM CLI tool and ask Claude Sonnet", which defeats the purpose.
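For reference, a tool of that kind can be sketched roughly as below. This is not my exact tool: `ask_smarter_llm` and the `command` parameter are illustrative names, and the sketch assumes the helper CLI takes the question as its last argument and prints the answer to stdout.

```python
import subprocess


def ask_smarter_llm(question: str, command: list, timeout: float = 60.0) -> str:
    """Ask a stronger model for help through a command line program.

    Hypothetical tool: `command` is the CLI invocation as a list of strings,
    and `question` is appended to it as the final argument.
    Returns the CLI's answer, or a string starting with "ERROR:" on failure,
    so the calling agent receives text instead of a stacktrace.
    """
    try:
        result = subprocess.run(
            command + [question],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"ERROR: could not run helper CLI: {exc}"
    if result.returncode != 0:
        return f"ERROR: helper CLI exited with {result.returncode}: {result.stderr.strip()}"
    return result.stdout.strip()
```

Returning errors as plain text keeps the agent loop alive, but it does not solve the real problem described above: the smaller model still has to decide on its own when to call the tool.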

@mattzh72
Collaborator

@distributev quick question as we're looking into this - does gpt4o more reliably use the memory tools?

@mattzh72 mattzh72 reopened this Dec 18, 2024
@github-project-automation github-project-automation bot moved this from Done to Backlog in 🐛 MemGPT issue tracker Dec 18, 2024
@distributev
Author

distributev commented Dec 18, 2024

Most of the testing I described above was done before the project name changed from MemGPT to Letta.

At that time I could get only gpt4o-mini working.

gpt4o (directly from OpenAI), Claude Sonnet (directly from Anthropic), Llama and a few other LLMs from OpenRouter were all unusable, failing with the same stacktrace generated by the memory function tools which MemGPT uses.

gpt4o-mini was the only model I could get working.

I tried Letta once and it works the same with gpt4o-mini, but I did not retry Letta with all the LLMs which were failing before the project name change.


4 participants