
WIP: Auto mode rework #48

Draft · wants to merge 15 commits into master
Conversation

@nitanmarcel (Contributor) commented Sep 12, 2024

Checklist

- [x] Rewrite most of the code
- [x] Re-implement OpenAI
- [ ] Re-implement Anthropic
- [ ] Re-implement Llama
- [ ] Re-implement Bedrock
- [ ] Re-implement Groq
- [ ] Re-implement Google
- [ ] Re-implement NousResearch
- [ ] Implement chromadb (add the radare documentation as a knowledge base, prompt the AI to return "memories" that can be saved and used later)

  • Implement litellm
  • Implement llama_cpp
  • Drop chromadb (~200 MB) in favor of sqlite (~17 KB) (see the sketch after this list)
  • Chunk messages to avoid reaching the token limit
  • Support more function calling models #41
  • Optimizations
  • Add documentation on adding custom functions
  • More functions? Like internet access, plus now we could support external functions out of the box (of course, we still have to implement the API)
  • And maybe more?
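A minimal sketch of what the sqlite-backed "memories" store could look like (purely illustrative; the table layout and helper names are assumptions, not code from this PR):

```python
# Hypothetical sketch: replace the ~200 MB chromadb dependency with the
# stdlib sqlite3 module for storing and recalling short "memories".
import sqlite3

def open_memories(path="r2ai_memories.db"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memories ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, topic TEXT, content TEXT)"
    )
    return conn

def save_memory(conn, topic, content):
    conn.execute("INSERT INTO memories (topic, content) VALUES (?, ?)", (topic, content))
    conn.commit()

def recall_memories(conn, keyword, limit=5):
    # Plain LIKE matching instead of vector search; enough for short notes.
    cur = conn.execute(
        "SELECT content FROM memories WHERE topic LIKE ? OR content LIKE ? LIMIT ?",
        (f"%{keyword}%", f"%{keyword}%", limit),
    )
    return [row[0] for row in cur.fetchall()]
```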

@nitanmarcel (Contributor, Author)

@dnakov :)

@trufae (Contributor) commented Sep 12, 2024

Omg this PR couldn't be bigger

@nitanmarcel (Contributor, Author)

> Omg this PR couldn't be bigger

Hihi, I'm not done yet.

@nitanmarcel (Contributor, Author)

Can someone tell me what a token limit is? Because I don't get any =)))

[Screenshot from 2024-09-13 00-06-18]

@nitanmarcel (Contributor, Author)

Oh, this one -_-. I was close enough, though.

```
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens. However, you requested 8226 tokens (3983 in the messages, 147 in the functions, and 4096 in the completion). Please reduce the length of the messages, functions, or completion.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
```
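(For context, the error above is just arithmetic: prompt tokens from messages and functions plus the requested completion must fit inside the model's context window. A rough, illustrative sketch of capping `max_tokens` with tiktoken follows; the window size constant and helper name are assumptions, not code from this PR.)

```python
# Illustrative sketch: cap the completion size so that
# prompt tokens + max_tokens never exceed the model's context window.
import tiktoken

CONTEXT_WINDOW = 8192  # e.g. gpt-4's window, as reported in the error above

def fit_max_tokens(messages, functions_text="", model="gpt-4", reserve=64):
    enc = tiktoken.encoding_for_model(model)
    prompt_tokens = sum(len(enc.encode(m.get("content") or "")) for m in messages)
    prompt_tokens += len(enc.encode(functions_text))
    # Leave a little headroom for the message framing tokens the API adds.
    return max(1, CONTEXT_WINDOW - prompt_tokens - reserve)
```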

@dnakov (Collaborator) commented Sep 12, 2024

@nitanmarcel before you get too far..

> Re-implement OpenAI
> Re-implement Anthropic
> Re-implement Llama
> Re-implement Bedrock
> Re-implement Groq
> Re-implement Google
> Re-implement NousResearch

All of these can just be served via a single litellm call, like in ui/chat.py.

@nitanmarcel (Contributor, Author) commented Sep 12, 2024

> @nitanmarcel before you get too far.. Re-implement OpenAI, Anthropic, Llama, Bedrock, Groq, Google, NousResearch
>
> All of these can just be served via a single litellm call, like in ui/chat.py.

I think @trufae said something about not wanting litellm implemented. Which is extra fine by me, since I can turn the process function into a decorator and have it take an argument to pre-process the response.

@nitanmarcel (Contributor, Author) commented Sep 12, 2024

Would be something like:

```python
@process_response(processor=function_to_convert_response)
def unsupported_model_call(...):
    ...
```

@nitanmarcel (Contributor, Author)

> Would be something like:
>
> ```python
> @process_response(processor=function_to_convert_response)
> def unsupported_model_call(...):
>     ...
> ```

Though it implies that the model supports the same tool format. I can create a pre_processor argument for that too.
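Something like the following could work as the decorator described above (a sketch only; `process_response`, `processor`, and `pre_processor` are the hypothetical names from these comments, not code in this PR):

```python
import functools

def process_response(processor=None, pre_processor=None):
    """Hypothetical decorator: adapt requests/responses of models that don't
    speak the OpenAI tool format natively."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre_processor is not None:
                # e.g. rewrite OpenAI-style tools/messages into the model's own format
                args, kwargs = pre_processor(*args, **kwargs)
            response = func(*args, **kwargs)
            if processor is not None:
                # e.g. convert the raw model output back into an OpenAI-style response
                response = processor(response)
            return response
        return wrapper
    return decorator
```

A model-specific backend would then be wrapped roughly as `@process_response(processor=convert_to_openai_format, pre_processor=convert_tools)`, with both converters being hypothetical helpers.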

@nitanmarcel (Contributor, Author) commented Sep 12, 2024

> Can someone tell me what a token limit is? Because I don't get any =)))
>
> [Screenshot from 2024-09-13 00-06-18]

Anyway, I still have this to figure out. The chunking of big results works pretty well, almost.

@dnakov (Collaborator) commented Sep 12, 2024

> > @nitanmarcel before you get too far.. Re-implement OpenAI, Anthropic, Llama, Bedrock, Groq, Google, NousResearch
> > All of these can just be served via a single litellm call, like in ui/chat.py.
>
> I think @trufae said something about not wanting litellm implemented. Which is extra fine by me, since I can turn the process function into a decorator and have it take an argument to pre-process the response.

@trufae We should resolve this, as we're basically going to be replicating litellm. They've done all the model request/response parsing in there and consolidated it all to the OpenAI spec.

@nitanmarcel (Contributor, Author)

> > > @nitanmarcel before you get too far.. Re-implement OpenAI, Anthropic, Llama, Bedrock, Groq, Google, NousResearch
> > > All of these can just be served via a single litellm call, like in ui/chat.py.
> >
> > I think @trufae said something about not wanting litellm implemented. Which is extra fine by me, since I can turn the process function into a decorator and have it take an argument to pre-process the response.
>
> @trufae We should resolve this, as we're basically going to be replicating litellm. They've done all the model request/response parsing in there and consolidated it all to the OpenAI spec.

Took a deeper look at what instructor (a new library we use here) does under the hood to support other LLMs' tools, and it has everything I need to easily implement the tools. Parsing the raw response needs to be done from scratch, but I have the old code for that.

https://github.com/jxnl/instructor/blob/959097e174a4cd57101503b433b0af8bcb39726d/instructor/function_calls.py

@dnakov (Collaborator) commented Sep 13, 2024

Yeah, I've used instructor. But with litellm, you don't need to parse anything raw. I have no interest in maintaining transformations for so many models when it already exists; this isn't the primary point of this library.

@nitanmarcel (Contributor, Author)

> Yeah, I've used instructor. But with litellm, you don't need to parse anything raw. I have no interest in maintaining transformations for so many models when it already exists; this isn't the primary point of this library.

From what I've seen so far you still have to parse things raw, but it's the same response for all LLMs.

@dnakov (Collaborator) commented Sep 13, 2024

Yes, so you're only parsing 1 thing vs 1 for each API/model

@nitanmarcel (Contributor, Author)

> Yes, so you're only parsing 1 thing vs 1 for each API/model

We'll have to wait for an answer from @trufae.

@nitanmarcel (Contributor, Author)

> Yes, so you're only parsing 1 thing vs 1 for each API/model

Got the green light. The other green, not the high green.

@dnakov (Collaborator) commented Sep 13, 2024

> > Yes, so you're only parsing 1 thing vs 1 for each API/model
>
> Got the green light. The other green, not the high green.

What does that mean?

@nitanmarcel (Contributor, Author)

> > > Yes, so you're only parsing 1 thing vs 1 for each API/model
> >
> > Got the green light. The other green, not the high green.
>
> What does that mean?

I can use litellm :)

@nitanmarcel (Contributor, Author)

> > > Yes, so you're only parsing 1 thing vs 1 for each API/model
> >
> > Got the green light. The other green, not the high green.
>
> What does that mean?

Doesn't work:

```
AttributeError: 'Delta' object has no attribute 'role'
```

@nitanmarcel (Contributor, Author)

Ah, the version was too old, but still:

```
ModelResponse(
│   id='chatcmpl-A730rEKe2r4dS5mn2BcCWYT9ILvQw',
│   choices=[
│   │   StreamingChoices(
│   │   │   finish_reason=None,
│   │   │   index=0,
│   │   │   delta=Delta(
│   │   │   │   refusal=None,
│   │   │   │   content=None,
│   │   │   │   role='assistant',
│   │   │   │   function_call=None,
│   │   │   │   tool_calls=[
│   │   │   │   │   ChatCompletionDeltaToolCall(
│   │   │   │   │   │   id=None,
│   │   │   │   │   │   function=Function(arguments='{\n', name=None),
│   │   │   │   │   │   type='function',
│   │   │   │   │   │   index=0
│   │   │   │   │   )
│   │   │   │   ]
│   │   │   ),
│   │   │   logprobs=None
│   │   )
│   ],
│   created=1726243242,
│   model='gpt-4',
│   object='chat.completion.chunk',
│   system_fingerprint=None
)
```

@nitanmarcel (Contributor, Author)

I do parse the functions myself now, so maybe this is my issue.

@nitanmarcel (Contributor, Author)

> I do parse the functions myself now, so maybe this is my issue.

Yep, I forgot how generators work 😅
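(For reference, a hedged sketch of how streamed chunks like the ModelResponse dump above are typically accumulated; illustrative only, assuming OpenAI-style deltas where `role`/`content` appear only on some chunks and tool-call arguments arrive as JSON fragments.)

```python
# Illustrative sketch of consuming a streaming completion generator.
# Assumes chunks shaped like the ModelResponse dump above (OpenAI-style deltas).
def accumulate_stream(chunks):
    content = []
    tool_calls = {}  # index -> {"name": str, "arguments": str}
    for chunk in chunks:
        delta = chunk.choices[0].delta
        # role/content are only present on some deltas, so guard the access
        if getattr(delta, "content", None):
            content.append(delta.content)
        for call in getattr(delta, "tool_calls", None) or []:
            slot = tool_calls.setdefault(call.index, {"name": "", "arguments": ""})
            if call.function.name:
                slot["name"] = call.function.name
            if call.function.arguments:
                slot["arguments"] += call.function.arguments  # JSON arrives in pieces
    return "".join(content), tool_calls
```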

@trufae (Contributor) commented Sep 13, 2024

My comments on litellm:

  • it feels quite commercial, judging by the website
  • I'm afraid the amount of deps will make it too huge when packaged
  • we lose control over how we build the prompts
  • it doesn't support llama afaik

So imho I would like to keep control of the llama side, with chromadb and the prompt structure thing, at least as a separate codebase. Even if it's ugly, I think this code gives us more control. Let me know if I misunderstood anything about litellm.

How are you planning to support litellm? If we now support more models and reduce the logic handling all those models, will the interface for the user stay the same?

@dnakov (Collaborator) commented Sep 13, 2024

It is commercial, I think they got some VC funding, although it's at least MIT licensed.
It's not a crazy amount of deps; we'd want most of them anyway -- like the anthropic, openai and google libraries, pydantic, etc.
It doesn't actually do anything to the prompts, you still have the same amount of control.
Yes, it doesn't do anything with running models locally, so it doesn't help there. But if we expose an OpenAI-compatible endpoint, we can still use the same completion code.

> How are you planning to support litellm? If we now support more models and reduce the logic handling all those models, will the interface for the user stay the same?

I'm thinking we can just use their <provider>/model_name convention for any API models, so we don't have to constantly update model names. So -m openai/o1-preview would work even if it's not on the list.
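A hedged sketch of what that could look like, assuming litellm's `completion()` API and provider-prefixed model strings; the `auto_chat` wrapper name and the `-m` handling are illustrative, not the actual r2ai code:

```python
# Illustrative sketch: pass the user's "-m <provider>/<model>" string straight
# to litellm, which routes it to the right backend and returns an
# OpenAI-shaped response for every provider.
import litellm

def auto_chat(model, messages, tools=None):
    return litellm.completion(
        model=model,        # e.g. "openai/o1-preview", taken verbatim from -m
        messages=messages,  # OpenAI-style [{"role": "user", "content": ...}, ...]
        tools=tools,        # OpenAI-style tool definitions, if the model supports them
    )

# response = auto_chat("openai/o1-preview", [{"role": "user", "content": "hi"}])
```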

@nitanmarcel (Contributor, Author) commented Sep 14, 2024

> My comments on litellm:
>
> • it feels quite commercial, judging by the website
> • I'm afraid the amount of deps will make it too huge when packaged
> • we lose control over how we build the prompts
> • it doesn't support llama afaik
>
> So imho I would like to keep control of the llama side, with chromadb and the prompt structure thing, at least as a separate codebase. Even if it's ugly, I think this code gives us more control. Let me know if I misunderstood anything about litellm.
>
> How are you planning to support litellm? If we now support more models and reduce the logic handling all those models, will the interface for the user stay the same?

1. As long as we don't use their UI or their moderation tools, we are covered by the MIT license.

2. Doesn't happen, since we use the conversation wrapper; it returns the same format as OpenAI for all endpoints.

And we can use llama separately. About the size: it uses extras, so in our case it only downloads the deps we need:

```
Downloading litellm-1.45.0-py3-none-any.whl.metadata (32 kB)
Collecting aiohttp (from litellm)
  Downloading aiohttp-3.10.5.tar.gz (7.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.5/7.5 MB 4.3 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting click (from litellm)
  Downloading click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting importlib-metadata>=6.8.0 (from litellm)
  Downloading importlib_metadata-8.5.0-py3-none-any.whl.metadata (4.8 kB)
Collecting jinja2<4.0.0,>=3.1.2 (from litellm)
  Downloading jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting jsonschema<5.0.0,>=4.22.0 (from litellm)
  Downloading jsonschema-4.23.0-py3-none-any.whl.metadata (7.9 kB)
Collecting openai>=1.45.0 (from litellm)
  Downloading openai-1.45.0-py3-none-any.whl.metadata (22 kB)
Collecting pydantic<3.0.0,>=2.0.0 (from litellm)
  Downloading pydantic-2.9.1-py3-none-any.whl.metadata (146 kB)
Collecting python-dotenv>=0.2.0 (from litellm)
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Collecting requests<3.0.0,>=2.31.0 (from litellm)
  Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting tiktoken>=0.7.0 (from litellm)
  Downloading tiktoken-0.7.0.tar.gz (33 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting tokenizers (from litellm)
  Downloading tokenizers-0.20.0.tar.gz (337 kB)
```

@nitanmarcel (Contributor, Author) commented Sep 14, 2024

On the other side:

> It is commercial, I think they got some VC funding, although it's at least MIT licensed. It's not a crazy amount of deps; we'd want most of them anyway -- like the anthropic, openai and google libraries, pydantic, etc. It doesn't actually do anything to the prompts, you still have the same amount of control. Yes, it doesn't do anything with running models locally, so it doesn't help there. But if we expose an OpenAI-compatible endpoint, we can still use the same completion code.
>
> > How are you planning to support litellm? If we now support more models and reduce the logic handling all those models, will the interface for the user stay the same?
>
> I'm thinking we can just use their <provider>/model_name convention for any API models, so we don't have to constantly update model names. So -m openai/o1-preview would work even if it's not on the list.

I don't think we even need to constantly update our models; at least in auto mode, only the provider is set. The <provider>/model part can be done without litellm too, by creating our own wrapper around the endpoints. And who knows, maybe I'll come up with an idea to make it easier to maintain.

@nitanmarcel (Contributor, Author) commented Sep 14, 2024

So, litellm or not, these can be done manually. Plus, we can freely use parts of litellm in our code due to the dual license they use.

@nitanmarcel (Contributor, Author) commented Sep 14, 2024

@trufae @dnakov I've updated the task list with new tasks. I'll go with dnakov's suggestion to keep litellm, while dropping the size of r2ai from the current ~500 MB to around ~200 MB.

I hope everyone is happy ^^

@nitanmarcel (Contributor, Author)

@dnakov @trufae can any of you test this? I'm afraid my laptop isn't powerful enough, and the only local model I was able to run didn't support tools.

c1f0e2e

@trufae (Contributor) commented Sep 25, 2024

abandoned?

@nitanmarcel (Contributor, Author)

> abandoned?

Nope, I'll come back to it soon. Just taking a break, since handling the functionary models drove me nuts xD
