
WIP: Auto mode rework #48

Draft · wants to merge 15 commits into master
Conversation

@nitanmarcel (Contributor) commented Sep 12, 2024

Checklist

- [x] Rewrite most of the code
- [x] Re-implement OpenAI
- [ ] Re-implement Anthropic
- [ ] Re-implement Llama
- [ ] Re-implement Bedrock
- [ ] Re-implement Groq
- [ ] Re-implement Google
- [ ] Re-implement NousResearch
- [ ] Implement chromadb (add the radare documentation as a knowledge base, prompt the AI to return "memories" that can be saved and used later)

  • Implement litellm
  • Implement llama_cpp
  • Drop chromadb (~200 MB) in favor of sqlite (~17 KB) (see the sketch after this list)
  • Chunk messages to avoid reaching the token limit
  • Support more function calling models #41
  • Optimizations
  • Add documentation on adding custom functions
  • More functions? Like internet access, plus now we could support external functions out of the box (of course, we still have to implement the API)
  • And maybe more?
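A minimal sketch of what the sqlite-backed "memories" store could look like (purely illustrative; the table layout and helper names are assumptions, not code from this PR):

```python
# Hypothetical sketch: replace the ~200 MB chromadb dependency with the
# stdlib sqlite3 module for storing and recalling short "memories".
import sqlite3

def open_memories(path="r2ai_memories.db"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memories ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, topic TEXT, content TEXT)"
    )
    return conn

def save_memory(conn, topic, content):
    conn.execute("INSERT INTO memories (topic, content) VALUES (?, ?)", (topic, content))
    conn.commit()

def recall_memories(conn, keyword, limit=5):
    # Plain LIKE matching instead of vector search; enough for short notes.
    cur = conn.execute(
        "SELECT content FROM memories WHERE topic LIKE ? OR content LIKE ? LIMIT ?",
        (f"%{keyword}%", f"%{keyword}%", limit),
    )
    return [row[0] for row in cur.fetchall()]
```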

@nitanmarcel (Contributor, Author)

@dnakov :)

@trufae (Contributor) commented Sep 12, 2024

Omg this PR couldn't be bigger

@nitanmarcel (Contributor, Author)

> Omg this PR couldn't be bigger

Hihi, I'm not done yet.

@nitanmarcel (Contributor, Author)

Can someone tell me what a token limit is? Because I don't get any =)))

[Screenshot from 2024-09-13 00-06-18]

@nitanmarcel (Contributor, Author)

Oh, this one -_-. I was close enough, though.

```
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens. However, you requested 8226 tokens (3983 in the messages, 147 in the functions, and 4096 in the completion). Please reduce the length of the messages, functions, or completion.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
```
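(For context, the error above is just arithmetic: prompt tokens from messages and functions plus the requested completion must fit inside the model's context window. A rough, illustrative sketch of capping `max_tokens` with tiktoken follows; the window size constant and helper name are assumptions, not code from this PR.)

```python
# Illustrative sketch: cap the completion size so that
# prompt tokens + max_tokens never exceed the model's context window.
import tiktoken

CONTEXT_WINDOW = 8192  # e.g. gpt-4's window, as reported in the error above

def fit_max_tokens(messages, functions_text="", model="gpt-4", reserve=64):
    enc = tiktoken.encoding_for_model(model)
    prompt_tokens = sum(len(enc.encode(m.get("content") or "")) for m in messages)
    prompt_tokens += len(enc.encode(functions_text))
    # Leave a little headroom for the message framing tokens the API adds.
    return max(1, CONTEXT_WINDOW - prompt_tokens - reserve)
```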

@dnakov (Collaborator) commented Sep 12, 2024

@nitanmarcel before you get too far..

> Re-implement OpenAI
> Re-implement Anthropic
> Re-implement Llama
> Re-implement Bedrock
> Re-implement Groq
> Re-implement Google
> Re-implement NousResearch

All of these can just be served via a single litellm call, like in ui/chat.py.

@nitanmarcel (Contributor, Author) commented Sep 12, 2024

> @nitanmarcel before you get too far.. Re-implement OpenAI, Anthropic, Llama, Bedrock, Groq, Google, NousResearch
>
> All of these can just be served via a single litellm call, like in ui/chat.py.

I think @trufae said something about not wanting litellm implemented. Which is extra fine by me, since I can turn the process function into a decorator and have it take an argument to pre-process the response.

@nitanmarcel (Contributor, Author) commented Sep 12, 2024

Would be something like:

```python
@process_response(processor=function_to_convert_response)
def unsupported_model_call(...):
    ...
```

@nitanmarcel (Contributor, Author)

> Would be something like:
>
> ```python
> @process_response(processor=function_to_convert_response)
> def unsupported_model_call(...):
>     ...
> ```

Though it implies that the model supports the same tool format. I can create a pre_processor argument for that too.
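Something like the following could work as the decorator described above (a sketch only; `process_response`, `processor`, and `pre_processor` are the hypothetical names from these comments, not code in this PR):

```python
import functools

def process_response(processor=None, pre_processor=None):
    """Hypothetical decorator: adapt requests/responses of models that don't
    speak the OpenAI tool format natively."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre_processor is not None:
                # e.g. rewrite OpenAI-style tools/messages into the model's own format
                args, kwargs = pre_processor(*args, **kwargs)
            response = func(*args, **kwargs)
            if processor is not None:
                # e.g. convert the raw model output back into an OpenAI-style response
                response = processor(response)
            return response
        return wrapper
    return decorator
```

A model-specific backend would then be wrapped roughly as `@process_response(processor=convert_to_openai_format, pre_processor=convert_tools)`, with both converters being hypothetical helpers.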

@nitanmarcel (Contributor, Author) commented Sep 12, 2024

> Can someone tell me what a token limit is? Because I don't get any =)))
>
> [Screenshot from 2024-09-13 00-06-18]

Anyway, I still have this to figure out. The chunking of big results works pretty well, almost.

@dnakov (Collaborator) commented Sep 12, 2024

> > @nitanmarcel before you get too far.. Re-implement OpenAI, Anthropic, Llama, Bedrock, Groq, Google, NousResearch
> > All of these can just be served via a single litellm call, like in ui/chat.py.
>
> I think @trufae said something about not wanting litellm implemented. Which is extra fine by me, since I can turn the process function into a decorator and have it take an argument to pre-process the response.

@trufae We should resolve this, as we're basically going to be replicating litellm. They've done all the model request/response parsing in there and consolidated it all to the OpenAI spec.

@nitanmarcel (Contributor, Author)

> > > @nitanmarcel before you get too far.. Re-implement OpenAI, Anthropic, Llama, Bedrock, Groq, Google, NousResearch
> > > All of these can just be served via a single litellm call, like in ui/chat.py.
> >
> > I think @trufae said something about not wanting litellm implemented. Which is extra fine by me, since I can turn the process function into a decorator and have it take an argument to pre-process the response.
>
> @trufae We should resolve this, as we're basically going to be replicating litellm. They've done all the model request/response parsing in there and consolidated it all to the OpenAI spec.

Took a deeper look at what instructor (a new library we use here) does under the hood to support other LLMs' tools, and it has everything I need to easily implement the tools. Parsing the raw response needs to be done from scratch, but I have the old code for that.

https://github.com/jxnl/instructor/blob/959097e174a4cd57101503b433b0af8bcb39726d/instructor/function_calls.py

@dnakov (Collaborator) commented Sep 13, 2024

Yeah, I've used instructor. But with litellm, you don't need to parse anything raw. I have no interest in maintaining transformations for so many models when it already exists; this isn't the primary point of this library.

@nitanmarcel (Contributor, Author)

> Yeah, I've used instructor. But with litellm, you don't need to parse anything raw. I have no interest in maintaining transformations for so many models when it already exists; this isn't the primary point of this library.

From what I've seen so far you still have to parse things raw, but it's the same response for all LLMs.

@dnakov (Collaborator) commented Sep 13, 2024

Yes, so you're only parsing 1 thing vs 1 for each API/model

@nitanmarcel (Contributor, Author)

> Yes, so you're only parsing 1 thing vs 1 for each API/model

We'll have to wait for an answer from @trufae.

@nitanmarcel (Contributor, Author)

> Yes, so you're only parsing 1 thing vs 1 for each API/model

Got the green light. The other green, not the high green.

@dnakov (Collaborator) commented Sep 13, 2024

> > Yes, so you're only parsing 1 thing vs 1 for each API/model
>
> Got the green light. The other green, not the high green.

What does that mean?

@nitanmarcel (Contributor, Author)

> > > Yes, so you're only parsing 1 thing vs 1 for each API/model
> >
> > Got the green light. The other green, not the high green.
>
> What does that mean?

I can use litellm :)

@nitanmarcel (Contributor, Author)

> > > Yes, so you're only parsing 1 thing vs 1 for each API/model
> >
> > Got the green light. The other green, not the high green.
>
> What does that mean?

Doesn't work:

```
AttributeError: 'Delta' object has no attribute 'role'
```

@nitanmarcel (Contributor, Author)

Ah, the version was too old, but still:

```
ModelResponse(
│   id='chatcmpl-A730rEKe2r4dS5mn2BcCWYT9ILvQw',
│   choices=[
│   │   StreamingChoices(
│   │   │   finish_reason=None,
│   │   │   index=0,
│   │   │   delta=Delta(
│   │   │   │   refusal=None,
│   │   │   │   content=None,
│   │   │   │   role='assistant',
│   │   │   │   function_call=None,
│   │   │   │   tool_calls=[
│   │   │   │   │   ChatCompletionDeltaToolCall(
│   │   │   │   │   │   id=None,
│   │   │   │   │   │   function=Function(arguments='{\n', name=None),
│   │   │   │   │   │   type='function',
│   │   │   │   │   │   index=0
│   │   │   │   │   )
│   │   │   │   ]
│   │   │   ),
│   │   │   logprobs=None
│   │   )
│   ],
│   created=1726243242,
│   model='gpt-4',
│   object='chat.completion.chunk',
│   system_fingerprint=None
)
```

@nitanmarcel (Contributor, Author)

I do parse the functions myself now, so maybe this is my issue.

@nitanmarcel (Contributor, Author)

> I do parse the functions myself now, so maybe this is my issue.

Yep, I forgot how generators work 😅
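(For reference, a hedged sketch of how streamed chunks like the ModelResponse dump above are typically accumulated; illustrative only, assuming OpenAI-style deltas where `role`/`content` appear only on some chunks and tool-call arguments arrive as JSON fragments.)

```python
# Illustrative sketch of consuming a streaming completion generator.
# Assumes chunks shaped like the ModelResponse dump above (OpenAI-style deltas).
def accumulate_stream(chunks):
    content = []
    tool_calls = {}  # index -> {"name": str, "arguments": str}
    for chunk in chunks:
        delta = chunk.choices[0].delta
        # role/content are only present on some deltas, so guard the access
        if getattr(delta, "content", None):
            content.append(delta.content)
        for call in getattr(delta, "tool_calls", None) or []:
            slot = tool_calls.setdefault(call.index, {"name": "", "arguments": ""})
            if call.function.name:
                slot["name"] = call.function.name
            if call.function.arguments:
                slot["arguments"] += call.function.arguments  # JSON arrives in pieces
    return "".join(content), tool_calls
```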

@trufae (Contributor) commented Sep 13, 2024

My comments on litellm:

  • it feels quite commercial, judging by the website
  • I'm afraid the amount of deps will make it too huge when packaged
  • we lose control over how we build the prompts
  • it doesn't support llama afaik

So imho I would like to keep control of the llama side, with chromadb and the prompt structure thing, at least as a separate codebase. Even if it's ugly, I think this code gives us more control. Let me know if I misunderstood anything about litellm.

How are you planning to support litellm? If we now support more models and reduce the logic handling all those models, will the interface for the user stay the same?

@dnakov (Collaborator) commented Sep 13, 2024

It is commercial, I think they got some VC funding, although it's at least MIT licensed.
It's not a crazy amount of deps; we'd want most of them anyway -- like the anthropic, openai and google libraries, pydantic, etc.
It doesn't actually do anything to the prompts, you still have the same amount of control.
Yes, it doesn't do anything with running models locally, so it doesn't help there. But if we expose an OpenAI-compatible endpoint, we can still use the same completion code.

> How are you planning to support litellm? If we now support more models and reduce the logic handling all those models, will the interface for the user stay the same?

I'm thinking we can just use their <provider>/model_name convention for any API models, so we don't have to constantly update model names. So -m openai/o1-preview would work even if it's not on the list.
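A hedged sketch of what that could look like, assuming litellm's `completion()` API and provider-prefixed model strings; the `auto_chat` wrapper name and the `-m` handling are illustrative, not the actual r2ai code:

```python
# Illustrative sketch: pass the user's "-m <provider>/<model>" string straight
# to litellm, which routes it to the right backend and returns an
# OpenAI-shaped response for every provider.
import litellm

def auto_chat(model, messages, tools=None):
    return litellm.completion(
        model=model,        # e.g. "openai/o1-preview", taken verbatim from -m
        messages=messages,  # OpenAI-style [{"role": "user", "content": ...}, ...]
        tools=tools,        # OpenAI-style tool definitions, if the model supports them
    )

# response = auto_chat("openai/o1-preview", [{"role": "user", "content": "hi"}])
```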

@nitanmarcel (Contributor, Author) commented Sep 14, 2024

> My comments on litellm:
>
> • it feels quite commercial, judging by the website
> • I'm afraid the amount of deps will make it too huge when packaged
> • we lose control over how we build the prompts
> • it doesn't support llama afaik
>
> So imho I would like to keep control of the llama side, with chromadb and the prompt structure thing, at least as a separate codebase. Even if it's ugly, I think this code gives us more control. Let me know if I misunderstood anything about litellm.
>
> How are you planning to support litellm? If we now support more models and reduce the logic handling all those models, will the interface for the user stay the same?

1. As long as we don't use their UI or their moderation tools, we are covered by the MIT license.

2. Doesn't happen, since we use the conversation wrapper; it returns the same format as OpenAI for all endpoints.

And we can use llama separately. About the size: it uses extras, so in our case it only downloads the deps we need:

```
Downloading litellm-1.45.0-py3-none-any.whl.metadata (32 kB)
Collecting aiohttp (from litellm)
  Downloading aiohttp-3.10.5.tar.gz (7.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.5/7.5 MB 4.3 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting click (from litellm)
  Downloading click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting importlib-metadata>=6.8.0 (from litellm)
  Downloading importlib_metadata-8.5.0-py3-none-any.whl.metadata (4.8 kB)
Collecting jinja2<4.0.0,>=3.1.2 (from litellm)
  Downloading jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting jsonschema<5.0.0,>=4.22.0 (from litellm)
  Downloading jsonschema-4.23.0-py3-none-any.whl.metadata (7.9 kB)
Collecting openai>=1.45.0 (from litellm)
  Downloading openai-1.45.0-py3-none-any.whl.metadata (22 kB)
Collecting pydantic<3.0.0,>=2.0.0 (from litellm)
  Downloading pydantic-2.9.1-py3-none-any.whl.metadata (146 kB)
Collecting python-dotenv>=0.2.0 (from litellm)
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Collecting requests<3.0.0,>=2.31.0 (from litellm)
  Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting tiktoken>=0.7.0 (from litellm)
  Downloading tiktoken-0.7.0.tar.gz (33 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting tokenizers (from litellm)
  Downloading tokenizers-0.20.0.tar.gz (337 kB)
```

@nitanmarcel (Contributor, Author) commented Sep 14, 2024

On the other side:

> It is commercial, I think they got some VC funding, although it's at least MIT licensed. It's not a crazy amount of deps; we'd want most of them anyway -- like the anthropic, openai and google libraries, pydantic, etc. It doesn't actually do anything to the prompts, you still have the same amount of control. Yes, it doesn't do anything with running models locally, so it doesn't help there. But if we expose an OpenAI-compatible endpoint, we can still use the same completion code.
>
> > How are you planning to support litellm? If we now support more models and reduce the logic handling all those models, will the interface for the user stay the same?
>
> I'm thinking we can just use their <provider>/model_name convention for any API models, so we don't have to constantly update model names. So -m openai/o1-preview would work even if it's not on the list.

I don't think we even need to constantly update our models; at least in auto mode, only the provider is set. The <provider>/model part can be done without litellm too, by creating our own wrapper around the endpoints. And who knows, maybe I'll come up with an idea to make it easier to maintain.

@nitanmarcel (Contributor, Author) commented Sep 14, 2024

So, litellm or not, these can be done manually. Plus, we can freely use parts of litellm in our code due to the dual license they use.

@nitanmarcel (Contributor, Author) commented Sep 14, 2024

@trufae @dnakov I've updated the task list with new tasks. I'll go with dnakov's suggestion to keep litellm, while dropping the size of r2ai from the current ~500 MB to around ~200 MB.

I hope everyone is happy ^^

@nitanmarcel (Contributor, Author)

@dnakov @trufae can any of you test this? I'm afraid my laptop isn't powerful enough, and the only local model I was able to run didn't support tools.

c1f0e2e

@trufae (Contributor) commented Sep 25, 2024

abandoned?

@nitanmarcel (Contributor, Author)

> abandoned?

Nope, I'll come back to it soon. Just taking a break, since handling the functionary models drove me nuts xD
