OpenAI Tools Support/Function calling #154
Conversation
…upplying a quality schema if they want a particular format
Prototype complete for non-streaming generation
…atcomprespchoice per chat_completion_chunk.py inside OAI lib.
…unction_calling_pr
Updated tool pydantic to match OAI
Support for streaming
Updated generate tool calls to use flag within chat_template and insert tool reminder
Updated fire func template
Switched from simply resuming generation with all previous tokens to leaning more on the chat template, after a discussion with @bdashore3. The target implementation will now:
The system message does not stay in the message history and is only used when resuming generation to create the tool calls. This should help the model recall the correct args and tool names deep into the context window. To skip this approach, the user would simply modify the chat template. I'm also aiming to release a generic …

Getting a good bit closer, but some implementation and then a lot of testing remain.
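To make the idea concrete, here is a minimal sketch of that transient-reminder rendering. The template, the `TOOL_REMINDER` text, and the `build_tool_call_prompt` helper are all hypothetical illustrations, not the PR's actual code:

```python
from jinja2 import Template

# Hypothetical, heavily simplified chat template; the real templates are
# model-specific and will live in llm-prompt-templates.
CHAT_TEMPLATE = Template(
    "{% for m in messages %}<|{{ m.role }}|>\n{{ m.content }}\n{% endfor %}"
)

# Assumed reminder wording; in practice it comes from the chat template.
TOOL_REMINDER = (
    "Available tool schemas:\n{tools_json}\n"
    "Write any tool calls as JSON matching the schemas above."
)

def build_tool_call_prompt(messages: list[dict], tools_json: str) -> str:
    """Render the prompt used only while generating tool calls.

    The reminder system message is appended for this render alone and never
    enters the stored history, so the model sees the tool names and args
    fresh even deep into the context window.
    """
    transient = messages + [
        {"role": "system", "content": TOOL_REMINDER.format(tools_json=tools_json)}
    ]
    return CHAT_TEMPLATE.render(messages=transient)
```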
…PI-function into function_calling_pr
Simplified tool_params
Warning when gen_settings are being overridden because user set temp to 0
Corrected schema and tools to correct types for function args. Str for some reason
5efe618 to 59cc23d
…s are weird about newlines after headers, so this is an easier way to change globally)
…o write json in a string in the middle of the call.
Getting close here: a couple of action items are left on my list.
As always, I'm open to thoughts and feedback from all. This should be ready for review soon.
Good evening! This is ready for review. I suggest we discuss the documentation together, since it lives on the wiki rather than just in the README.

Here are 4 scripts to help you evaluate the PR. First, I'll quickly reiterate how to configure tabby for tool calling. Grab your favorite model; I prefer Llama 3.1 70B Instruct 7.0BPW. I've also included a couple of other templates I made (for FireFunction's Llama3 model and Groq's Llama3 tool-calling finetune) to demonstrate how easily the chat templates can be reconfigured based on what the underlying model needs. In this case, we would load this model with the …
These eval assets can be found here: eval_assets_for_tabby_tool_calling.zip
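For anyone evaluating without the scripts, a request along these lines should exercise the new path using the standard OpenAI client. The base URL, API key, model name, and the `get_weather` tool are placeholders, not part of the PR:

```python
from openai import OpenAI

# Placeholder endpoint, key, and model name for a local TabbyAPI instance.
client = OpenAI(base_url="http://localhost:5000/v1", api_key="your-tabby-key")

# Illustrative tool spec in OpenAI's format; not part of the PR itself.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```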
Signed-off-by: kingbri <bdashore3@proton.me>
Use SkipJsonSchema to suppress inclusion in the OpenAPI JSON. The location of these variables may need to be changed in the future. Signed-off-by: kingbri <bdashore3@proton.me>
Since we're only looking for specific template variables that are static in the template, it makes more sense to render when the template is initialized. Signed-off-by: kingbri <bdashore3@proton.me>
Adhere to the format style of comments in the rest of the project. Signed-off-by: kingbri <bdashore3@proton.me>
Adding Function Calling to TabbyAPI
Currently TabbyAPI does not support function calling.
Function calling is great and has some really cool applications.
This PR will build out the infrastructure to enable tool calling, based on OpenAI's function calling. The implementation will stay flexible across different chat formats while the industry seeks a standard chat template (both generally and for function calling tasks).
In this PR I've built out a framework that can support many (larger) models that are sensitive enough to follow the format from context, like Llama 3.1 70B and others. The framework also supports custom finetuned models which may have special tokens to support the functionality. I've included chat templates for Groq and FireFunction models in a comment below if you'd like to try them as well. These templates will be pushed separately to llm-prompt-templates. I have a couple of quants for these 3 models here
Initially:

- `tool_choice` will always be considered `auto`.
- `parallel_tool_calls` will always be supported, but it will fall on the model to actually write multiple calls in a single tool response. Most don't have an issue with this.
- Some implementations of tool calling models could require a specific format for specifying the function specs, so we will leave the `functions` param on the chat completion endpoint loosely typed and exposed within the chat template for those situations.
- The proper `tools` and `tool_calls` responses will be strongly typed to match OpenAI's implementation (a rough Pydantic sketch follows this list).
- Expansion to other models will be done by simply creating a new chat template.
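To picture the typing split, here's a rough Pydantic sketch of what "strongly typed to match OpenAI" could look like. Field names follow OpenAI's published schema, but the exact models in this PR may differ:

```python
from pydantic import BaseModel

class Function(BaseModel):
    name: str
    description: str | None = None
    parameters: dict  # raw JSON Schema for the function's arguments

class ToolSpec(BaseModel):
    type: str = "function"
    function: Function

class ToolCallFunction(BaseModel):
    name: str
    arguments: str  # OAI returns arguments as a JSON-encoded string

class ToolCall(BaseModel):
    id: str
    type: str = "function"
    function: ToolCallFunction

class ChatCompletionRequest(BaseModel):
    # `tools` is strongly typed; `functions` stays loosely typed so that
    # model-specific formats can pass straight through to the chat template.
    tools: list[ToolSpec] | None = None
    functions: list[dict] | None = None
```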
To accomplish this, we need to ensure a few things:

- Tools are parsed and exposed to the chat template via the `tools` or `functions` parameters in the `chat_completion` endpoint.
- …`tool_call_start` str.
- Tool calls are returned in the `tool_calls` param within the `choices` dict in the response (a rough sketch of this follows below).

Where possible I will be following OpenAI's schema found here: OpenAPI schema for OAI
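As a rough illustration of that last point, here is one way the marker detection and response shaping could look. The `<tool_call>` marker value, the helper name, and the expected JSON payload shape are all assumptions for the sketch; the actual marker is defined per chat template:

```python
import json
import uuid

# Assumed marker; the real value is defined per chat template.
TOOL_CALL_START = "<tool_call>"

def extract_tool_calls(generated: str) -> list[dict] | None:
    """Return OpenAI-style tool_calls if the marker appears, else None."""
    if TOOL_CALL_START not in generated:
        return None
    # Assumes the model writes a JSON array of {"name", "arguments"} objects
    # after the marker, per the schema the chat template asked for.
    payload = generated.split(TOOL_CALL_START, 1)[1].strip()
    calls = json.loads(payload)
    return [
        {
            "id": f"call_{uuid.uuid4().hex[:8]}",
            "type": "function",
            "function": {
                "name": call["name"],
                # OAI's schema wants arguments as a JSON-encoded string
                "arguments": json.dumps(call["arguments"]),
            },
        }
        for call in calls
    ]
```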