
OpenAI Tools Support/Function calling #154

Merged — 37 commits merged into theroyallab:main from function_calling_pr on Aug 17, 2024

Conversation

@gittb (Contributor) commented on Jul 18, 2024

Adding Function Calling to TabbyAPI

Currently, TabbyAPI does not support function calling.

Function calling is great and has some really cool applications, such as letting a model invoke external tools and APIs.

This PR builds out the infrastructure to enable tool calling, modeled on OpenAI's function calling API. The implementation stays flexible across different chat formats while the industry settles on a standard chat template (both generally and for function calling tasks).

In this PR I've built out a framework that supports larger models sensitive enough to follow tool instructions purely from context, like Llama 3.1 70B and others. This framework also supports custom fine-tuned models which may have special tokens to support the functionality. I've included chat templates for the Groq and FireFunction models in a comment below if you'd like to try them as well. These templates will be pushed separately to llm-prompt-templates. I have a couple quants for these 3 models here

Initially:

  • tool_choice will always be considered auto
  • parallel_tool_calls will always be supported, but it will fall to the model to actually write multiple calls in a single tool response; most models don't have an issue with this (see the sketch below)
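
For reference, here is roughly what a parallel tool call looks like under OpenAI's response schema; the function name, IDs, and arguments below are invented for illustration. Multiple calls simply appear as multiple entries in a single tool_calls array:

```python
# Illustrative response fragment only; names and IDs are made up.
assistant_message = {
    "role": "assistant",
    "content": None,  # no text content when the turn is a tool call
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {
                "name": "get_weather",
                # Note: "arguments" is a JSON-encoded *string*, not an object.
                "arguments": '{"city": "Boston"}',
            },
        },
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": '{"city": "Tokyo"}',
            },
        },
    ],
}
```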

Some tool-calling models may require a specific format for specifying the function specs, so we will leave the functions param on the chat completion endpoint loosely typed and exposed within the chat template for those situations.

The tools param and the tool_calls responses, however, will be strongly typed to match OpenAI's implementation.

Expansion to other models will be done by simply creating a new chat template.

To accomplish this, we need to ensure a few things:

  1. The user can pass in a schema of available functions to the chat template via the tools or functions parameters in the chat_completion endpoint.
  2. The engine stops generation when the start of a tool call is detected via its tool_call_start str.
  3. The engine is reconfigured to follow a JSON schema matching OpenAI's spec, then generation is resumed, ensuring the generated tool calls are formatted correctly.
  4. TabbyAPI returns this tool call to the user via the tool_calls param within the choices dict in the response.

Where possible I will be following OpenAI's schema, found here: OpenAPI schema for OAI
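
To make the flow tangible end to end, here is a minimal request sketch using the official openai Python client; the endpoint, API key, model name, and tool definition are placeholders for your setup, not part of this PR:

```python
from openai import OpenAI

# Placeholder endpoint and key; point these at your TabbyAPI instance.
client = OpenAI(base_url="http://localhost:5000/v1", api_key="tabby-key")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative function only
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="Llama-3.1-70B-Instruct",  # whatever model Tabby has loaded
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools,  # step 1: the schema of available functions
)

# Step 4: if the model chose to call a tool, the calls come back here.
print(response.choices[0].message.tool_calls)
```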

@gittb marked this pull request as a draft on July 18, 2024
@gittb changed the title from OpenAI Tools Support/Function calling to [WIP] OpenAI Tools Support/Function calling on Jul 18, 2024
@gittb (Contributor, Author) commented on Jul 19, 2024

Prototype complete for non-streaming generation.

gittb added 3 commits on August 2, 2024:

  • Updated tool pydantic to match OAI
  • Support for streaming
  • Updated generate tool calls to use flag within chat_template and insert tool reminder; updated fire func template
@gittb (Contributor, Author) commented on Aug 2, 2024

After a discussion with @bdashore3, I switched from simply resuming generation with all previous tokens to leaning more on the chat template.

The target implementation will now:

  • listen for the model to start a tool call
  • stop generation
  • insert a system message, just before the point where the model started the tool call, with a reminder of the available tools and the correct format
  • resume generation with a JSON schema matching the expected output for an OAI tool call
  • extract this generation and return it as a tool_call object, as OAI expects

The system message does not stay in the message history; it is only used when resuming generation to create the tool calls. This should help the model recall the correct args and tool names deep into the context window. Users who prefer not to use this approach can simply modify the chat template.
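
To make that flow concrete, here is a rough Python sketch of the resume logic. Every name in it (generate, render, TOOL_CALL_START, the schema placeholder) is hypothetical; this mirrors the steps above, not Tabby's actual internals:

```python
import json

TOOL_CALL_START = "<tool_call>"  # hypothetical marker; real templates define their own


def generate(prompt: str, stop=None, json_schema=None) -> str:
    """Stand-in for the inference engine call (hypothetical signature)."""
    raise NotImplementedError


def render(messages: list[dict]) -> str:
    """Stand-in for rendering messages through the chat template."""
    raise NotImplementedError


def run_with_tools(messages: list[dict], tools: list[dict]) -> dict:
    # Generate normally, stopping early if the model begins a tool call.
    text = generate(render(messages), stop=[TOOL_CALL_START])

    if text.endswith(TOOL_CALL_START):  # crude stop detection, for the sketch
        # Ephemeral system reminder: used only for the resumed generation,
        # never persisted into the message history.
        reminder = {
            "role": "system",
            "content": f"Available tools: {json.dumps(tools)}. "
            "Write tool calls in the required JSON format.",
        }
        # Resume constrained to a JSON schema matching OAI's tool_calls
        # shape, so the output is guaranteed to parse.
        raw = generate(
            render(messages + [reminder]),
            json_schema={"type": "array"},  # placeholder for the real schema
        )
        return {"tool_calls": json.loads(raw)}

    return {"content": text}
```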

I'm also aiming to release a generic chatml tool-calling template that can be used with models that did not specifically learn a tool-call token during their training/fine-tuning.

Getting a good bit closer, but some implementation and then a lot of testing remain.

gittb added 4 commits on August 3, 2024:

  • Simplified tool_params
  • Warning when gen_settings are being overridden because user set temp to 0
  • Corrected schema and tools to correct types for function args (str, for some reason)
@gittb force-pushed the function_calling_pr branch from 5efe618 to 59cc23d on August 6, 2024
@gittb (Contributor, Author) commented on Aug 11, 2024

Getting close here. A couple of action items remain on my list:

  • Building a minimal test with langchain to demonstrate integration by simply rewriting the base_url. ✅
  • Building a quick test for performance when multiple tool calls are made in one conversation, and/or tool calls are made deeper into the context window. ✅
  • Reviewing all the code holistically for format/readability/maintainability. ✅
  • Bundling up some testing scripts for y'all to test this PR. ✅
  • Writing docs: decided to wait on this, as docs live in the wiki.

As always, open to thoughts and feedback from all. Should be ready for review soon.

@gittb (Contributor, Author) commented on Aug 16, 2024

Good evening!

This is ready for review. I suggest we discuss the documentation together since it’s on the wiki, rather than just in the README.

Here are four scripts to help you evaluate the PR. First, I'll quickly reiterate how to configure Tabby for tool calling.

Grab your favorite model; I prefer Llama 3.1 70B Instruct 7.0BPW. I've also included a couple of other templates I made (for FireFunction's Llama 3 model and Groq's Llama 3 tool-calling fine-tune) to demonstrate how easily the chat templates can be reconfigured based on what the underlying model needs.

In this case, we would load the model with the chatml_with_headers_tool_calling.jinja chat template, which should generally be compatible with other chatml models as well. Try not to scorch it with temperature; we need stable output.

  • tool_call_simple_test_oai.py: Plug in an OpenAI key here. This script will simply print out what an official tool call response looks like from OpenAI, allowing you to compare it with Tabby’s output.

  • tool_call_simple_test_nonstream.py: This script demonstrates non-streaming tool calling in the same way as above. Substitute your Tabby API key and endpoint, and compare the output with the OpenAI tool call.

  • tool_call_simple_test_stream.py: This script shows that tool calling can also be done while streaming.

  • minimal_langchain_example.py: This example demonstrates using the OpenAI module in LangChain with a custom function. It rewrites the endpoint to Tabby, showing that it satisfies a third party’s OpenAI implementation.

These eval assets can be found here: eval_assets_for_tabby_tool_calling.zip
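
For a sense of what minimal_langchain_example.py exercises, here is a sketch along the same lines. This is my own illustration, not the bundled script; the endpoint, key, model name, and tool are placeholders, and the import paths assume a recent langchain-openai:

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI


@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"It is sunny in {city}."  # canned answer for the demo


# Point LangChain's OpenAI integration at Tabby instead of api.openai.com.
llm = ChatOpenAI(
    base_url="http://localhost:5000/v1",  # placeholder Tabby endpoint
    api_key="tabby-key",                  # placeholder Tabby API key
    model="Llama-3.1-70B-Instruct",
)

llm_with_tools = llm.bind_tools([get_weather])
msg = llm_with_tools.invoke("What's the weather in Boston?")
print(msg.tool_calls)  # LangChain parses Tabby's tool_calls for us
```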

@gittb gittb marked this pull request as ready for review August 16, 2024 23:52
Signed-off-by: kingbri <bdashore3@proton.me>
Use SkipJsonSchema to suppress inclusion with the OpenAPI JSON. The
location of these variables may need to be changed in the future.

Signed-off-by: kingbri <bdashore3@proton.me>
Since we're only looking for specific template variables that are
static in the template, it makes more sense to render when the template
is initialized.

Signed-off-by: kingbri <bdashore3@proton.me>
Adhere to the format style of comments in the rest of the project.

Signed-off-by: kingbri <bdashore3@proton.me>
@bdashore3 merged commit 70b9fc9 into theroyallab:main on Aug 17, 2024 (1 check passed)
@gittb deleted the function_calling_pr branch on August 17, 2024
@gittb changed the title from [WIP] OpenAI Tools Support/Function calling to OpenAI Tools Support/Function calling on Oct 3, 2024