
OpenAI Tools Support/Function calling #154

Merged — 37 commits merged into theroyallab:main from function_calling_pr on Aug 17, 2024

Conversation

@gittb (Contributor) commented on Jul 18, 2024

Adding Function Calling to TabbyAPI

Currently, TabbyAPI does not support function calling.

Function calling is great and has some really cool applications, such as letting a model invoke external tools and APIs.

This PR builds out the infrastructure to enable tool calling, modeled on OpenAI's function calling API. The implementation stays flexible across different chat formats while the industry settles on a standard chat template (both generally and for function calling tasks).

In this PR I've built out a framework that supports larger models sensitive enough to follow tool instructions purely from context, like Llama 3.1 70B and others. This framework also supports custom fine-tuned models which may have special tokens to support the functionality. I've included chat templates for the Groq and FireFunction models in a comment below if you'd like to try them as well. These templates will be pushed separately to llm-prompt-templates. I have a couple quants for these 3 models here

Initially:

  • tool_choice will always be considered auto
  • parallel_tool_calls will always be supported, but it will fall to the model to actually write multiple calls in a single tool response; most models don't have an issue with this (see the sketch below)
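
For reference, here is roughly what a parallel tool call looks like under OpenAI's response schema; the function name, IDs, and arguments below are invented for illustration. Multiple calls simply appear as multiple entries in a single tool_calls array:

```python
# Illustrative response fragment only; names and IDs are made up.
assistant_message = {
    "role": "assistant",
    "content": None,  # no text content when the turn is a tool call
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {
                "name": "get_weather",
                # Note: "arguments" is a JSON-encoded *string*, not an object.
                "arguments": '{"city": "Boston"}',
            },
        },
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": '{"city": "Tokyo"}',
            },
        },
    ],
}
```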

Some tool-calling models may require a specific format for specifying the function specs, so we will leave the functions param on the chat completion endpoint loosely typed and exposed within the chat template for those situations.

The tools param and the tool_calls responses, however, will be strongly typed to match OpenAI's implementation.

Expansion to other models will be done by simply creating a new chat template.

To accomplish this, we need to ensure a few things:

  1. The user can pass in a schema of available functions to the chat template via the tools or functions parameters in the chat_completion endpoint.
  2. The engine stops generation when the start of a tool call is detected via its tool_call_start str.
  3. The engine is reconfigured to follow a JSON schema matching OpenAI's spec, then generation is resumed, ensuring the generated tool calls are formatted correctly.
  4. TabbyAPI returns this tool call to the user via the tool_calls param within the choices dict in the response.

Where possible I will be following OpenAI's schema, found here: OpenAPI schema for OAI
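
To make the flow tangible end to end, here is a minimal request sketch using the official openai Python client; the endpoint, API key, model name, and tool definition are placeholders for your setup, not part of this PR:

```python
from openai import OpenAI

# Placeholder endpoint and key; point these at your TabbyAPI instance.
client = OpenAI(base_url="http://localhost:5000/v1", api_key="tabby-key")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative function only
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="Llama-3.1-70B-Instruct",  # whatever model Tabby has loaded
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools,  # step 1: the schema of available functions
)

# Step 4: if the model chose to call a tool, the calls come back here.
print(response.choices[0].message.tool_calls)
```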

@gittb marked this pull request as a draft on July 18, 2024
@gittb changed the title from OpenAI Tools Support/Function calling to [WIP] OpenAI Tools Support/Function calling on Jul 18, 2024
@gittb (Contributor, Author) commented on Jul 19, 2024

Prototype complete for non-streaming generation.

gittb added 3 commits on August 2, 2024:

  • Updated tool pydantic to match OAI
  • Support for streaming
  • Updated generate tool calls to use flag within chat_template and insert tool reminder; updated fire func template
@gittb (Contributor, Author) commented on Aug 2, 2024

After a discussion with @bdashore3, I switched from simply resuming generation with all previous tokens to leaning more on the chat template.

The target implementation will now:

  • listen for the model to start a tool call
  • stop generation
  • insert a system message, just before the point where the model started the tool call, with a reminder of the available tools and the correct format
  • resume generation with a JSON schema matching the expected output for an OAI tool call
  • extract this generation and return it as a tool_call object, as OAI expects

The system message does not stay in the message history; it is only used when resuming generation to create the tool calls. This should help the model recall the correct args and tool names deep into the context window. Users who prefer not to use this approach can simply modify the chat template.
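
To make that flow concrete, here is a rough Python sketch of the resume logic. Every name in it (generate, render, TOOL_CALL_START, the schema placeholder) is hypothetical; this mirrors the steps above, not Tabby's actual internals:

```python
import json

TOOL_CALL_START = "<tool_call>"  # hypothetical marker; real templates define their own


def generate(prompt: str, stop=None, json_schema=None) -> str:
    """Stand-in for the inference engine call (hypothetical signature)."""
    raise NotImplementedError


def render(messages: list[dict]) -> str:
    """Stand-in for rendering messages through the chat template."""
    raise NotImplementedError


def run_with_tools(messages: list[dict], tools: list[dict]) -> dict:
    # Generate normally, stopping early if the model begins a tool call.
    text = generate(render(messages), stop=[TOOL_CALL_START])

    if text.endswith(TOOL_CALL_START):  # crude stop detection, for the sketch
        # Ephemeral system reminder: used only for the resumed generation,
        # never persisted into the message history.
        reminder = {
            "role": "system",
            "content": f"Available tools: {json.dumps(tools)}. "
            "Write tool calls in the required JSON format.",
        }
        # Resume constrained to a JSON schema matching OAI's tool_calls
        # shape, so the output is guaranteed to parse.
        raw = generate(
            render(messages + [reminder]),
            json_schema={"type": "array"},  # placeholder for the real schema
        )
        return {"tool_calls": json.loads(raw)}

    return {"content": text}
```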

I'm also aiming to release a generic chatml tool-calling template that can be used with models that did not specifically learn a tool-call token during their training/fine-tuning.

Getting a good bit closer, but some implementation and then a lot of testing remain.

gittb added 4 commits on August 3, 2024:

  • Simplified tool_params
  • Warning when gen_settings are being overridden because user set temp to 0
  • Corrected schema and tools to correct types for function args (str, for some reason)
@gittb force-pushed the function_calling_pr branch from 5efe618 to 59cc23d on August 6, 2024
@gittb (Contributor, Author) commented on Aug 11, 2024

Getting close here. A couple of action items remain on my list:

  • Building a minimal test with langchain to demonstrate integration by simply rewriting the base_url. ✅
  • Building a quick test for performance when multiple tool calls are made in one conversation, and/or tool calls are made deeper into the context window. ✅
  • Reviewing all the code holistically for format/readability/maintainability. ✅
  • Bundling up some testing scripts for y'all to test this PR. ✅
  • Writing docs: decided to wait on this, as docs live in the wiki.

As always, open to thoughts and feedback from all. Should be ready for review soon.

@gittb (Contributor, Author) commented on Aug 16, 2024

Good evening!

This is ready for review. I suggest we discuss the documentation together since it’s on the wiki, rather than just in the README.

Here are four scripts to help you evaluate the PR. First, I'll quickly reiterate how to configure Tabby for tool calling.

Grab your favorite model; I prefer Llama 3.1 70B Instruct 7.0BPW. I've also included a couple of other templates I made (for FireFunction's Llama 3 model and Groq's Llama 3 tool-calling fine-tune) to demonstrate how easily the chat templates can be reconfigured based on what the underlying model needs.

In this case, we would load the model with the chatml_with_headers_tool_calling.jinja chat template, which should generally be compatible with other chatml models as well. Try not to scorch it with temperature; we need stable output.

  • tool_call_simple_test_oai.py: Plug in an OpenAI key here. This script will simply print out what an official tool call response looks like from OpenAI, allowing you to compare it with Tabby’s output.

  • tool_call_simple_test_nonstream.py: This script demonstrates non-streaming tool calling in the same way as above. Substitute your Tabby API key and endpoint, and compare the output with the OpenAI tool call.

  • tool_call_simple_test_stream.py: This script shows that tool calling can also be done while streaming.

  • minimal_langchain_example.py: This example demonstrates using the OpenAI module in LangChain with a custom function. It rewrites the endpoint to Tabby, showing that it satisfies a third party’s OpenAI implementation.

These eval assets can be found here: eval_assets_for_tabby_tool_calling.zip
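
For a sense of what minimal_langchain_example.py exercises, here is a sketch along the same lines. This is my own illustration, not the bundled script; the endpoint, key, model name, and tool are placeholders, and the import paths assume a recent langchain-openai:

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI


@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"It is sunny in {city}."  # canned answer for the demo


# Point LangChain's OpenAI integration at Tabby instead of api.openai.com.
llm = ChatOpenAI(
    base_url="http://localhost:5000/v1",  # placeholder Tabby endpoint
    api_key="tabby-key",                  # placeholder Tabby API key
    model="Llama-3.1-70B-Instruct",
)

llm_with_tools = llm.bind_tools([get_weather])
msg = llm_with_tools.invoke("What's the weather in Boston?")
print(msg.tool_calls)  # LangChain parses Tabby's tool_calls for us
```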

@gittb gittb marked this pull request as ready for review August 16, 2024 23:52
Signed-off-by: kingbri <bdashore3@proton.me>
Use SkipJsonSchema to suppress inclusion with the OpenAPI JSON. The
location of these variables may need to be changed in the future.

Signed-off-by: kingbri <bdashore3@proton.me>
Since we're only looking for specific template variables that are
static in the template, it makes more sense to render when the template
is initialized.

Signed-off-by: kingbri <bdashore3@proton.me>
Adhere to the format style of comments in the rest of the project.

Signed-off-by: kingbri <bdashore3@proton.me>
@bdashore3 merged commit 70b9fc9 into theroyallab:main on Aug 17, 2024 (1 check passed)
@gittb deleted the function_calling_pr branch on August 17, 2024
@gittb changed the title from [WIP] OpenAI Tools Support/Function calling to OpenAI Tools Support/Function calling on Oct 3, 2024