
WIP - Tool Implementation improvements™️ #193

Closed
wants to merge 15 commits

Conversation


@gittb gittb commented Sep 8, 2024

After #154, I've continued to think about and evaluate TabbyAPI's tool calling integration. This is a draft PR where I continue to iterate on and improve TabbyAPI's tools implementation. Below are the issues I've observed and my planned actions for each.

Many models have a hard time writing the correct dtype (and sometimes the arg names) when calling functions.

  • We will attempt to build a custom tool response schema from the tool spec provided by the client. This means we can not only ensure the tool response is correctly formatted, but also guarantee a few things:

    1. args are spelled correctly
    2. required args are present
    3. dtypes for args are correct
  • Status: In PR ✅ - Working MVP
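The three guarantees above can be sketched as a schema builder. This is a minimal illustration, not TabbyAPI's actual implementation: it derives a strict JSON schema from an OpenAI-style tool spec so that a schema-constrained sampler can only emit calls with correctly spelled arg names, all required args, and the right dtypes (the function name is hypothetical).

```python
# Sketch: derive a strict JSON schema for tool-call responses from an
# OpenAI-style tool spec. Constrained generation against this schema
# guarantees arg spelling, required args, and dtypes by construction.

def build_tool_call_schema(tools: list[dict]) -> dict:
    """Return a JSON schema that only admits calls to the declared tools."""
    call_variants = []
    for tool in tools:
        fn = tool["function"]
        params = fn.get("parameters", {"type": "object", "properties": {}})
        call_variants.append({
            "type": "object",
            "properties": {
                "name": {"const": fn["name"]},      # name must match exactly
                "arguments": {
                    **params,                        # dtypes + required args
                    "additionalProperties": False,   # no invented args
                },
            },
            "required": ["name", "arguments"],
        })
    # The model must emit an array of calls, each matching one declared tool.
    return {"type": "array", "items": {"anyOf": call_variants}}
```

A schema like this can then be handed to whatever constrained-decoding backend the engine uses, which is how the "guaranteed correct" properties fall out for free.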

There are some model providers that have worked hard to adopt a tool call/response schema that closely resembles OpenAI's spec. I appreciate you! These models will naturally have better support in TabbyAPI's current tools implementation, but may need slight tweaks to the default tool calling prompt template.

  • After discussions with @bdashore3, I feel it's best to move tabby's default tool calling template to support Hermes, rather than generically accommodating Llama 3.1's chat template. While tool calling with Llama 3.1 models works well in TabbyAPI, I think Hermes 3's finetuned models better exhibit how tabby can deliver tool calling to its users. I intend to replace the default tool template with a Hermes 3 steered template (this should also continue to work quite well for Llama 3.1, since it has no true tool-specific token training). This will only require slight changes.

  • Status: Written in the just-in-time system prompt format described below. Not yet added to the PR.

Tool calling models are strongly coupled to their system prompts

If you've read through the docs and the default tool calling template, you will see how we inform the model about the tools available to it and, further into the conversation, remind the model of its tools/response schema after it indicates it wants to make a tool call via its tool_start token. There isn't any magic solution here; models trained to tool call certainly use tools more effectively deeper into the context window. But how can we help on the inference side? I'd like to experiment with some just-in-time system prompt based chat templates. These more closely follow what Mistral does with their V3 template, but in ChatML. This is currently just an experiment with promising early results. This change falls into the "unknown" category for a tools implementation, since we do not know how tools are conveyed to OpenAI models (or how they are reminded). But the same caveat already applies to the tool reminder in the existing default tool chat template, so I believe this should be considered.

  • Status: Experimenting - this template is not in the PR currently. I can add if folks would like.
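To make the just-in-time idea concrete, here is a minimal sketch of the placement strategy, not the actual template: instead of pinning the tool spec to the first system message, the reminder is injected immediately before the latest user turn, so it always sits near the end of the context window (the function name and reminder wording are illustrative assumptions; message dicts follow the OpenAI chat format).

```python
import json

# Sketch: "just-in-time" tool prompt injection. The tool spec is inserted
# as a system message right before the most recent user turn, similar in
# spirit to Mistral's V3 placement but expressed over ChatML-style messages.

def inject_jit_tool_prompt(messages: list[dict], tools: list[dict]) -> list[dict]:
    tool_msg = {
        "role": "system",
        "content": (
            "You may call the following tools. Respond with a JSON tool "
            "call when appropriate:\n" + json.dumps(tools)
        ),
    }
    # Walk backwards to find the last user message and insert the
    # reminder just before it.
    for i in range(len(messages) - 1, -1, -1):
        if messages[i]["role"] == "user":
            return messages[:i] + [tool_msg] + messages[i:]
    return messages + [tool_msg]
```

In template terms, the same effect would be achieved inside the Jinja chat template rather than in Python, but the message-list view makes the placement easy to see.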

Translation layer between Model <--> TabbyAPI for tools™

As I mention above, certain models more closely follow OpenAI's tool call/response schema. We don't know whether the model itself follows this schema; we only know the schema is used when the user communicates with the inference engine. Many models do not follow this structure when making tool calls. That is not to say those models are bad at making tool calls, but that the responsibility falls to TabbyAPI to correctly translate between the model's tool speak and the OpenAI tool speak agreed with the user. Currently, we do not have a graceful way of accommodating a different structure on the model side. In order to accommodate models from C4AI and others, we must build this translation layer until the community finds consensus on model tool formatting.

  • Status: Planning
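Since this is still in planning, here is only a rough sketch of what one direction of the translation layer could look like: normalizing a model-native tool call into OpenAI's tool_call wire format. The model-side field names (`tool_name`, `parameters`) are assumptions standing in for whatever a given model family emits.

```python
import json
import uuid

# Sketch: one direction of a model <-> OpenAI translation layer.
# Takes a hypothetical model-native tool call dict and emits the
# OpenAI tool_call structure the client expects.

def to_openai_tool_call(native: dict) -> dict:
    return {
        "id": f"call_{uuid.uuid4().hex[:12]}",  # OpenAI-style opaque call id
        "type": "function",
        "function": {
            "name": native["tool_name"],
            # OpenAI serializes arguments as a JSON string, not an object.
            "arguments": json.dumps(native.get("parameters", {})),
        },
    }
```

A full layer would need the reverse direction as well (OpenAI tool results back into the model's expected format), plus one adapter per model family until formats converge.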

Notes on the code changes

  • Tool-related functions have now been moved into tools.py

@gittb gittb marked this pull request as draft September 8, 2024 01:02
@gittb gittb closed this Oct 22, 2024