WIP - Tool Implementation improvements™️ #193
Closed
After #154, I've continued to think about and evaluate TabbyAPI's tool calling integration. This is a draft PR where I'll continue to iterate on and improve TabbyAPI's Tools implementation. Below are the issues I've observed and the actions I plan to take in this PR.
Many models have a hard time writing the correct dtype (and sometimes the arg names) when calling functions.
We will attempt to build a custom tool response schema from the tool spec provided by the client. This means we can not only ensure the format of the tool response is correct, but also guarantee that the function names, argument names, and argument types match the client's spec.
Status: In PR ✅ - Working MVP
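The idea above can be sketched roughly as follows: derive a JSON schema from the client's OpenAI-style tool spec, which can then constrain generation of the tool-call response. This is a hypothetical sketch, not the PR's actual code; the function name `build_tool_call_schema` and the exact schema shape are illustrative.

```python
# Illustrative sketch: turn an OpenAI-style tool spec into a JSON schema
# that constrains a model's tool-call output to the declared function
# names and parameter types.
import json


def build_tool_call_schema(tools: list[dict]) -> dict:
    """Build a schema whose variants each pin one tool's name (via
    `const`) and reuse that tool's declared parameter schema."""
    variants = []
    for tool in tools:
        fn = tool["function"]
        variants.append({
            "type": "object",
            "properties": {
                "name": {"const": fn["name"]},
                "arguments": fn.get("parameters", {"type": "object"}),
            },
            "required": ["name", "arguments"],
        })
    # A tool-call response is a list of calls, each matching one tool.
    return {"type": "array", "items": {"anyOf": variants}}


weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
schema = build_tool_call_schema([weather_tool])
print(json.dumps(schema, indent=2))
```

Because `name` is a `const` and `arguments` reuses the client's own parameter schema, a schema-constrained generator cannot emit a misspelled function name or a wrongly typed argument.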
Some model providers have worked hard to adopt a tool call/response schema that closely resembles OpenAI's spec. I appreciate you! These models naturally have better support in TabbyAPI's current tools implementation, but may need slight tweaks to the default tool calling prompt template.
After discussions with @bdashore3, I feel it's best to move Tabby's default tool calling template to support Hermes, rather than generically accommodating Llama 3.1's chat template. While tool calling with Llama 3.1 models works well in TabbyAPI, I think Hermes 3's finetuned models better exhibit how Tabby can deliver tool calling to its users. I intend to replace the default tool template with a Hermes 3-steered template (this should also continue to work quite well for Llama 3.1, since Llama 3.1 has no training on tool-calling-specific tokens). This will only require slight changes.
Status: Written in the just-in-time system prompt format described below. Has not been added to the PR just yet.
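For reference, a Hermes-steered template would build a ChatML system prompt along these lines. The `<tools>`/`<tool_call>` tag names follow the published Hermes function-calling format; the exact instruction wording here is illustrative, not the template this PR will ship.

```python
# Hedged sketch of a Hermes-style tool system prompt in ChatML.
# The wording of the instructions is an assumption for illustration.
import json


def hermes_system_prompt(tools: list[dict]) -> str:
    """Render an OpenAI-style tool list into a Hermes-flavored
    ChatML system message."""
    tool_json = "\n".join(json.dumps(t) for t in tools)
    return (
        "<|im_start|>system\n"
        "You are a function calling AI model. You may call one or more "
        "functions to assist with the user query. Available tools:\n"
        f"<tools>\n{tool_json}\n</tools>\n"
        "For each function call, return a JSON object with the function "
        "name and arguments within <tool_call></tool_call> tags."
        "<|im_end|>\n"
    )
```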
Tool calling models are strongly coupled to their system prompts
If you've read through the docs and the default tool calling template, you will see how we inform the model about the tools available to it, and, further into the conversation, remind the model of its tools/response schema after it indicates it wants to make a tool call via its tool_start token. There isn't any magic solution here: models trained for tool calling certainly use tools more effectively deep into the context window, but how can we help on the inference side? I'd like to experiment with some just-in-time system prompt based chat templates. These will more closely follow what Mistral does with their V3 template, but in ChatML. This is currently just an experiment with promising early results. This change falls into the "unknown" territory for a tools implementation, since we do not know how tools are conveyed to OpenAI models (or how the model is reminded of them). But the same currently applies to the tool reminder in the existing default tool chat template. I believe this approach should be considered.
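The just-in-time idea described above amounts to re-injecting the tool definitions near the end of the context rather than relying only on the opening system prompt. A minimal sketch, assuming hypothetical names (`inject_jit_tool_reminder` is not TabbyAPI code):

```python
# Illustrative "just-in-time" tool reminder: insert a system message
# restating the tool specs immediately before the latest user turn,
# so the tool schema stays close to the end of the context window.
import json


def inject_jit_tool_reminder(messages: list[dict], tools: list[dict]) -> list[dict]:
    reminder = {
        "role": "system",
        "content": "Available tools:\n" + json.dumps(tools),
    }
    # Walk backwards to find the most recent user message and insert
    # the reminder just before it.
    for i in range(len(messages) - 1, -1, -1):
        if messages[i]["role"] == "user":
            return messages[:i] + [reminder] + messages[i:]
    return messages + [reminder]
```

The appeal is that the reminder's position is recomputed on every request, so deep multi-turn conversations keep the schema recent without retraining anything.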
Translation layer between Model <--> TabbyAPI for tools™
As I mention above, certain models more closely follow OpenAI's tool call/response schema. We don't know whether the actual model follows this schema internally; we only know that the schema is used when the user communicates with the inference engine. Many models do not follow this structure when making tool calls. That doesn't mean those models are bad at making tool calls, but it does mean the responsibility falls to TabbyAPI to translate correctly between the model's tool speak and the OpenAI tool speak agreed with the user. Currently, we do not have a graceful way of accommodating a different structure for the model. To accommodate models from C4AI and others, we must build this translation layer until the community reaches consensus on model tool formatting.
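One side of such a translation layer might look like this: parse a model-native tool call (here, a Hermes-style `<tool_call>` block) and re-emit it as an OpenAI-style `tool_calls` entry. This is a sketch under that assumption; the regex, function name, and id format are illustrative, not the PR's implementation.

```python
# Illustrative model -> OpenAI translation: extract Hermes-style
# <tool_call>{...}</tool_call> blocks from raw model output and convert
# each into an OpenAI-shaped tool_calls entry.
import json
import re
import uuid

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def to_openai_tool_calls(model_output: str) -> list[dict]:
    calls = []
    for match in TOOL_CALL_RE.finditer(model_output):
        payload = json.loads(match.group(1))
        calls.append({
            "id": f"call_{uuid.uuid4().hex[:8]}",
            "type": "function",
            "function": {
                "name": payload["name"],
                # OpenAI's spec serializes arguments as a JSON string.
                "arguments": json.dumps(payload.get("arguments", {})),
            },
        })
    return calls
```

A symmetric function would run in the other direction, rendering the client's OpenAI-style tool results back into whatever format the model was trained on; together the pair keeps the client-facing API stable no matter which model sits behind it.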
Notes on the code changes
tools.py