Add bot_token attribute to PreTrainedTokenizer and PreTrainedTokenizerFast
#31709
Labels: Feature request (Request for a new feature)
Feature request
I'm requesting that a bot_token (beginning-of-tools token) attribute be added to the PreTrainedTokenizer classes, similar to eos_token. The token would be accessible as self.bot_token and self.bot_token_id, which would expose it to downstream consumers like vLLM.
Motivation
This request builds off this PR comment as well as the ongoing work to support function calling in transformers.
A number of downstream consumers depend on what's available in the PreTrainedTokenizer classes, for example vLLM's Sequence class and LLMEngine class. The problem I'm currently facing is that vLLM doesn't label the finish reason for tool-call outputs as tool calls, because CompletionOutput.finish_reason ultimately relies on the attributes available in PreTrainedTokenizer.
As open-source tool calling proliferates, having these attributes exposed would greatly enhance the utility of the library. The token can default to None, and with the right implementation the change should be backwards compatible.
Your contribution
I can help contribute to the PR and write code. I might need help navigating the library and writing good test cases.
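As a starting point for discussion, the requested behaviour could be sketched roughly as follows. This is a toy stand-in, not the transformers or vLLM implementation: ToyTokenizer, its vocab argument, the "<tool_call>" token string, the finish_reason helper, and the "tool_calls" label are all hypothetical names chosen for illustration. It only shows the two properties asked for above: bot_token defaults to None, and bot_token_id is derived from it the same way an eos_token_id would be.

```python
from typing import Dict, List, Optional


class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer class; NOT the transformers API."""

    def __init__(
        self,
        vocab: Dict[str, int],
        eos_token: Optional[str] = None,
        bot_token: Optional[str] = None,
    ) -> None:
        self.vocab = vocab
        self.eos_token = eos_token
        # Proposed attribute: defaults to None, so tokenizers that never
        # set it behave exactly as before.
        self.bot_token = bot_token

    def _token_to_id(self, token: Optional[str]) -> Optional[int]:
        # An unset token has no id; an unknown token also maps to None here.
        if token is None:
            return None
        return self.vocab.get(token)

    @property
    def eos_token_id(self) -> Optional[int]:
        return self._token_to_id(self.eos_token)

    @property
    def bot_token_id(self) -> Optional[int]:
        return self._token_to_id(self.bot_token)


def finish_reason(output_ids: List[int], tokenizer: ToyTokenizer) -> str:
    """Hypothetical consumer-side check, loosely modelled on the vLLM use case."""
    bot_id = tokenizer.bot_token_id
    # Only report a tool call when the tokenizer actually defines the token.
    if bot_id is not None and bot_id in output_ids:
        return "tool_calls"
    if output_ids and output_ids[-1] == tokenizer.eos_token_id:
        return "stop"
    return "length"


vocab = {"hello": 0, "</s>": 1, "<tool_call>": 2}
with_bot = ToyTokenizer(vocab, eos_token="</s>", bot_token="<tool_call>")
legacy = ToyTokenizer(vocab, eos_token="</s>")  # bot_token left unset

print(finish_reason([0, 2, 0, 1], with_bot))
print(finish_reason([0, 2, 0, 1], legacy))
```

With the token set, the consumer can label the output as a tool call; with the default of None, the same output falls through to the existing end-of-sequence check, which is the backwards-compatible behaviour described above.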