vLLM Backend: Tokenizer Configuration Mismatch
Description
The vLLM backend maintains two separate tokenizers that can become misaligned whenever tokenizer-specific configuration is needed. This causes issues with features such as tool calling, which require model-specific tokenizer modes.
Originally noticed with the vLLM test, which uses Mistral (that test does emit a warning about not using the Mistral tokenizer).
Details
The backend initializes two tokenizers:
- vLLM's internal tokenizer - created with all engine arguments, including `tokenizer_mode`, `tokenizer_revision`, etc., in `AsyncLLMEngine.from_engine_args()`. This tokenizer is used to tokenize input before inference, handle output, and manage special tokens.
- The backend's separate tokenizer - created independently via `AutoTokenizer.from_pretrained()` without any of that configuration. It only receives the model id and is used to format prompts via `apply_chat_template()`, i.e. creating messages, adding tool definitions, etc. (see the sketch below).
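A minimal sketch of the current split (the model id, variable names, and structure are illustrative; the backend's actual code differs, and `tokenizer_mode="mistral"` assumes a recent vLLM version):

```python
from transformers import AutoTokenizer
from vllm import AsyncEngineArgs, AsyncLLMEngine

model_id = "mistralai/Mistral-7B-Instruct-v0.3"  # example model

# 1) vLLM's internal tokenizer: built from the full engine arguments,
#    so it sees tokenizer_mode, tokenizer_revision, etc.
engine_args = AsyncEngineArgs(model=model_id, tokenizer_mode="mistral")
engine = AsyncLLMEngine.from_engine_args(engine_args)

# 2) The backend's separate tokenizer: built from the model id alone,
#    so tokenizer_mode never reaches it and the default HF tokenizer is used.
chat_tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "What is the weather in Paris?"}]
prompt = chat_tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# `prompt` is then handed to the engine, which re-tokenizes it with its own,
# differently configured tokenizer.
```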
Impact
If we pass tokenizer configuration (e.g., `tokenizer_mode="mistral"`) to improve tool-calling reliability, only vLLM's internal tokenizer receives it; the backend's tokenizer keeps its default settings. The resulting mismatch can cause problems with special tokens and tool-calling tokens (e.g., with Mistral), potentially making things worse than not setting a tokenizer mode at all.
Fix (?)
I think we could use vLLM's tokenizer for both prompt formatting and generation, so the same configuration is applied throughout.
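A hedged sketch of that direction, assuming `AsyncLLMEngine.get_tokenizer()` (an async method in recent vLLM versions) returns the engine's configured tokenizer; with `tokenizer_mode="mistral"` the returned object is vLLM's Mistral wrapper, whose chat-template interface is not identical to the Hugging Face one, so the formatting call may need adjusting per tokenizer type:

```python
from vllm import AsyncEngineArgs, AsyncLLMEngine

model_id = "mistralai/Mistral-7B-Instruct-v0.3"  # example model
engine_args = AsyncEngineArgs(model=model_id, tokenizer_mode="mistral")
engine = AsyncLLMEngine.from_engine_args(engine_args)


async def format_prompt(messages: list[dict]) -> str:
    # Reuse the engine's own tokenizer instead of a second AutoTokenizer,
    # so prompt formatting and generation share one configuration.
    tokenizer = await engine.get_tokenizer()
    # Assumes an HF-style apply_chat_template(); vLLM's Mistral wrapper may
    # return token ids instead of a string, so this may need a branch.
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
```

This keeps every tokenizer-specific flag in one place (the engine arguments), so anything passed there is automatically reflected in prompt formatting as well.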
References
Found when working on #416