
[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs #6227

Merged — 12 commits merged into vllm-project:main from openai-tokenizer-take2 on Jul 18, 2024

Conversation

@njhill (Member) commented Jul 8, 2024

Currently the LoRA tokenizers aren't used in the OpenAI APIs, so behaviour won't be correct for adapters that have custom added tokens. This PR addresses that; it mostly replaces #3512.

More work is needed to address remaining inconsistencies in tokenization behaviour between the OpenAI front-end and standalone LLMEngine/AsyncLLMEngine use, including:

  • Standalone cases don't honor the truncation and add_special_tokens request parameters
  • OpenAI API cases don't make use of TokenizerGroups for possible parallelization of tokenization

as well as some other inefficiencies. These will be addressed in follow-on PRs.
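For context, a rough sketch of what per-request tokenizer resolution looks like from the front-end's side. This assumes an engine handle exposing an async `get_tokenizer(lora_request)` lookup; the helper and its call sites below are illustrative, not this PR's actual code:

```python
# Illustrative sketch only: resolve the tokenizer per request so a LoRA
# adapter's custom added tokens round-trip correctly. Names are assumptions.
from typing import List, Optional

from vllm.lora.request import LoRARequest


async def tokenize_prompt(engine,
                          prompt: str,
                          lora_request: Optional[LoRARequest] = None) -> List[int]:
    # Look up the tokenizer for this specific request; for a LoRA adapter
    # with custom added tokens this differs from the base-model tokenizer.
    tokenizer = await engine.get_tokenizer(lora_request)
    return tokenizer(prompt, add_special_tokens=True).input_ids
```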

@njhill requested a review from @DarkLight1337 — July 8, 2024 22:42
@njhill (Member, Author) commented Jul 8, 2024

@dtrifiro FYI

@DarkLight1337 (Member) left a comment

Left some comments!

If possible, try to add a test case to verify whether the OpenAI-compatible server can apply the correct chat template.
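For illustration, such a test might look roughly like the sketch below. The `server` fixture, the adapter name, and the `url_for` helper are assumptions here, not the actual test later added in this PR:

```python
# Hypothetical test sketch: ask the OpenAI-compatible server for a chat
# completion against a LoRA adapter and check a well-formed reply comes back.
# Requires pytest-asyncio and the openai>=1.0 client.
import openai
import pytest


@pytest.mark.asyncio
async def test_chat_template_applied(server):  # "server" fixture assumed
    client = openai.AsyncOpenAI(base_url=server.url_for("v1"), api_key="EMPTY")
    chat = await client.chat.completions.create(
        model="zephyr-lora",  # a LoRA adapter name registered with the server
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=5,
    )
    # If the chat template (including any adapter-specific special tokens)
    # was applied, the server returns a well-formed assistant message.
    assert chat.choices[0].message.role == "assistant"
```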

Review threads on:
  • tests/async_engine/test_chat_template.py
  • vllm/entrypoints/openai/serving_chat.py (one thread outdated)
  • vllm/transformers_utils/tokenizer.py
@njhill (Member, Author) commented Jul 9, 2024

> If possible, try to add a test case to verify whether the OpenAI-compatible server can apply the correct chat template.

Thanks @DarkLight1337, yes, a test is forthcoming.

njhill added 2 commits July 10, 2024 11:21
…okenizer-take2

# Conflicts:
#	vllm/entrypoints/openai/serving_chat.py
#	vllm/entrypoints/openai/serving_completion.py
#	vllm/entrypoints/openai/serving_engine.py
@DarkLight1337 (Member) commented

We can merge after the test case is added.

njhill added 5 commits July 16, 2024 12:21
…okenizer-take2

# Conflicts:
#	tests/async_engine/test_chat_template.py
#	tests/entrypoints/openai/test_completion.py
#	vllm/entrypoints/openai/serving_chat.py
#	vllm/entrypoints/openai/serving_completion.py
…okenizer-take2

# Conflicts:
#	tests/entrypoints/openai/test_chat.py
#	tests/entrypoints/openai/test_completion.py
#	tests/entrypoints/openai/test_tokenization.py
@njhill (Member, Author) commented Jul 18, 2024

@DarkLight1337 tests are now added! And there were a lot of merge conflicts to resolve 😅

@njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) — Jul 18, 2024
@DarkLight1337 (Member) left a comment

Just a small question.

Review thread on: tests/entrypoints/openai/test_chat.py
@DarkLight1337 enabled auto-merge (squash) — July 18, 2024 00:55
@DarkLight1337 disabled auto-merge — July 18, 2024 02:59
@DarkLight1337 (Member) commented Jul 18, 2024

The error occurs because the tokenizer's HuggingFace config is passed into the lru_cache on the _image_token_str function, so we need to remove the lru_cache. To compensate, can a cached_property be added to CachedTokenizer so that we avoid decoding the image token each time?
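For illustration, the suggested cached_property approach might look roughly like this. The class shape, attribute names, and how the image token id is obtained are illustrative assumptions, not vLLM's actual code:

```python
# Minimal sketch: memoize the decoded image token on the tokenizer wrapper
# itself, instead of routing an unhashable HF config through functools.lru_cache.
from functools import cached_property


class CachedTokenizer:
    def __init__(self, tokenizer, image_token_id: int):
        self.tokenizer = tokenizer
        self.image_token_id = image_token_id

    @cached_property
    def image_token_str(self) -> str:
        # Decoded once per tokenizer instance; cached_property memoizes it.
        return self.tokenizer.decode([self.image_token_id])
```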

@njhill (Member, Author) commented Jul 18, 2024

Thanks @DarkLight1337! I pushed a fix before I read your comment, hopefully this should be ok too.

@DarkLight1337 merged commit e2fbaee into vllm-project:main — Jul 18, 2024
72 checks passed
@njhill deleted the openai-tokenizer-take2 branch — July 19, 2024 00:19
fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 19, 2024
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
gnpinkert pushed a commit to gnpinkert/vllm that referenced this pull request Jul 26, 2024
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Alvant <alvasian@yandex.ru>
KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>