[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs #6227

njhill · 2024-07-08T22:42:02Z

Currently the LoRA tokenizers aren't used in the OpenAI APIs, so requests are tokenized incorrectly when they use adapters that have custom added tokens. This PR includes changes to address that. It mostly replaces #3512.
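To illustrate the problem, here is a minimal sketch using a toy whitespace tokenizer. `ToyTokenizer`, `lora_tokenizers`, `get_tokenizer` and the `sql-adapter` name are hypothetical and are not vLLM APIs; the only point is that a token added by an adapter is unknown to the base tokenizer, so the tokenizer has to be chosen per request.

```python
from typing import Dict, List, Optional


class ToyTokenizer:
    """Whitespace tokenizer with a fixed vocabulary (illustration only)."""

    def __init__(self, vocab: List[str]) -> None:
        self.vocab = {tok: i for i, tok in enumerate(vocab)}
        self.unk_id = self.vocab["<unk>"]  # id used for out-of-vocabulary words

    def encode(self, text: str) -> List[int]:
        return [self.vocab.get(word, self.unk_id) for word in text.split()]


BASE_VOCAB = ["<unk>", "hello", "world"]
base_tokenizer = ToyTokenizer(BASE_VOCAB)

# Hypothetical registry: adapter name -> tokenizer shipped with that adapter.
lora_tokenizers: Dict[str, ToyTokenizer] = {
    "sql-adapter": ToyTokenizer(BASE_VOCAB + ["<sql>"]),
}


def get_tokenizer(lora_name: Optional[str]) -> ToyTokenizer:
    """Return the adapter's tokenizer if one is registered, else the base one."""
    if lora_name is None:
        return base_tokenizer
    return lora_tokenizers.get(lora_name, base_tokenizer)


prompt = "hello <sql> world"
base_ids = get_tokenizer(None).encode(prompt)           # "<sql>" maps to <unk>
lora_ids = get_tokenizer("sql-adapter").encode(prompt)  # "<sql>" has its own id
print(base_ids, lora_ids)
```

With the base tokenizer the added token collapses to the unknown id, which is the kind of mismatch described above.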

More work is needed to address remaining inconsistencies in tokenization behaviour between the OpenAI front-end and standalone LLMEngine/AsyncLLMEngine use, including:

- Standalone cases don't honor truncation and add_special_tokens request parameters
- OpenAI API cases don't make use of TokenizerGroups for possible parallelization of tokenization

as well as some other inefficiencies.

But these are to be addressed in follow-on PRs.
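As a rough sketch of what "honoring" those two request parameters could look like in a single helper shared by both entry points: the encoder below is a stand-in, and `tokenize_prompt`, `truncate_to`, and `BOS_ID` are hypothetical names. Keeping the last `truncate_to` tokens is also an assumption, not something stated in this PR.

```python
from typing import List, Optional

BOS_ID = 1  # hypothetical beginning-of-sequence token id


def toy_encode(text: str, add_special_tokens: bool = True) -> List[int]:
    """Stand-in encoder: one id per whitespace-separated word."""
    ids = [100 + i for i, _ in enumerate(text.split())]
    return [BOS_ID] + ids if add_special_tokens else ids


def tokenize_prompt(
    text: str,
    add_special_tokens: bool = True,
    truncate_to: Optional[int] = None,
) -> List[int]:
    """Apply both request parameters in one place so every caller agrees."""
    ids = toy_encode(text, add_special_tokens=add_special_tokens)
    if truncate_to is not None:
        if truncate_to < 1:
            raise ValueError("truncate_to must be a positive integer")
        # Assumption: keep the most recent tokens (left truncation).
        ids = ids[-truncate_to:]
    return ids


full_ids = tokenize_prompt("a b c d")
raw_ids = tokenize_prompt("a b c d", add_special_tokens=False)
short_ids = tokenize_prompt("a b c d", truncate_to=2)
print(full_ids, raw_ids, short_ids)
```

If the standalone engine path and the OpenAI front-end both called one helper like this, the two parameters would behave the same way in both.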


njhill · 2024-07-08T22:45:15Z

@dtrifiro FYI

DarkLight1337

Left some comments!

If possible, try to add a test case to verify whether the OpenAI-compatible server can apply the correct chat template.

tests/async_engine/test_chat_template.py

vllm/entrypoints/openai/serving_chat.py

vllm/transformers_utils/tokenizer.py

njhill · 2024-07-09T15:14:19Z

> If possible, try to add a test case to verify whether the OpenAI-compatible server can apply the correct chat template.

Thanks @DarkLight1337, yes test is forthcoming.


DarkLight1337 · 2024-07-11T02:38:49Z

We can merge after the test case is added.



njhill · 2024-07-18T00:25:13Z

@DarkLight1337 tests now added! and there were a lot of merge conflicts to resolve 😅

DarkLight1337

Just a small question.

tests/entrypoints/openai/test_chat.py

DarkLight1337 · 2024-07-18T03:03:25Z

The error occurs because the ~~tokenizer~~ HuggingFace config is passed as an argument to the lru_cache-wrapped _image_token_str function, so we need to remove the lru_cache. To compensate, could a cached_property be added to CachedTokenizer so that we avoid decoding the image token each time?
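A minimal sketch of the idea, under stated assumptions: the thread doesn't quote the exact error, so this assumes it is the `TypeError` that `functools.lru_cache` raises when an argument is unhashable. `ToyConfig`, `ToyCachedTokenizer`, and `image_token_str_cached` are hypothetical stand-ins and do not mirror vLLM's actual classes.

```python
from functools import cached_property, lru_cache
from typing import Optional


class ToyConfig:
    """Defines __eq__ without __hash__, so instances are unhashable."""

    def __init__(self, image_token_id: int) -> None:
        self.image_token_id = image_token_id

    def __eq__(self, other: object) -> bool:
        return (isinstance(other, ToyConfig)
                and self.image_token_id == other.image_token_id)


@lru_cache(maxsize=None)
def image_token_str_cached(config: ToyConfig) -> str:
    # Never reached for ToyConfig: lru_cache must hash its arguments first.
    return f"<tok{config.image_token_id}>"


class ToyCachedTokenizer:
    """Caches the decoded image token on the tokenizer instance instead."""

    def __init__(self, config: ToyConfig) -> None:
        self.config = config
        self.decode_calls = 0

    def decode(self, token_id: int) -> str:
        self.decode_calls += 1
        return f"<tok{token_id}>"

    @cached_property
    def image_token_str(self) -> str:
        # Computed on first access, then stored in the instance __dict__.
        return self.decode(self.config.image_token_id)


config = ToyConfig(image_token_id=32000)

lru_error: Optional[TypeError] = None
try:
    image_token_str_cached(config)
except TypeError as exc:  # unhashable argument
    lru_error = exc

tokenizer = ToyCachedTokenizer(config)
first = tokenizer.image_token_str
second = tokenizer.image_token_str  # served from the cache, no second decode
```

The per-instance cache avoids hashing the config at all, and it also ties the cached string to one tokenizer, which matters once different adapters can have different tokenizers.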

njhill · 2024-07-18T04:02:50Z

Thanks @DarkLight1337! I pushed a fix before I read your comment, hopefully this should be ok too.

vllm/entrypoints/openai/chat_utils.py


njhill requested a review from DarkLight1337 July 8, 2024 22:42

Fix tests

cc536cd

DarkLight1337 reviewed Jul 9, 2024


tests/async_engine/test_chat_template.py (resolved)

vllm/entrypoints/openai/serving_chat.py (outdated, resolved)

vllm/entrypoints/openai/serving_chat.py (resolved)

vllm/transformers_utils/tokenizer.py (resolved)

njhill added 2 commits July 10, 2024 11:21

Merge remote-tracking branch 'refs/remotes/origin/main' into openai-t…

4668c87

…okenizer-take2 # Conflicts: # vllm/entrypoints/openai/serving_chat.py # vllm/entrypoints/openai/serving_completion.py # vllm/entrypoints/openai/serving_engine.py

Address comments from @DarkLight1337

6db473a

DarkLight1337 approved these changes Jul 11, 2024


njhill mentioned this pull request Jul 12, 2024

Fix use of LoRA tokenizers opendatahub-io/vllm-tgis-adapter#40

Merged

njhill added 5 commits July 16, 2024 12:21

test wip

eef9a8c

Merge remote-tracking branch 'refs/remotes/origin/main' into openai-t…

197f2cb

…okenizer-take2 # Conflicts: # tests/async_engine/test_chat_template.py # tests/entrypoints/openai/test_completion.py # vllm/entrypoints/openai/serving_chat.py # vllm/entrypoints/openai/serving_completion.py

Fixes and test updates

abdd2f9

Merge remote-tracking branch 'refs/remotes/origin/main' into openai-t…

0d2ff43

…okenizer-take2 # Conflicts: # tests/entrypoints/openai/test_chat.py # tests/entrypoints/openai/test_completion.py # tests/entrypoints/openai/test_tokenization.py

yapf

9b98068

njhill added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Jul 18, 2024

DarkLight1337 reviewed Jul 18, 2024


tests/entrypoints/openai/test_chat.py (resolved)

DarkLight1337 enabled auto-merge (squash) July 18, 2024 00:55

DarkLight1337 disabled auto-merge July 18, 2024 02:59

Fix image token string caching

89c96e9

DarkLight1337 reviewed Jul 18, 2024


vllm/entrypoints/openai/chat_utils.py (outdated, resolved)

Fix import

e0c5e39

DarkLight1337 reviewed Jul 18, 2024


vllm/entrypoints/openai/chat_utils.py (outdated, resolved)

Format

17f0b63

DarkLight1337 merged commit e2fbaee into vllm-project:main Jul 18, 2024
72 checks passed

DarkLight1337 mentioned this pull request Jul 18, 2024

[BugFix][Frontend] Use correct, shared tokenizer in OpenAI server #3512

Closed

DarkLight1337 mentioned this pull request Jul 18, 2024

[Frontend] Refactor prompt processing #4028

Merged

njhill deleted the openai-tokenizer-take2 branch July 19, 2024 00:19

fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 19, 2024

[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs (vllm-project#6227)

1a1ffe8

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024

[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs (vllm-project#6227)

be7f882

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

gnpinkert pushed a commit to gnpinkert/vllm that referenced this pull request Jul 26, 2024

[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs (vllm-project#6227)

e7f7549

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024

[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs (vllm-project#6227)

cfeedde

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Alvant <alvasian@yandex.ru>

KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024

[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs (vllm-project#6227)

28ab106

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
