
[V1] Logprobs and prompt logprobs support #9880
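This PR adds support for returning log-probabilities for sampled tokens and, optionally, for prompt tokens in vLLM's V1 engine. As background only (the PR's actual implementation operates on torch tensors inside the sampler and model runner), a token's logprob is its log-softmax score over the vocabulary, and requesting `logprobs=k` returns the top-k candidates per position. A minimal pure-Python sketch of that quantity:

```python
import math

def top_logprobs(logits, k):
    """Log-softmax the logits and return the top-k (token_id, logprob)
    pairs, most-likely first -- the quantity this PR surfaces for each
    sampled (and, with prompt_logprobs, each prompt) token position."""
    lse = math.log(sum(math.exp(x) for x in logits))  # log-sum-exp normalizer
    logprobs = [x - lse for x in logits]
    # Rank vocabulary entries by logprob, descending (rank 0 = most likely).
    ranked = sorted(enumerate(logprobs), key=lambda p: p[1], reverse=True)
    return ranked[:k]

# Toy 3-token "vocabulary": the entry with logit 2.0 ranks first.
top2 = top_logprobs([2.0, 1.0, 0.1], 2)
```

Since logprobs are log-softmax values, exponentiating the full set sums to 1; that invariant is what several of the tests below check indirectly.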

Merged: 535 commits (merged Feb 7, 2025)

Commits
9a28ddf
updated
robertgshaw2-redhat Jan 3, 2025
d1a956d
update comment
robertgshaw2-redhat Jan 3, 2025
5fd0060
updated
robertgshaw2-redhat Jan 3, 2025
433b93c
merge
robertgshaw2-redhat Jan 3, 2025
0d2f7c8
stash
robertgshaw2-redhat Jan 3, 2025
06b9aba
cleanup
robertgshaw2-redhat Jan 3, 2025
035e2c2
updated
robertgshaw2-redhat Jan 3, 2025
17e41c8
remove
robertgshaw2-redhat Jan 3, 2025
2cb4832
finish cleaning sampler.py
robertgshaw2-redhat Jan 3, 2025
92595a4
updated
robertgshaw2-redhat Jan 3, 2025
c82fc85
updated comment
robertgshaw2-redhat Jan 3, 2025
c3c4f9c
passing mypy!
robertgshaw2-redhat Jan 3, 2025
fec3d15
comment
robertgshaw2-redhat Jan 3, 2025
d002d67
todo -> fixme
robertgshaw2-redhat Jan 3, 2025
3157e8b
updated
robertgshaw2-redhat Jan 3, 2025
60125e3
fixed sampler bug
afeldman-nm Jan 4, 2025
5908cb1
fixed some sampler bugs
afeldman-nm Jan 5, 2025
c5f9565
merge
afeldman-nm Jan 5, 2025
fc52031
wip fixing detokenizer test
afeldman-nm Jan 5, 2025
7dc2756
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
6e57de4
wip
afeldman-nm Jan 6, 2025
599aae8
temporary hack to use pickling
afeldman-nm Jan 6, 2025
2aa1007
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
ae1e1b7
wip detokenizer test
afeldman-nm Jan 6, 2025
ae00145
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
a1c5b2e
fix: logprobs not being wrapped in an array
afeldman-nm Jan 6, 2025
7288370
sample logprobs work
afeldman-nm Jan 6, 2025
85e57d9
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
0e90ccb
detokenizer test passing for sample logprobs
afeldman-nm Jan 6, 2025
c2f48fb
detokenizer tests passing
afeldman-nm Jan 6, 2025
7993d08
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
13177d4
prompt logprobs with chunked prefill!
afeldman-nm Jan 6, 2025
05536f5
cleanup
afeldman-nm Jan 6, 2025
fa64529
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
0d17df8
light refactor
afeldman-nm Jan 6, 2025
f707191
torch serialization with msgpack via enc_/ext_hooksgit status!
afeldman-nm Jan 6, 2025
637c45c
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 7, 2025
cd5e7c6
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 8, 2025
8b1b995
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 8, 2025
3d00348
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 8, 2025
ce4f081
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 8, 2025
62d648a
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 9, 2025
3546639
wip
afeldman-nm Jan 9, 2025
a8c0167
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 9, 2025
0bba8f9
Merge branch 'v1_logprobs' into v1_logprobs_prompt
afeldman-nm Jan 9, 2025
69218ab
GPU returns num_prompt_logprobs + 1 prompt logprobs
afeldman-nm Jan 9, 2025
2505244
now prompt logprobs include prompt token
afeldman-nm Jan 9, 2025
e1058ac
wip making prompt logprobs line up with tok ids
afeldman-nm Jan 9, 2025
5f33902
partial req peek token
afeldman-nm Jan 9, 2025
199a834
refactoring
afeldman-nm Jan 9, 2025
879fc44
refactoring; non-blocking cpu->gpu transfer
afeldman-nm Jan 9, 2025
0f425fe
wip detokenizer tests
afeldman-nm Jan 9, 2025
1089127
detok test fix
afeldman-nm Jan 9, 2025
d2742d8
passing detok tests
afeldman-nm Jan 9, 2025
cf28c9b
Merge branch 'main' into v1_logprobs
afeldman-nm Jan 9, 2025
749be5a
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 10, 2025
a55e679
LLMEngine test working, wip AsyncLLM test
afeldman-nm Jan 10, 2025
b2c0c95
reverted unwanted changes
afeldman-nm Jan 10, 2025
9a40c5f
success
afeldman-nm Jan 10, 2025
ca94fd4
Merge branch 'main' into v1_logprobs_apc_merge
afeldman-nm Jan 10, 2025
465d984
added test_completion, switched model
afeldman-nm Jan 12, 2025
1f19724
wip test_completion
afeldman-nm Jan 12, 2025
33d3922
merge
afeldman-nm Jan 12, 2025
4ed0994
Merge branch 'v1_logprobs' into v1_logprobs_apc_merge
afeldman-nm Jan 12, 2025
33093ee
Merge branch 'main' into afeldman-nm/v1_logprobs
robertgshaw2-redhat Jan 13, 2025
435bb15
updated
robertgshaw2-redhat Jan 13, 2025
c996901
sort of fixed RequestState cyclical import; added logprobs, prompt_lo…
afeldman-nm Jan 14, 2025
ba9561a
actually fixed RequestState circular import
afeldman-nm Jan 14, 2025
34735be
woops
afeldman-nm Jan 14, 2025
6a501eb
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 14, 2025
49c2c8c
wip
afeldman-nm Jan 14, 2025
016e747
untested first-pass at logprobs integration into new output processin…
afeldman-nm Jan 15, 2025
b269a7a
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 15, 2025
cda2ba2
wip
afeldman-nm Jan 15, 2025
bf20f4b
passing with no sample/prompt logprobs
afeldman-nm Jan 15, 2025
4fae200
fix to get prompt logprobs tests passing (sample logprobs tests alrea…
afeldman-nm Jan 15, 2025
9deca70
sample and prompt logprobs optional in EngineCoreOutput; makes detoke…
afeldman-nm Jan 15, 2025
d96ec24
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 15, 2025
46e65ae
wip
afeldman-nm Jan 15, 2025
65b9b64
refactored output processor test vectors into utils and test fixtures
afeldman-nm Jan 15, 2025
6ddc4f9
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 15, 2025
8dad984
refactored test fixtures
afeldman-nm Jan 15, 2025
789d0a4
merge
afeldman-nm Jan 15, 2025
29f491f
format
afeldman-nm Jan 15, 2025
3302eae
format
afeldman-nm Jan 15, 2025
fb3c836
Merge branch 'v1_logprobs_apc' into v1_logprobs_test_merge
afeldman-nm Jan 15, 2025
110afd1
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 15, 2025
a6ecea4
mock engine includes logprobs
afeldman-nm Jan 15, 2025
c12f29a
progress integrating logprobs into output processor tests
afeldman-nm Jan 15, 2025
29dd713
non-logprobs output processor tests pass
afeldman-nm Jan 15, 2025
29f77e3
output processor tests passing without logprobs checks
afeldman-nm Jan 15, 2025
18a2162
added logprobs test; detokenizer test is just detokenizer
afeldman-nm Jan 15, 2025
2648a05
merge
afeldman-nm Jan 15, 2025
ab40e32
Merge branch 'v1_logprobs_proc_test' into v1_logprobs
afeldman-nm Jan 15, 2025
e16ea40
output processor tests almost finished
afeldman-nm Jan 15, 2025
89f7977
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 15, 2025
3e23b32
Merge branch 'v1_logprobs_merge' into v1_logprobs
afeldman-nm Jan 15, 2025
cf92387
Merge branch 'v1_logprobs_proc_test' into v1_logprobs
afeldman-nm Jan 15, 2025
c8fc3c3
wip
afeldman-nm Jan 16, 2025
bd2c36b
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 16, 2025
63d4484
_validate_logprobs progress
afeldman-nm Jan 16, 2025
4353f01
enhanced logprobs checks
afeldman-nm Jan 16, 2025
1c418a6
wip
afeldman-nm Jan 17, 2025
ab95d87
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 17, 2025
80d420d
Merge branch 'v1_logprobs' into v1_logprobs_proc_test_merge
afeldman-nm Jan 17, 2025
1a5850b
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 17, 2025
c554c5c
merge
afeldman-nm Jan 20, 2025
c24cfd6
cleanup
afeldman-nm Jan 20, 2025
0a8f9ae
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 22, 2025
201c1cd
Update vllm/v1/engine/detokenizer.py
afeldman-nm Jan 22, 2025
832d5c8
Update vllm/v1/engine/detokenizer.py
afeldman-nm Jan 22, 2025
1a59237
detokenize()
afeldman-nm Jan 22, 2025
a49ab7e
Update vllm/v1/sample/sampler.py
afeldman-nm Jan 22, 2025
2bf6829
removed unnecessary lines from Scheduler; array_like in EngineCoreOutput
afeldman-nm Jan 22, 2025
c008732
Merge branch 'afeldman-nm/v1_logprobs' of https://github.com/neuralma…
afeldman-nm Jan 22, 2025
bfce1d6
tuples
afeldman-nm Jan 22, 2025
982381d
redundant else's
afeldman-nm Jan 22, 2025
aedc1b8
Update vllm/v1/worker/gpu_input_batch.py
afeldman-nm Jan 22, 2025
6c0fe71
Merge branch 'afeldman-nm/v1_logprobs' of https://github.com/neuralma…
afeldman-nm Jan 22, 2025
f5f5954
tuple
afeldman-nm Jan 22, 2025
4480ec0
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 22, 2025
6adaa1f
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 22, 2025
ef2d33a
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 22, 2025
d09604e
merge
afeldman-nm Jan 22, 2025
f1d0234
modified reference detokenization impl
afeldman-nm Jan 22, 2025
135a585
don't use decode()
afeldman-nm Jan 22, 2025
753d7c7
don't use decode for prompt logprobs
afeldman-nm Jan 22, 2025
207b802
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 22, 2025
c0033a4
refactor
afeldman-nm Jan 22, 2025
a20fa58
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 23, 2025
c6b87b1
Update vllm/v1/worker/gpu_model_runner.py
afeldman-nm Jan 24, 2025
bdb1dbe
naive prompt logprobs fix doesn't work
afeldman-nm Jan 24, 2025
b2a779f
new prompt logprobs approach
afeldman-nm Jan 24, 2025
3d8c7fd
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 24, 2025
4435b4b
Merge branch 'v1_logprobs' into v1_logprobs_repeek
afeldman-nm Jan 24, 2025
ef233c4
Integrated next-chunk peek into input_ids
afeldman-nm Jan 24, 2025
e1a9c51
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 24, 2025
fde5c1a
Merge branch 'v1_logprobs' into v1_logprobs_repeek
afeldman-nm Jan 24, 2025
091c2e1
remove unnecessary code; refactor
afeldman-nm Jan 24, 2025
2e98cf0
Merge branch 'afeldman-nm/v1_logprobs' of https://github.com/neuralma…
afeldman-nm Jan 24, 2025
c83529e
Merge branch 'v1_logprobs_repeek' into v1_logprobs
afeldman-nm Jan 24, 2025
f9d9eb9
fixing lint failures
afeldman-nm Jan 24, 2025
0a012f3
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 24, 2025
0526a01
Merge branch 'v1_logprobs_merge' into v1_logprobs
afeldman-nm Jan 24, 2025
e33d8bb
partial_req_ids -> partial_req_id
afeldman-nm Jan 24, 2025
3562aec
partial_req_ids -> partial_req_id
afeldman-nm Jan 24, 2025
5eb3aa0
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 24, 2025
8564e79
bugfix
afeldman-nm Jan 24, 2025
7d0d6d8
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 24, 2025
3f8edfe
non-blocking copy to cpu
afeldman-nm Jan 24, 2025
f38a17c
Rework output processor logic
njhill Jan 24, 2025
ea2a005
Fix test
njhill Jan 26, 2025
1fc08f2
Merge branch 'afeldman-nm/v1_logprobs' of https://github.com/neuralma…
afeldman-nm Jan 27, 2025
955953d
fix basic import issue
afeldman-nm Jan 27, 2025
e6edcbe
cleanup
afeldman-nm Jan 28, 2025
94e120f
cleanup
afeldman-nm Jan 28, 2025
1c24f91
Update vllm/v1/worker/gpu_model_runner.py
afeldman-nm Jan 28, 2025
dd96496
cleanup
afeldman-nm Jan 28, 2025
7a44291
Merge remote-tracking branch 'origin/main' into afeldman-nm/v1_logprobs
njhill Jan 28, 2025
0ca162a
delta mode fix
afeldman-nm Jan 28, 2025
fe56625
WIP add ranks etc.
njhill Jan 29, 2025
3efb2df
more encapsulated prompt logprobs approach
afeldman-nm Jan 29, 2025
9bfccb8
fixes
afeldman-nm Jan 29, 2025
1b7fe30
pythonized engine core logprobs
afeldman-nm Jan 29, 2025
b812d17
merging serialization changes
afeldman-nm Jan 29, 2025
e0e0708
logprob ranks work
afeldman-nm Jan 29, 2025
c609a3d
refactor
afeldman-nm Jan 29, 2025
aaea609
merge
afeldman-nm Jan 29, 2025
b0a0451
0 logprobs test
afeldman-nm Jan 29, 2025
fb5add1
Merge remote-tracking branch 'origin/main' into afeldman-nm/v1_logprobs
njhill Jan 29, 2025
1d505d4
zero fix is probably in
afeldman-nm Jan 29, 2025
e73af7b
zero fix is almost in; updated logprobs test cases
afeldman-nm Jan 29, 2025
17b21ac
Merge branch 'afeldman-nm/v1_logprobs' of https://github.com/neuralma…
afeldman-nm Jan 29, 2025
fc79dfa
zero issue seems to be fixed
afeldman-nm Jan 30, 2025
2e05530
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 30, 2025
bcfa1d6
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 30, 2025
34b20f4
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 30, 2025
1e730ad
Clean-up; simplify Logprobs dict construction
njhill Jan 30, 2025
d797cbf
wip
afeldman-nm Jan 30, 2025
891604e
Updated logprobs processor unit tests to reflect new engine core outp…
afeldman-nm Jan 30, 2025
2d4a96e
Merge branch 'afeldman-nm/v1_logprobs' of https://github.com/neuralma…
afeldman-nm Jan 30, 2025
a9ecdf9
bugfix
afeldman-nm Jan 31, 2025
bec83f5
fixed test vector bug
afeldman-nm Jan 31, 2025
d663462
rank computation fix
afeldman-nm Jan 31, 2025
ce7c38c
wip
afeldman-nm Jan 31, 2025
f186fe3
reverting
afeldman-nm Jan 31, 2025
7c4b089
reverting
afeldman-nm Jan 31, 2025
5129c20
reverting
afeldman-nm Jan 31, 2025
cc8ce98
revert complete
afeldman-nm Jan 31, 2025
de94a16
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 31, 2025
ff3122a
fixed serialization bug
afeldman-nm Jan 31, 2025
2ca6b03
stop fix
afeldman-nm Jan 31, 2025
f559431
acknowledge broken invariant
afeldman-nm Jan 31, 2025
fe053b0
Merge branch 'v1_logprobs' into v1_logprobs_merge
afeldman-nm Jan 31, 2025
2896276
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 31, 2025
f61cf5c
fixed typing issue
afeldman-nm Jan 31, 2025
ad25c6d
woops zero logprob fix
afeldman-nm Jan 31, 2025
f934094
additional zero logprob fix
afeldman-nm Jan 31, 2025
bc92a80
merge
afeldman-nm Jan 31, 2025
696f890
small fix
afeldman-nm Jan 31, 2025
8f209f2
Merge branch 'v1_logprobs_apc' into v1_logprobs_test
afeldman-nm Jan 31, 2025
e74c0e4
echo tests pass
afeldman-nm Jan 31, 2025
21428e3
Simplify Logprobs dict construction
njhill Feb 1, 2025
c8ae49b
nit
robertgshaw2-redhat Feb 2, 2025
5883a70
revert changes
robertgshaw2-redhat Feb 2, 2025
c180b37
formats
robertgshaw2-redhat Feb 2, 2025
7c8e13d
update
robertgshaw2-redhat Feb 2, 2025
6486cdd
update comment
robertgshaw2-redhat Feb 2, 2025
22b07e2
update
robertgshaw2-redhat Feb 2, 2025
e7ba970
revert unnessary change
robertgshaw2-redhat Feb 2, 2025
8c57385
cleanup suprious change
robertgshaw2-redhat Feb 2, 2025
b0af03b
cleanup suprious change
robertgshaw2-redhat Feb 2, 2025
3be08c8
simplify update sample logprobs logic
robertgshaw2-redhat Feb 2, 2025
d9dc980
mypy:)
robertgshaw2-redhat Feb 2, 2025
36b9a36
share more logic between sample and prompt logprobs
robertgshaw2-redhat Feb 2, 2025
4658be3
updated
robertgshaw2-redhat Feb 2, 2025
9ee7ac3
updated
robertgshaw2-redhat Feb 2, 2025
4d5f444
stash
robertgshaw2-redhat Feb 2, 2025
1f1f49a
updated
robertgshaw2-redhat Feb 2, 2025
b9632e2
remove
robertgshaw2-redhat Feb 2, 2025
52dd142
fail if prefix caching is enabled
robertgshaw2-redhat Feb 2, 2025
76a0324
mypy
robertgshaw2-redhat Feb 2, 2025
2448997
stash
robertgshaw2-redhat Feb 2, 2025
1fec265
updated
robertgshaw2-redhat Feb 2, 2025
8c1b89e
merge
afeldman-nm Feb 2, 2025
29da400
fix
robertgshaw2-redhat Feb 2, 2025
f70661f
Merge branch 'v1_logprobs_test_merge' into v1_logprobs
afeldman-nm Feb 2, 2025
1d19700
revert
robertgshaw2-redhat Feb 2, 2025
50b7660
updated
robertgshaw2-redhat Feb 2, 2025
2c0d7f3
Merge branch 'main' into afeldman-nm/v1_logprobs
robertgshaw2-redhat Feb 2, 2025
56aa97f
updated with header
robertgshaw2-redhat Feb 2, 2025
5cc977a
fix pref commit
robertgshaw2-redhat Feb 2, 2025
73653f1
missing test
robertgshaw2-redhat Feb 2, 2025
4c3ca35
revert msgpack
robertgshaw2-redhat Feb 2, 2025
e13e027
reinstate msgpack tensor hooks for now
njhill Feb 3, 2025
2f00116
Group logprobs tensors into tuple, some other simplifications
njhill Feb 3, 2025
5f35e44
Fix prompt_logprobs behavior for RequestOutputKind.CUMULATIVE
njhill Feb 3, 2025
ff3ad91
Merge remote-tracking branch 'origin/main' into afeldman-nm/v1_logprobs
njhill Feb 4, 2025
a12aa1c
Further simplification, group logprob lists into a tuple
njhill Feb 4, 2025
e7e664c
Remove redundant partial request tracking
njhill Feb 4, 2025
e09f29f
Merge remote-tracking branch 'origin/main' into afeldman-nm/v1_logprobs
njhill Feb 4, 2025
a5819bb
fix test
njhill Feb 5, 2025
0015f3f
Merge remote-tracking branch 'origin/main' into afeldman-nm/v1_logprobs
njhill Feb 5, 2025
fcbb27f
minor cleanup
njhill Feb 5, 2025
f0c8f28
Merge remote-tracking branch 'origin/main' into afeldman-nm/v1_logprobs
njhill Feb 5, 2025
054562d
move RequestState back to output_processor.py
njhill Feb 5, 2025
e105532
fix tests
njhill Feb 5, 2025
a006d17
fix and speed up tests
njhill Feb 5, 2025
ec98a2c
Merge remote-tracking branch 'origin/main' into afeldman-nm/v1_logprobs
njhill Feb 6, 2025
7973627
Merge remote-tracking branch 'origin/main' into afeldman-nm/v1_logprobs
njhill Feb 7, 2025
4 changes: 2 additions & 2 deletions tests/v1/core/test_scheduler.py
@@ -195,8 +195,8 @@ def test_schedule_partial_requests():
         req_ids=[request.request_id for request in requests],
         req_id_to_index=req_to_index,
         sampled_token_ids=[0] * len(requests),
-        logprob_token_ids_cpu=None,
-        logprobs_cpu=None,
+        logprobs=None,
+        prompt_logprobs_dict={},
     )
     scheduler.update_from_output(output, model_runner_output)
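The hunk above reflects the reshaped model-runner output: two parallel CPU arrays are replaced by a single optional `logprobs` payload plus a per-request `prompt_logprobs_dict`, where an entry may be `None` while a request's chunked prefill is still in flight (judging by the partial-request commits above). A hypothetical sketch of consuming that dict, with plain lists standing in for the real tensors and all names assumed:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Plain lists stand in for the tensors the real runner returns.
LogprobsSketch = List[Tuple[List[int], List[float]]]  # (token_ids, logprobs)

@dataclass
class ModelRunnerOutputSketch:
    req_ids: List[str]
    sampled_token_ids: List[int]
    # One optional payload for sampled-token logprobs...
    logprobs: Optional[LogprobsSketch] = None
    # ...and per-request prompt logprobs; None marks a request whose
    # chunked prefill has not finished yet.
    prompt_logprobs_dict: Dict[str, Optional[LogprobsSketch]] = field(
        default_factory=dict)

def collect_prompt_logprobs(
        out: ModelRunnerOutputSketch) -> Dict[str, LogprobsSketch]:
    """Keep only requests whose prompt logprobs are complete."""
    return {
        rid: plp
        for rid, plp in out.prompt_logprobs_dict.items() if plp is not None
    }

out = ModelRunnerOutputSketch(
    req_ids=["a", "b"],
    sampled_token_ids=[7, 9],
    prompt_logprobs_dict={"a": [([7, 3], [-0.1, -2.3])], "b": None},
)
complete = collect_prompt_logprobs(out)  # only "a" is complete
```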
90 changes: 90 additions & 0 deletions tests/v1/engine/conftest.py
@@ -0,0 +1,90 @@
# SPDX-License-Identifier: Apache-2.0

from typing import List, Tuple

import pytest
import torch
from transformers import AutoTokenizer

from tests.v1.engine.utils import (NUM_PROMPT_LOGPROBS_UNDER_TEST,
                                   NUM_SAMPLE_LOGPROBS_UNDER_TEST, PROMPT_LEN,
                                   TOKENIZER_NAME,
                                   DummyOutputProcessorTestVectors,
                                   generate_dummy_prompt_logprobs_tensors,
                                   generate_dummy_sample_logprobs)
from vllm.engine.arg_utils import EngineArgs
from vllm.transformers_utils.tokenizer_group import init_tokenizer_from_configs

from tests.v1.engine.utils import FULL_STRINGS  # isort: skip

EngineCoreSampleLogprobsType = List[Tuple[torch.Tensor, torch.Tensor]]
EngineCorePromptLogprobsType = Tuple[torch.Tensor, torch.Tensor]


def _build_test_vectors_no_logprobs() -> DummyOutputProcessorTestVectors:
    """Generate output processor dummy test vectors, without logprobs

    Returns:
      DummyOutputProcessorTestVectors instance with no logprobs
    """

    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_NAME)
    vllm_config = EngineArgs(model=TOKENIZER_NAME).create_engine_config()
    # Tokenize prompts under test & create dummy generated tokens
    prompt_tokens = [
        tokenizer(text).input_ids[:PROMPT_LEN] for text in FULL_STRINGS
    ]
    generation_tokens = [
        tokenizer(text).input_ids[PROMPT_LEN:] for text in FULL_STRINGS
    ]
    # Generate prompt strings
    prompt_strings = [
        tokenizer.decode(prompt_tokens, skip_special_tokens=True)
        for prompt_tokens in prompt_tokens
    ]
    prompt_strings_len = [
        len(prompt_string) for prompt_string in prompt_strings
    ]
    return DummyOutputProcessorTestVectors(
        tokenizer=tokenizer,
        tokenizer_group=init_tokenizer_from_configs(
            vllm_config.model_config, vllm_config.scheduler_config,
            vllm_config.parallel_config, vllm_config.lora_config),
        vllm_config=vllm_config,
        full_tokens=[tokenizer(text).input_ids for text in FULL_STRINGS],
        prompt_tokens=prompt_tokens,
        generation_tokens=generation_tokens,
        prompt_strings=prompt_strings,
        prompt_strings_len=prompt_strings_len,
        generation_strings=[
            text[prompt_len:]
            for text, prompt_len in zip(FULL_STRINGS, prompt_strings_len)
        ],
        prompt_logprobs=[],
        generation_logprobs=[])


@pytest.fixture
def dummy_test_vectors() -> DummyOutputProcessorTestVectors:
    """Generate output processor dummy test vectors, with logprobs

    Returns:
      DummyOutputProcessorTestVectors instance with logprobs
    """
    # Build dummy test vectors without logprobs
    dtv = _build_test_vectors_no_logprobs()
    # Inject logprobs into dummy test vectors data structure
    dtv.generation_logprobs = [
        generate_dummy_sample_logprobs(
            sampled_tokens_list=tokens_list,
            num_logprobs=NUM_SAMPLE_LOGPROBS_UNDER_TEST,
            tokenizer=dtv.tokenizer) for tokens_list in dtv.generation_tokens
    ]
    dtv.prompt_logprobs = [
        generate_dummy_prompt_logprobs_tensors(
            prompt_tokens_list=tokens_list,
            num_logprobs=NUM_PROMPT_LOGPROBS_UNDER_TEST,
            tokenizer=dtv.tokenizer) for tokens_list in dtv.prompt_tokens
    ]
    return dtv
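The fixture above leans on helpers from `tests.v1.engine.utils` whose bodies are not shown in this diff. Going only by the `EngineCoreSampleLogprobsType = List[Tuple[torch.Tensor, torch.Tensor]]` alias, a plausible shape for the dummy sample-logprobs data is one (logprobs, token_ids) pair per sampled token. A hypothetical sketch of such a generator, with plain lists in place of tensors and every name assumed:

```python
import random

def generate_dummy_sample_logprobs_sketch(sampled_tokens, num_logprobs,
                                          vocab_size=128):
    """Fabricate, for each sampled token, a (logprobs, token_ids) pair
    covering num_logprobs candidate tokens plus the sampled token itself,
    with logprobs sorted descending (rank 0 = most likely)."""
    random.seed(0)  # deterministic test vectors
    out = []
    for tok in sampled_tokens:
        # num_logprobs distinct candidate ids, sampled token appended last.
        candidates = random.sample(
            [t for t in range(vocab_size) if t != tok], num_logprobs)
        token_ids = candidates + [tok]
        # Descending dummy logprobs (all negative, as real logprobs are).
        logprobs = sorted((-10 * random.random() for _ in token_ids),
                          reverse=True)
        out.append((logprobs, token_ids))
    return out

dummy = generate_dummy_sample_logprobs_sketch([5, 17], num_logprobs=4)
```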
49 changes: 45 additions & 4 deletions tests/v1/engine/test_async_llm.py
@@ -2,10 +2,11 @@

 import asyncio
 from contextlib import ExitStack
-from typing import List, Tuple
+from typing import List, Optional, Tuple

 import pytest

+from tests.v1.engine.utils import PLP_APC_UNSUPPORTED_MSG
 from vllm import SamplingParams
 from vllm.engine.arg_utils import AsyncEngineArgs
 from vllm.platforms import current_platform
@@ -21,13 +22,19 @@
                          disable_log_requests=True)


-async def generate(engine: AsyncLLM, request_id: str,
-                   output_kind: RequestOutputKind,
-                   max_tokens: int) -> Tuple[int, str]:
+async def generate(engine: AsyncLLM,
+                   request_id: str,
+                   output_kind: RequestOutputKind,
+                   max_tokens: int,
+                   prompt_logprobs: Optional[int] = None) -> Tuple[int, str]:
     # Ensure generate doesn't complete too fast for cancellation test.
     await asyncio.sleep(0.2)

     count = 0
     sampling_params = SamplingParams(max_tokens=max_tokens,
                                      output_kind=output_kind,
-                                     temperature=0)
+                                     temperature=0,
+                                     prompt_logprobs=prompt_logprobs)
     async for out in engine.generate(request_id=request_id,
                                      prompt="Hello my name is Robert and",
                                      sampling_params=sampling_params):
@@ -43,6 +50,40 @@ async def generate(engine: AsyncLLM, request_id: str,
     return count, request_id


+@pytest.mark.parametrize(
+    "output_kind", [RequestOutputKind.DELTA, RequestOutputKind.FINAL_ONLY])
+@pytest.mark.asyncio
+async def test_async_llm_refuses_prompt_logprobs_with_apc(
+        monkeypatch, output_kind: RequestOutputKind):
+    """Test passes if AsyncLLM raises an exception when it is configured
+    for automatic prefix caching and it receives a request with
+    prompt_logprobs enabled, which is incompatible."""
+    # TODO(rickyx): Remove monkeypatch VLLM_USE_V1 setting once we have a
+    # better way to test V1 so that in the future when we switch, we don't
+    # have to change all the tests.
+    monkeypatch.setenv("VLLM_USE_V1", "1")
+    # Create AsyncLLM engine with APC
+    apc_engine_args = AsyncEngineArgs(model="facebook/opt-125m",
+                                      enable_prefix_caching=True,
+                                      gpu_memory_utilization=0.8,
+                                      disable_log_requests=True)
+    engine = AsyncLLM.from_engine_args(apc_engine_args)
+    try:
+        with pytest.raises(ValueError) as excinfo:
+            # Issue a request with prompt logprobs enabled, which should fail
+            await asyncio.create_task(
+                generate(engine,
+                         "request-0",
+                         output_kind,
+                         10,
+                         prompt_logprobs=5))
+        # Validate exception string is correct
+        assert str(excinfo.value) == PLP_APC_UNSUPPORTED_MSG
+    finally:
+        # Shut down engine
+        engine.shutdown()
+
+
 @pytest.mark.parametrize(
     "output_kind", [RequestOutputKind.DELTA, RequestOutputKind.FINAL_ONLY])
 @pytest.mark.asyncio
23 changes: 23 additions & 0 deletions tests/v1/engine/test_llm_engine.py
@@ -0,0 +1,23 @@
# SPDX-License-Identifier: Apache-2.0

import pytest

from tests.v1.engine.utils import PLP_APC_UNSUPPORTED_MSG
from vllm import LLM, SamplingParams


def test_llm_engine_refuses_prompt_logprobs_with_apc(monkeypatch):
    """Test passes if LLMEngine raises an exception when it is configured
    for automatic prefix caching and it receives a request with
    prompt_logprobs enabled, which is incompatible."""

    monkeypatch.setenv("VLLM_USE_V1", "1")
    # TODO(nick): Single-proc to work around a ZMQ shutdown hang for now.
    monkeypatch.setenv("VLLM_ENABLE_V1_MULTIPROCESSING", "0")
    with pytest.raises(ValueError) as excinfo:
        LLM(model="facebook/opt-125m", enable_prefix_caching=True).generate(
            "Hello, my name is",
            SamplingParams(temperature=0.8, top_p=0.95, prompt_logprobs=5))

    # Validate exception string is correct
    assert str(excinfo.value) == PLP_APC_UNSUPPORTED_MSG
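Both new tests assert the same incompatibility: a request with `prompt_logprobs` set is refused when automatic prefix caching (APC) is enabled (see the commit "fail if prefix caching is enabled" above). A minimal sketch of the kind of validation they exercise, with hypothetical names and an assumed error string (the real one lives in `tests.v1.engine.utils`):

```python
from dataclasses import dataclass
from typing import Optional

# Assumed wording; the real message is defined in tests.v1.engine.utils.
PLP_APC_UNSUPPORTED_MSG = ("Prompt logprobs are not supported with "
                           "prefix caching enabled")

@dataclass
class SamplingParamsSketch:
    prompt_logprobs: Optional[int] = None

def validate_request(params: SamplingParamsSketch,
                     enable_prefix_caching: bool) -> None:
    """Reject the APC + prompt_logprobs combination, as both tests expect."""
    if enable_prefix_caching and params.prompt_logprobs is not None:
        raise ValueError(PLP_APC_UNSUPPORTED_MSG)

try:
    validate_request(SamplingParamsSketch(prompt_logprobs=5),
                     enable_prefix_caching=True)
    raised = False
except ValueError as e:
    raised = str(e) == PLP_APC_UNSUPPORTED_MSG
```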