[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) #4942

Merged
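The PR title refers to cross-attention, where decoder positions attend over the encoder's output sequence rather than over earlier decoder tokens. The following is a minimal single-head sketch in plain Python, assuming standard scaled dot-product attention with no causal mask. It is not vLLM's kernel. The names `matmul`, `softmax` and `cross_attention` are hypothetical. The commit log mentions cross-attention block tables, which suggests encoder-side keys/values are cached across decode steps. This sketch does not model any caching.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Row-by-column product; zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(dec_q: Matrix, enc_k: Matrix, enc_v: Matrix) -> Matrix:
    """Decoder queries (n_dec x d) attend over encoder keys/values (n_enc x d).

    No causal mask: every decoder position may attend to every encoder
    position, and n_dec need not equal n_enc.
    """
    d = len(dec_q[0])
    scale = 1.0 / math.sqrt(d)
    k_t = [list(col) for col in zip(*enc_k)]  # transpose keys to d x n_enc
    scores = [[s * scale for s in row] for row in matmul(dec_q, k_t)]
    weights = [softmax(row) for row in scores]  # n_dec x n_enc, rows sum to 1
    return matmul(weights, enc_v)  # n_dec x d_v


if __name__ == "__main__":
    dec_q = [[1.0, 0.0], [0.0, 1.0]]                 # 2 decoder positions
    enc_k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # 3 encoder positions
    enc_v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(cross_attention(dec_q, enc_k, enc_v))
```

Because the encoder keys/values do not depend on the decoder's progress, they only need to be computed once per request. That is why an encoder/decoder runner treats them separately from the decoder's self-attention cache.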
Changes from 250 commits
Commits
624 commits
10ed714
Format
afeldman-nm Jul 15, 2024
78d3d3c
modified LLM.generate() error message
afeldman-nm Jul 15, 2024
6c95380
wip engine is_encoder_decoder() setting
afeldman-nm Jul 15, 2024
304caed
formatting
afeldman-nm Jul 15, 2024
7b0803b
formatting?
afeldman-nm Jul 15, 2024
5525511
Sequence may be constructed with encoder/decoder LLMInput configurations
afeldman-nm Jul 15, 2024
dd4031c
wip but having wllm.commit_id error
afeldman-nm Jul 15, 2024
8dccaa5
correctly constructing enc/dec sequences
afeldman-nm Jul 15, 2024
336a77d
formatting
afeldman-nm Jul 15, 2024
46397c7
wip
afeldman-nm Jul 15, 2024
f85997b
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 15, 2024
251f899
wip
afeldman-nm Jul 15, 2024
9141347
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 15, 2024
ddaf0ad
wip
afeldman-nm Jul 16, 2024
54ff142
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 16, 2024
92d9f48
conftest: encoder/decoder example prompts
afeldman-nm Jul 16, 2024
c5846ac
Hfrunner greedy logprobs limit
afeldman-nm Jul 16, 2024
374880f
input preparation now includes encoder-oriented input setup:
afeldman-nm Jul 16, 2024
796d7a3
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 16, 2024
42ac66b
VllmRunner encoder/decoder methods
afeldman-nm Jul 16, 2024
850a97e
bart parallel vocab
afeldman-nm Jul 16, 2024
3c7e19d
zip enc/dec prompts; formatting
afeldman-nm Jul 16, 2024
e534ffc
wip
afeldman-nm Jul 16, 2024
97d81f0
encoder/decoder input processing; formatting
afeldman-nm Jul 16, 2024
87ed3b6
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 16, 2024
713d095
incorporated encoder sequence into request-add functionality
afeldman-nm Jul 16, 2024
aea8d34
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 17, 2024
159c7bc
fixed decoder-only bug
afeldman-nm Jul 17, 2024
16c9aa2
bugfix
afeldman-nm Jul 17, 2024
03aea18
wip
afeldman-nm Jul 17, 2024
ef80c85
wip
afeldman-nm Jul 17, 2024
f8dd4a5
fixed scheduler bug
afeldman-nm Jul 17, 2024
c2ff615
format
afeldman-nm Jul 17, 2024
31127fa
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 17, 2024
1c6e06d
bugfix
afeldman-nm Jul 17, 2024
0cc14ab
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 17, 2024
3656dc6
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 17, 2024
aee5f16
fixed sequence bug
afeldman-nm Jul 17, 2024
ef94623
added examples utils w/ context manager for backend override; applied…
afeldman-nm Jul 17, 2024
50ad5ff
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 17, 2024
b277180
formatting
afeldman-nm Jul 17, 2024
cac6283
added encoder/decoder example to examples test
afeldman-nm Jul 17, 2024
f54f276
wip refactoring
afeldman-nm Jul 17, 2024
597a07d
refactor
afeldman-nm Jul 17, 2024
9f5a02c
RequestOutput & SequenceGroup now include encoder prompt in output, a…
afeldman-nm Jul 17, 2024
94c904f
wip parallel bart but encountering GPU count issue
afeldman-nm Jul 17, 2024
9da8fb3
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 17, 2024
1f8c52f
tweaks to enc/dec example
afeldman-nm Jul 17, 2024
1808846
formatting
afeldman-nm Jul 17, 2024
f15eacf
wip
afeldman-nm Jul 17, 2024
6c940f8
modified HF behavior in BART test to be truly greedy
afeldman-nm Jul 17, 2024
949ac02
formatting
afeldman-nm Jul 17, 2024
88c058e
wip parallelizing BART
afeldman-nm Jul 17, 2024
31e335f
wip activation parallelization
afeldman-nm Jul 17, 2024
c092ed4
merged in upstream changes; left some formatting issues which I expec…
afeldman-nm Jul 17, 2024
d7bd617
Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_ru…
afeldman-nm Jul 17, 2024
69f0379
wip:
afeldman-nm Jul 17, 2024
9fdd047
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 17, 2024
584c01e
Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_…
afeldman-nm Jul 17, 2024
41ccf0c
wip merge
afeldman-nm Jul 20, 2024
ffa99b2
additional merge
afeldman-nm Jul 20, 2024
a22f56c
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 22, 2024
c00e0a8
CommonMetadataBuilder sets block_tables constructor arg of metadata
afeldman-nm Jul 22, 2024
32967c1
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 22, 2024
a33b501
Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_ru…
afeldman-nm Jul 22, 2024
a16cabb
equalized some generation/sampling config settings between enc/dec HF…
afeldman-nm Jul 22, 2024
abbb427
Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_ru…
afeldman-nm Jul 22, 2024
00198a6
BART MLPs parallelized
afeldman-nm Jul 22, 2024
fb3227f
parallelized BART learned positional embedding
afeldman-nm Jul 22, 2024
e5bb9de
all attention layer output linears are parallelized
afeldman-nm Jul 22, 2024
74abe22
encoder attention & decoder self-attention parallelized
afeldman-nm Jul 22, 2024
9bbed43
parallelized LM head
afeldman-nm Jul 22, 2024
fdf71de
parallelized enc/dec cross-attention, using a slight hack
afeldman-nm Jul 22, 2024
3551b6b
fixed bug where underlying Attention was constructed using full head-…
afeldman-nm Jul 22, 2024
b174c7a
bart is parallelized, modulo an unfortunate hack for QKVParallelLinea…
afeldman-nm Jul 22, 2024
c43a6ed
commented out BART TP=4
afeldman-nm Jul 22, 2024
b90b6b6
upstream merge
afeldman-nm Jul 22, 2024
14831b0
Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_…
afeldman-nm Jul 22, 2024
427032a
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 22, 2024
c51a168
fixed bug in how conftest was handling HF encoder/decoder outputs; di…
afeldman-nm Jul 23, 2024
b01937f
set up None/empty str tests which are not passing
afeldman-nm Jul 23, 2024
48a742d
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 23, 2024
b283544
Merge branch 'infra_enc_dec_model_runner_correctness' into infra_enc_…
afeldman-nm Jul 23, 2024
059273f
wip
afeldman-nm Jul 23, 2024
229847b
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 23, 2024
7e7bbd9
deleted unnecessary dependency
afeldman-nm Jul 23, 2024
4a6e39e
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 24, 2024
aa01d71
empty-string decoder input is now handled for encoder/decoder
afeldman-nm Jul 24, 2024
0b29fd2
enc/dec handles empty str and None decoder prompts correctly
afeldman-nm Jul 24, 2024
dd784b5
typing fix
afeldman-nm Jul 24, 2024
61d2ad2
fixed bugs in handling non-text formats for individual prompts
afeldman-nm Jul 24, 2024
f36ffb5
example includes prompt zipper
afeldman-nm Jul 24, 2024
c493d40
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 24, 2024
be58d8a
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 24, 2024
02114bd
_free_seq_group() -> _free_seq_group_cross_attn_blocks()
afeldman-nm Jul 24, 2024
5a270ff
refactoring
afeldman-nm Jul 24, 2024
ed4a56b
formatting
afeldman-nm Jul 24, 2024
4b5b2cf
removed unnecessary argument reordering
afeldman-nm Jul 24, 2024
d82b273
enc/dec example comments'
afeldman-nm Jul 24, 2024
0af58ec
responses to feedback
afeldman-nm Jul 24, 2024
bed9bcd
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 25, 2024
47b4eb2
fixed bug caused by upstream refactoring
afeldman-nm Jul 25, 2024
393515e
formatting
afeldman-nm Jul 25, 2024
fb5a2bc
upstream merge
afeldman-nm Jul 25, 2024
c2cc010
Removed lora from enc/dec model runner
afeldman-nm Jul 25, 2024
175ea95
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 25, 2024
3327e5b
removed lora & vision & mm code from enc/dec modelrunner
afeldman-nm Jul 25, 2024
47c5548
checked out examples/offline_inference.py from main
afeldman-nm Jul 25, 2024
1bb7ad9
updated RequestOutput docstring
afeldman-nm Jul 25, 2024
035d90d
updated RequestOutput docstring
afeldman-nm Jul 25, 2024
64685ac
Sequence docstring
afeldman-nm Jul 25, 2024
d1751db
removed flashinfer references from enc/dec modelrunner
afeldman-nm Jul 25, 2024
f0abcc2
format
afeldman-nm Jul 25, 2024
4bb7fc4
removed chunked prefill logic/docstring text from enc/dec modelrunner
afeldman-nm Jul 25, 2024
a936faa
removed prefix caching from enc/dec modelrunner
afeldman-nm Jul 25, 2024
7cdc1ca
updated _prepare_encoder_model_input_tensors docstring
afeldman-nm Jul 25, 2024
dc953e1
refactoring and cleanup
afeldman-nm Jul 25, 2024
d132a7f
formatting
afeldman-nm Jul 25, 2024
3938389
wip
afeldman-nm Jul 25, 2024
a77ef37
trimmed down enc/dec model runner
afeldman-nm Jul 26, 2024
b358767
Merge branch 'main' into infra_enc_dec_model_runner
afeldman-nm Jul 26, 2024
b5f102f
conftest refactor
afeldman-nm Jul 26, 2024
63c76d9
encoder/decoder model runner test docstrings
afeldman-nm Jul 26, 2024
0d35915
enc/dec model runner tests - added explanatory comments; removed unne…
afeldman-nm Jul 26, 2024
4a3fcec
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 26, 2024
456090c
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 26, 2024
080fec6
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 26, 2024
ef5d711
Added all encoder/decoder unsupported scenario checks except for inva…
afeldman-nm Jul 26, 2024
f12c53f
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 26, 2024
1a8b284
added logic to force backend selection without environment variable.
afeldman-nm Jul 26, 2024
5606345
wip removing context managers that force kernel choice for enc/dec
afeldman-nm Jul 26, 2024
8513260
Merge branch 'infra_enc_dec_model_runner_checks' into infra_enc_dec_m…
afeldman-nm Jul 26, 2024
e0a5880
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 26, 2024
783a416
missing import; formatting
afeldman-nm Jul 26, 2024
8aac5a3
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 27, 2024
5d9f15d
wip removing kernel override fixture
afeldman-nm Jul 27, 2024
e86f63c
fully removed/repaced backend forcing
afeldman-nm Jul 27, 2024
fe8a0dd
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 27, 2024
389d195
Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_…
afeldman-nm Jul 27, 2024
f081438
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 29, 2024
eab9e6b
added enc/dec not-impl err strings import
afeldman-nm Jul 29, 2024
f595517
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 29, 2024
7f53daf
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Jul 29, 2024
b2ac87c
Merge branch 'main' into infra_enc_dec_model_runner_review
abf149 Jul 29, 2024
47954c7
Merge branch 'main' into infra_enc_dec_model_runner_review
abf149 Jul 30, 2024
665e0c2
Merge branch 'main' into infra_enc_dec_model_runner_review
abf149 Jul 30, 2024
e48fdb0
upstream merge
abf149 Jul 31, 2024
6dc3fba
BART uses position argument
abf149 Jul 31, 2024
3656094
merge; fix typing issues
abf149 Jul 31, 2024
2558061
removed prompt adapter logic from enc/dec model runner
abf149 Jul 31, 2024
c744be2
removed redundant code from enc/dec modelrunner; fixed test comments
abf149 Jul 31, 2024
f94fea7
removed pipeline parallelism references
abf149 Jul 31, 2024
c448cc6
wip removing decoder-only stuff that should be singleton
abf149 Jul 31, 2024
87c34f4
removed examples/utils
abf149 Jul 31, 2024
98cd380
assorted fixes; corrected BART model output dimensionality
abf149 Jul 31, 2024
eb7b80e
formatting
abf149 Jul 31, 2024
d7b6c60
removed get_encoder()/get_decoder()
abf149 Jul 31, 2024
5c9d187
unnecessary imports
abf149 Jul 31, 2024
4e0cdd8
removed unnecessary code & comments
abf149 Jul 31, 2024
9fd0a88
Merge branch 'main' into infra_enc_dec_model_runner_review
abf149 Jul 31, 2024
eda6857
ran isort
abf149 Jul 31, 2024
390afa5
reorganized enc/dec model runner encoder inputs computation
abf149 Jul 31, 2024
4540cfe
formatting/typing
abf149 Jul 31, 2024
f4bf69e
removed redundant code for enc/dec MR test
abf149 Jul 31, 2024
6ce2c9d
Merge branch 'main' into infra_enc_dec_model_runner_review
abf149 Jul 31, 2024
ec728ab
wip
abf149 Jul 31, 2024
1e9fa39
Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_ru…
abf149 Jul 31, 2024
57397c1
Removed unnecessary code
abf149 Jul 31, 2024
61ba9dd
formatting; isort
abf149 Jul 31, 2024
5407769
Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_ru…
abf149 Jul 31, 2024
02fffd2
unnecessary import
abf149 Jul 31, 2024
7e1d3c3
Merge branch 'main' into infra_enc_dec_model_runner_review
abf149 Jul 31, 2024
2ab8a73
added type: ignore to ActorDiedError import because it was interferin…
abf149 Jul 31, 2024
fe4638e
restructured & reformatted llm engine enc/dec code
abf149 Jul 31, 2024
0c6cc90
improved explanatory comment.
abf149 Jul 31, 2024
af976d7
LLMEngine logic working
abf149 Jul 31, 2024
c8cbcb1
reorganized enc/dec llm engine logic
abf149 Jul 31, 2024
c305d16
format; refactor
abf149 Jul 31, 2024
c36f8d1
enc/dec decoder prompt preprocessing is now entirely list-oriented
abf149 Jul 31, 2024
ade461f
Merge branch 'main' into infra_enc_dec_model_runner_review
abf149 Jul 31, 2024
39e39cc
added decoder prompt processing comment
abf149 Jul 31, 2024
88bcae2
Merge branch 'main' into infra_enc_dec_model_runner_review
abf149 Jul 31, 2024
c09b4ed
enc/dec MR xformers log message
abf149 Jul 31, 2024
3a510b2
XFormers kernel forcing log message; enforce eager log message with w…
abf149 Jul 31, 2024
505b4ec
Merge branch 'main' into infra_enc_dec_model_runner_review
abf149 Jul 31, 2024
f279112
explanatory comment
abf149 Jul 31, 2024
25e3bbd
Merge branch 'main' into infra_enc_dec_model_runner_review
abf149 Jul 31, 2024
41d8262
Merge branch 'main' into infra_enc_dec_model_runner_review
abf149 Aug 1, 2024
593d439
fixed comments
abf149 Aug 1, 2024
b9941cd
fixed BART bug for None prompts
abf149 Aug 1, 2024
82a2ba6
moved BART-specific test config from conftest to test_bart
abf149 Aug 1, 2024
f713731
upstream merge
abf149 Aug 1, 2024
23fe370
Updated vLLM enforce_eager default behavior
abf149 Aug 1, 2024
47acc7e
LLM constructor comment explains enforce_eager defaults
abf149 Aug 1, 2024
f8ec35a
refactored out enc/dec mr helper function
abf149 Aug 1, 2024
8aa8665
added enforce_eager=True to list of valid-scenario requirements for e…
abf149 Aug 1, 2024
cf12aaf
small fix
abf149 Aug 1, 2024
1d6de48
enc/dec MR attempts to override backend but fails if already overridd…
abf149 Aug 1, 2024
fd6c59b
added error message for unsupported enc/dec + prompt adapter scenario
abf149 Aug 1, 2024
72c7c22
Merge branch 'main' into infra_enc_dec_model_runner_review
abf149 Aug 2, 2024
ead6527
added encoder/decoder distributed correctness test
abf149 Aug 2, 2024
dfaf9cf
enc dec changes in distributed test pipeline
afeldman-nm Aug 2, 2024
e16bbd8
enc/dec distributed test works
afeldman-nm Aug 2, 2024
7944b89
multi-GPU enc/dec test comment
afeldman-nm Aug 2, 2024
fda01b2
upstream merge
afeldman-nm Aug 3, 2024
e3715de
4 gpu enc/dec test
afeldman-nm Aug 3, 2024
f42d1f8
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Aug 3, 2024
6d7821d
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Aug 4, 2024
28bd8db
removed commented-out tp arg from enc/dec example
afeldman-nm Aug 4, 2024
dabfd76
'specified' typo
afeldman-nm Aug 4, 2024
61fc254
get_env_variable_attn_backend rename
afeldman-nm Aug 4, 2024
3425076
removed meaningless decode comment
afeldman-nm Aug 4, 2024
f93ca97
removed unnecessary comment fomr bart.py
afeldman-nm Aug 4, 2024
67aa90f
correct usage of decoder start token id
afeldman-nm Aug 4, 2024
7fa2177
handle all union type scenarios
afeldman-nm Aug 4, 2024
d1ca700
removed 4 GPU BART test
afeldman-nm Aug 4, 2024
d66892f
enc/dec mr tests used prepare_model_input()
afeldman-nm Aug 4, 2024
605b03c
streamlined enc/dec mr encoder model input prep
afeldman-nm Aug 4, 2024
539c812
bugfix: shape of empty cross-attn block table
afeldman-nm Aug 5, 2024
d209037
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Aug 5, 2024
528f2d0
format
afeldman-nm Aug 5, 2024
b3a8564
refactoring
afeldman-nm Aug 5, 2024
a9be976
sampling metadata simpliciation in test
afeldman-nm Aug 5, 2024
6d428a4
unnecessary import
afeldman-nm Aug 5, 2024
8eb5d11
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Aug 5, 2024
6f81d9d
upstream merge
afeldman-nm Aug 6, 2024
d132be4
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Aug 6, 2024
0756240
Update vllm/core/scheduler.py
afeldman-nm Aug 6, 2024
d1bb4c5
Update vllm/utils.py
afeldman-nm Aug 6, 2024
ba147a0
Update vllm/utils.py
afeldman-nm Aug 6, 2024
4049564
Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_ru…
afeldman-nm Aug 6, 2024
79847b5
conftest fix
afeldman-nm Aug 6, 2024
0b5bc8c
LLM Engine fixes
afeldman-nm Aug 6, 2024
50b4a4f
vllm inputs fixes
afeldman-nm Aug 6, 2024
d6d3e92
Additional LLMEngine fixes
afeldman-nm Aug 6, 2024
56ab71d
moved enc/dec + enforce eager default logic to model config
afeldman-nm Aug 6, 2024
8b58f3c
arg_utils cleanup
afeldman-nm Aug 6, 2024
f8ab672
iterating over dict items
afeldman-nm Aug 6, 2024
4c0d41c
formatting
afeldman-nm Aug 6, 2024
74a6a21
cleanup
afeldman-nm Aug 6, 2024
71456e7
wip
afeldman-nm Aug 6, 2024
9625468
last trailing commas
afeldman-nm Aug 6, 2024
2e31471
format
afeldman-nm Aug 6, 2024
60288d7
formatting
afeldman-nm Aug 6, 2024
1ccee00
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Aug 6, 2024
0fe9963
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Aug 6, 2024
27617e0
offline_inference_vision_language.py replaces llava test in pipeline
afeldman-nm Aug 6, 2024
8ce1aef
atol 1e-2 -> 2e-2 in test_flash_attn.py
afeldman-nm Aug 6, 2024
e6b4c16
Merge branch 'main' into infra_enc_dec_model_runner_reviews
afeldman-nm Aug 6, 2024
8ecbbc5
Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_…
afeldman-nm Aug 6, 2024
2 changes: 2 additions & 0 deletions .buildkite/test-pipeline.yaml
@@ -150,6 +150,7 @@ steps:
- python3 llm_engine_example.py
- python3 llava_example.py
- python3 tensorize_vllm_model.py --model facebook/opt-125m serialize --serialized-directory /tmp/ --suffix v1 && python3 tensorize_vllm_model.py --model facebook/opt-125m deserialize --path-to-tensors /tmp/vllm/facebook/opt-125m/v1/model.tensors
- python3 offline_inference_encoder_decoder.py

- label: Models Test # 1hr10min
source_file_dependencies:
@@ -289,6 +290,7 @@ steps:
commands:
- VLLM_TEST_SAME_HOST=1 torchrun --nproc-per-node=4 distributed/test_same_node.py
- TARGET_TEST_SUITE=L4 pytest -v -s distributed/test_basic_distributed_correctness.py
- pytest -v -s distributed/test_basic_distributed_correctness_enc_dec.py
- pytest -v -s distributed/test_chunked_prefill_distributed.py
- pytest -v -s distributed/test_multimodal_broadcast.py
- pytest -v -s spec_decode/e2e/test_integration_dist_tp2.py
99 changes: 99 additions & 0 deletions examples/offline_inference_encoder_decoder.py
@@ -0,0 +1,99 @@
'''
Demonstrate prompting of text-to-text
encoder/decoder models, specifically BART
'''

from vllm import LLM, SamplingParams
from vllm.inputs import ExplicitEncoderDecoderPrompt, TextPrompt, TokensPrompt
from vllm.utils import zip_enc_dec_prompt_lists

dtype = "float"

# Create a BART encoder/decoder model instance
llm = LLM(
    model="facebook/bart-large-cnn",
    dtype=dtype,
)

# Get BART tokenizer
tokenizer = llm.llm_engine.get_tokenizer_group()

# Test prompts
#
# This section shows all of the valid ways to prompt an
# encoder/decoder model.
#
# - Helpers for building prompts
text_prompt_raw = "Hello, my name is"
text_prompt = TextPrompt(prompt="The president of the United States is")
tokens_prompt = TokensPrompt(
    prompt_token_ids=tokenizer.encode(prompt="The capital of France is"))

# - Pass a single prompt to the encoder/decoder model. It is implicitly
#   treated as the encoder input prompt, and the decoder input prompt is
#   assumed to be None.

single_text_prompt_raw = text_prompt_raw # Pass a string directly
single_text_prompt = text_prompt # Pass a TextPrompt
single_tokens_prompt = tokens_prompt # Pass a TokensPrompt

# - Pass explicit encoder and decoder input prompts within one data structure.
#   Encoder and decoder prompts can each independently be text or tokens; they
#   need not be the same prompt type. Some example prompt-type combinations are
#   shown below; note that these are not exhaustive.

enc_dec_prompt1 = ExplicitEncoderDecoderPrompt(
    # Pass the encoder prompt string directly, and
    # pass decoder prompt tokens
    encoder_prompt=single_text_prompt_raw,
    decoder_prompt=single_tokens_prompt,
)
enc_dec_prompt2 = ExplicitEncoderDecoderPrompt(
    # Pass a TextPrompt to the encoder, and
    # pass the decoder prompt string directly
    encoder_prompt=single_text_prompt,
    decoder_prompt=single_text_prompt_raw,
)
enc_dec_prompt3 = ExplicitEncoderDecoderPrompt(
    # Pass encoder prompt tokens directly, and
    # pass a TextPrompt to the decoder
    encoder_prompt=single_tokens_prompt,
    decoder_prompt=single_text_prompt,
)

# - Finally, here's a useful helper function for zipping encoder and
#   decoder prompt lists together into a list of ExplicitEncoderDecoderPrompt
#   instances
zipped_prompt_list = zip_enc_dec_prompt_lists(
    ['An encoder prompt', 'Another encoder prompt'],
    ['A decoder prompt', 'Another decoder prompt'])

# - Let's put all of the above example prompts together into one list
# which we will pass to the encoder/decoder LLM.
prompts = [
    single_text_prompt_raw, single_text_prompt, single_tokens_prompt,
    enc_dec_prompt1, enc_dec_prompt2, enc_dec_prompt3
] + zipped_prompt_list

print(prompts)

# Create a sampling params object.
sampling_params = SamplingParams(
    temperature=0,
    top_p=1.0,
    min_tokens=0,
    max_tokens=20,
)

# Generate output tokens from the prompts. The output is a list of
# RequestOutput objects that contain the prompt, generated
# text, and other information.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    encoder_prompt = output.encoder_prompt
    generated_text = output.outputs[0].text
    print(f"Encoder prompt: {encoder_prompt!r}, "
          f"Decoder prompt: {prompt!r}, "
          f"Generated text: {generated_text!r}")
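The example relies on vLLM treating a bare prompt as an encoder prompt with no decoder prompt. It also relies on `zip_enc_dec_prompt_lists` to pair two lists into `ExplicitEncoderDecoderPrompt` instances. The following is a minimal sketch of that normalization in plain Python, so it runs without vLLM installed.

The TypedDicts are simplified stand-ins for the `vllm.inputs` types, and `normalize_prompt` and `zip_prompt_lists` are hypothetical helpers. They are not vLLM's implementation. Raising on a length mismatch is a choice made for this sketch; the real helper may behave differently. The commit log suggests that when the decoder prompt is None, the engine builds the decoder input from the model's decoder start token. That step is not modelled here.

```python
from typing import List, Optional, TypedDict, Union


class TextPrompt(TypedDict):
    # Simplified stand-in for vllm.inputs.TextPrompt (hypothetical mirror)
    prompt: str


class TokensPrompt(TypedDict):
    # Simplified stand-in for vllm.inputs.TokensPrompt (hypothetical mirror)
    prompt_token_ids: List[int]


SingletonPrompt = Union[str, TextPrompt, TokensPrompt]


class ExplicitEncoderDecoderPrompt(TypedDict):
    encoder_prompt: SingletonPrompt
    decoder_prompt: Optional[SingletonPrompt]


def normalize_prompt(
        p: Union[SingletonPrompt, ExplicitEncoderDecoderPrompt]
) -> ExplicitEncoderDecoderPrompt:
    """Hypothetical: pass explicit prompts through; wrap anything else
    as the encoder prompt with no decoder prompt."""
    if isinstance(p, dict) and "encoder_prompt" in p:
        return p  # already explicit
    return ExplicitEncoderDecoderPrompt(encoder_prompt=p, decoder_prompt=None)


def zip_prompt_lists(
    enc: List[SingletonPrompt],
    dec: List[Optional[SingletonPrompt]],
) -> List[ExplicitEncoderDecoderPrompt]:
    """Hypothetical: pair encoder/decoder prompts element-wise."""
    if len(enc) != len(dec):
        raise ValueError("encoder and decoder prompt lists differ in length")
    return [
        ExplicitEncoderDecoderPrompt(encoder_prompt=e, decoder_prompt=d)
        for e, d in zip(enc, dec)
    ]


if __name__ == "__main__":
    mixed = ["Hello, my name is",
             TextPrompt(prompt="The president of the United States is")]
    mixed += zip_prompt_lists(["An encoder prompt"], ["A decoder prompt"])
    for p in mixed:
        print(normalize_prompt(p))
```

Normalizing every input to the explicit form up front lets the rest of the pipeline handle a single shape, whichever prompt style the caller used.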