T5 enc/dec example file; linting/formatting #1
Conversation
Hello, quick update: as of the last commit, the test at examples/offline_inference_enc_dec.py now yields a comparison between vLLM and native PyTorch completion results. The model size and dtype may be customized for a given run of the example script. I ran the example script for the following test cases:
The results are below. In summary, vLLM T5 completions consistently match native PyTorch, except for the t5-large/float16 case, where only vLLM yields NaNs that result in an empty completion. I am not concerned about this for now, as I suspect T5 was optimized for FP32.

- t5-small, float16: Prompt: 'Who are you?', Native PyTorch generated text: 'Wer bist du?', vLLM generated text: ' Wer bist du?'
- t5-small, bfloat16: Prompt: 'Who are you?', Native PyTorch generated text: 'Wer bist du?', vLLM generated text: ' Wer bist du?'
- t5-small, float32: Prompt: 'Who are you?', Native PyTorch generated text: 'Wer bist du?', vLLM generated text: ' Wer bist du?'
- t5-large, float16: Prompt: 'Who are you?', Native PyTorch generated text: 'Who are you?', vLLM generated text: ''. Note: for the vLLM tests in this scenario, intermediate results of the inference process became NaN, which led to the vLLM output being an empty string. This was not the case for the native PyTorch output.
- t5-large, bfloat16: Prompt: 'Who are you?', Native PyTorch generated text: 'Who are you?', vLLM generated text: ' Who are you?'
- t5-large, float32: Prompt: 'Who are you?', Native PyTorch generated text: 'Who are you?', vLLM generated text: ' Who are you?'
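For reference, here is a minimal sketch of what such a side-by-side comparison might look like. It assumes the fork's vLLM build exposes T5 through the standard LLM/SamplingParams API; the model name, dtype, and prompt are illustrative placeholders, and the actual examples/offline_inference_enc_dec.py may be structured differently.

```python
# Hypothetical sketch of a vLLM vs. native PyTorch T5 comparison.
# Assumes this fork's vLLM supports T5 via the standard LLM API;
# model name, dtype, and prompt are placeholders.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from vllm import LLM, SamplingParams

MODEL = "t5-small"   # e.g. t5-small or t5-large
DTYPE = "float16"    # float16 / bfloat16 / float32
PROMPT = "Who are you?"

# Native PyTorch (HuggingFace) reference generation.
tokenizer = AutoTokenizer.from_pretrained(MODEL)
hf_model = T5ForConditionalGeneration.from_pretrained(
    MODEL, torch_dtype=getattr(torch, DTYPE))
input_ids = tokenizer(PROMPT, return_tensors="pt").input_ids
hf_output = hf_model.generate(input_ids, max_new_tokens=64)
hf_text = tokenizer.decode(hf_output[0], skip_special_tokens=True)

# vLLM generation with the same model and dtype.
llm = LLM(model=MODEL, dtype=DTYPE)
params = SamplingParams(temperature=0.0, max_tokens=64)
vllm_text = llm.generate([PROMPT], params)[0].outputs[0].text

print(f"Prompt: {PROMPT!r}")
print(f"Native PyTorch generated text: {hf_text!r}")
print(f"vLLM generated text: {vllm_text!r}")
```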
Also, I just resolved the remaining conflicts with vLLM upstream main.
+1 Thanks!
SUMMARY
This PR is mainly to set up a process whereby I can open PRs from my vLLM fork into your fork. It also (1) applies linting and formatting via ./format.sh and (2) moves your test.py file into examples/ and organizes it to match the other examples.
TESTING
When this PR is finished, examples/offline_inference_enc_dec.py should compare T5 inference results between vLLM and native PyTorch execution, allowing the particular T5 variant and datatype to be customized (see the sketch below). This should help validate the correctness of the vLLM T5 integration as well as debug the NaN case for t5-large.
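One way the variant/dtype customization could be exposed is through a small argparse front end; this is only a sketch under that assumption, and the flag names and defaults here are hypothetical rather than the example script's actual interface.

```python
# Hypothetical CLI wrapper for selecting the T5 variant and dtype;
# flag names and defaults are illustrative, not the script's actual interface.
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Compare vLLM and native PyTorch T5 completions.")
    parser.add_argument("--model", default="t5-small",
                        choices=["t5-small", "t5-base", "t5-large"],
                        help="T5 variant to load.")
    parser.add_argument("--dtype", default="float32",
                        choices=["float16", "bfloat16", "float32"],
                        help="Datatype used for both the vLLM and PyTorch runs.")
    parser.add_argument("--prompt", default="Who are you?",
                        help="Prompt passed to both backends.")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    # The comparison routine from the example script would be invoked here.
    print(f"Running comparison: model={args.model}, dtype={args.dtype}, "
          f"prompt={args.prompt!r}")
```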