T5 enc/dec example file; linting/formatting #1
Conversation
Hello, quick update: as of the last commit, the test at examples/offline_inference_enc_dec.py now yields a comparison between vLLM and native PyTorch completion results. The model size and dtype may be customized for a given run of the example script. I ran the example script for the following test cases:
The results are below. In summary, vLLM T5 completions consistently match native PyTorch, except for the t5-large/float16 case, where only vLLM yields NaNs that result in an empty completion. I am not concerned about this for now, as I suspect T5 was optimized for FP32.

- t5-small, float16: Prompt: 'Who are you?', Native PyTorch generated text: 'Wer bist du?', vLLM generated text: ' Wer bist du?'
- t5-small, bfloat16: Prompt: 'Who are you?', Native PyTorch generated text: 'Wer bist du?', vLLM generated text: ' Wer bist du?'
- t5-small, float32: Prompt: 'Who are you?', Native PyTorch generated text: 'Wer bist du?', vLLM generated text: ' Wer bist du?'
- t5-large, float16: Prompt: 'Who are you?', Native PyTorch generated text: 'Who are you?', vLLM generated text: ''. Note: for the vLLM tests in this scenario, intermediate results of the inference process became NaN, which led to the vLLM output being an empty string. This was not the case for the native PyTorch output.
- t5-large, bfloat16: Prompt: 'Who are you?', Native PyTorch generated text: 'Who are you?', vLLM generated text: ' Who are you?'
- t5-large, float32: Prompt: 'Who are you?', Native PyTorch generated text: 'Who are you?', vLLM generated text: ' Who are you?'
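For reference, here is a minimal sketch of what such a side-by-side comparison might look like. It assumes the fork's vLLM build exposes T5 through the standard LLM/SamplingParams API; the model name, dtype, and prompt are illustrative placeholders, and the actual examples/offline_inference_enc_dec.py may be structured differently.

```python
# Hypothetical sketch of a vLLM vs. native PyTorch T5 comparison.
# Assumes this fork's vLLM supports T5 via the standard LLM API;
# model name, dtype, and prompt are placeholders.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from vllm import LLM, SamplingParams

MODEL = "t5-small"   # e.g. t5-small or t5-large
DTYPE = "float16"    # float16 / bfloat16 / float32
PROMPT = "Who are you?"

# Native PyTorch (HuggingFace) reference generation.
tokenizer = AutoTokenizer.from_pretrained(MODEL)
hf_model = T5ForConditionalGeneration.from_pretrained(
    MODEL, torch_dtype=getattr(torch, DTYPE))
input_ids = tokenizer(PROMPT, return_tensors="pt").input_ids
hf_output = hf_model.generate(input_ids, max_new_tokens=64)
hf_text = tokenizer.decode(hf_output[0], skip_special_tokens=True)

# vLLM generation with the same model and dtype.
llm = LLM(model=MODEL, dtype=DTYPE)
params = SamplingParams(temperature=0.0, max_tokens=64)
vllm_text = llm.generate([PROMPT], params)[0].outputs[0].text

print(f"Prompt: {PROMPT!r}")
print(f"Native PyTorch generated text: {hf_text!r}")
print(f"vLLM generated text: {vllm_text!r}")
```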
Also, I just resolved the remaining conflicts with vLLM upstream main.
+1 Thanks!
SUMMARY
This PR is mainly to set up a process whereby I can open PRs from my vLLM fork into your fork. It also (1) applies linting and formatting via ./format.sh and (2) moves your test.py file into examples/ and organizes it to match the other examples.
TESTING
When this PR is finished, examples/offline_inference_enc_dec.py should compare T5 inference results between vLLM and native PyTorch execution, allowing the particular T5 variant and datatype to be customized (see the sketch below). This should help validate the correctness of the vLLM T5 integration as well as debug the NaN case for t5-large.
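One way the variant/dtype customization could be exposed is through a small argparse front end; this is only a sketch under that assumption, and the flag names and defaults here are hypothetical rather than the example script's actual interface.

```python
# Hypothetical CLI wrapper for selecting the T5 variant and dtype;
# flag names and defaults are illustrative, not the script's actual interface.
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Compare vLLM and native PyTorch T5 completions.")
    parser.add_argument("--model", default="t5-small",
                        choices=["t5-small", "t5-base", "t5-large"],
                        help="T5 variant to load.")
    parser.add_argument("--dtype", default="float32",
                        choices=["float16", "bfloat16", "float32"],
                        help="Datatype used for both the vLLM and PyTorch runs.")
    parser.add_argument("--prompt", default="Who are you?",
                        help="Prompt passed to both backends.")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    # The comparison routine from the example script would be invoked here.
    print(f"Running comparison: model={args.model}, dtype={args.dtype}, "
          f"prompt={args.prompt!r}")
```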