fix: multi gpu gpt not stop generate when end_id #487
Conversation
Hi, xbugliu. Thank you for the feedback. We are fixing the issue now; the root cause is not only here. We will update the fix ASAP.
With one GPU, the GPT model also does not stop immediately when the end_id is hit.
Hi. This fix would lead to a hang with pipeline parallelism, so we cannot merge it into the main branch.
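The hang mentioned above can be sketched with a minimal, purely illustrative Python simulation (this is not FasterTransformer's actual CUDA/MPI code; the names and numbers are hypothetical): under pipeline parallelism every rank must execute the same number of decode steps, so the stop decision has to come from a collective agreement across ranks, not from each rank's local finished flag.

```python
# Hypothetical sketch: why breaking on a *local* flag can desynchronize
# pipeline ranks. Here the stop test simulates an allreduce(AND) across
# ranks, so every rank leaves the loop on the same step.
def run_rank(rank, done_at, max_steps):
    """Simulate one pipeline rank. `done_at[r]` is the step at which
    rank r's local sequences finish. Each rank stops only when *all*
    ranks are done (the simulated collective), never earlier."""
    steps_taken = 0
    for step in range(max_steps):
        steps_taken += 1
        # Simulated allreduce(AND): globally done only if every rank is done.
        globally_done = all(step >= d for d in done_at)
        if globally_done:
            break
    return steps_taken

# Every rank takes the same number of steps, so no rank is left blocked
# waiting on a peer that exited early.
done_at = [2, 5, 3]  # per-rank local finish steps (hypothetical)
counts = [run_rank(r, done_at, max_steps=10) for r in range(3)]
```

If each rank instead broke as soon as its own sequences hit end_id, a rank that finished at step 2 would stop sending activations while downstream ranks still waited on it, which is the deadlock the maintainers describe.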
OK.
@hongqing1986 I deployed a BLOOM model on one GPU; it does not stop immediately when the end_id is hit, and I built the image.
Do you use the branch https://github.com/NVIDIA/FasterTransformer/tree/tmp/fix_gpt_earlystop? |
Yes, I also tried this branch. I changed the FasterTransformer repo in the file and built the triton_with_ft image with
Let's focus on FT's C example first. Can you share how to reproduce the issue with the FT C example? Or do you not encounter the issue on FT itself, only on the backend?
Thank you very much. At present I have only encountered the issue when using the backend; I have not tried the FT C example yet. I will try the FT C example next and provide feedback.
The issue is fixed in MR #584 and merged into the main branch. Sorry for the late fix.
With the multi-GPU GPT model, generation does not stop immediately when the end_id is hit; it keeps running until output_seq_len.
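The expected behavior can be sketched as a minimal, hypothetical generation loop (this is not FasterTransformer's implementation; `step_fn` and all names here are illustrative assumptions): decoding should break as soon as every sequence in the batch has produced end_id, rather than always running for output_seq_len steps.

```python
# Hypothetical sketch of early stopping at end_id (not FasterTransformer code).
def generate(step_fn, batch_size, end_id, output_seq_len):
    """Run decode steps, stopping early once every sequence has hit end_id.

    step_fn() returns one next token per sequence in the batch."""
    finished = [False] * batch_size
    outputs = [[] for _ in range(batch_size)]
    for _ in range(output_seq_len):
        tokens = step_fn()
        for i, tok in enumerate(tokens):
            if not finished[i]:
                outputs[i].append(tok)
                if tok == end_id:
                    finished[i] = True
        if all(finished):
            # The reported bug is equivalent to missing this break:
            # the loop kept running until output_seq_len.
            break
    return outputs
```

For example, with end_id = 0 and a batch of two sequences, the loop above exits on the step where the last unfinished sequence emits 0, instead of padding out to output_seq_len.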