
fix: multi gpu gpt not stop generate when end_id #487

Closed
wants to merge 2 commits

Conversation

@xbugliu commented Mar 10, 2023

With the multi-GPU GPT model, generation does not stop immediately when end_id is hit; it keeps going until output_seq_len is reached.

@byshiue (Collaborator) commented Mar 13, 2023

Hi, xbugliu. Thank you for the feedback. We are fixing the issue now; the cause is not only here. We will publish the fix ASAP.

@hongqing1986 commented Mar 16, 2023

With one GPU, the GPT model also does not stop immediately when end_id is hit.
Merging is on the way, good job!

@byshiue (Collaborator) commented Mar 24, 2023

Hi. This fix would lead to a hang with pipeline parallelism, so we cannot merge it into the main branch.
We are temporarily fixing this issue in the branch https://github.com/NVIDIA/FasterTransformer/tree/tmp/fix_gpt_earlystop.
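For anyone who wants to build against that temporary branch before a fix lands in main, the usual approach is a branch checkout. A minimal sketch (the real clone over the network is shown only in the comment, since it depends on GitHub access; the runnable part below exercises the same `--branch` clone against a throwaway local repository):

```shell
# Sketch: the fix lives on branch tmp/fix_gpt_earlystop of
# NVIDIA/FasterTransformer, so a build against it would start with:
#
#   git clone --branch tmp/fix_gpt_earlystop https://github.com/NVIDIA/FasterTransformer.git
#
# Below, the same --branch clone is demonstrated on a throwaway local
# repository so the commands can run without network access.
rm -rf /tmp/ft-demo /tmp/ft-clone
git init -q /tmp/ft-demo
git -C /tmp/ft-demo -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "init"
git -C /tmp/ft-demo branch tmp/fix_gpt_earlystop
git clone -q --branch tmp/fix_gpt_earlystop /tmp/ft-demo /tmp/ft-clone
git -C /tmp/ft-clone branch --show-current   # prints tmp/fix_gpt_earlystop
```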

@byshiue closed this Mar 24, 2023
@xbugliu (Author) commented Mar 25, 2023

ok

@Lzhang-hub commented

> With one GPU, the GPT model also does not stop immediately when end_id is hit. Merging is on the way, good job!

@hongqing1986 I deployed a BLOOM model on one GPU, and it does not stop immediately when end_id is hit. I built the triton_with_ft image with https://github.com/xbugliu/FasterTransformer.git, but it still does not work well.
I would really appreciate it if you could describe how you use it.

@byshiue (Collaborator) commented Apr 27, 2023

@Lzhang-hub commented

Yes, I also tried this branch.

I changed the FasterTransformer repo in the file CMakeLists.txt of the https://github.com/triton-inference-server/fastertransformer_backend repo, like this:
[screenshot of the CMakeLists.txt change]

and built the triton_with_ft image with

docker build --rm   \
    --build-arg TRITON_VERSION=${CONTAINER_VERSION}   \
    -t ${TRITON_DOCKER_IMAGE} \
    -f docker/Dockerfile \
    .
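The CMakeLists.txt change from the screenshot above amounts to pointing the backend's FasterTransformer source URL at the fork. A hedged sketch: the exact line in fastertransformer_backend's CMakeLists.txt may differ between versions, so the snippet below edits a stand-in file rather than the real one:

```shell
# Hypothetical illustration: swap the FasterTransformer URL that the
# backend's CMakeLists.txt fetches for the fork carrying the fix.
# A stand-in file is used here; in practice you would run the sed
# command against the real CMakeLists.txt in fastertransformer_backend.
cat > /tmp/CMakeLists.demo <<'EOF'
GIT_REPOSITORY https://github.com/NVIDIA/FasterTransformer.git
EOF
sed -i 's|github.com/NVIDIA/FasterTransformer.git|github.com/xbugliu/FasterTransformer.git|' /tmp/CMakeLists.demo
cat /tmp/CMakeLists.demo   # prints the xbugliu fork URL
```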

@byshiue (Collaborator) commented Apr 27, 2023

> Yes, I also tried this branch.
>
> I changed the FasterTransformer repo in the file CMakeLists.txt of the https://github.com/triton-inference-server/fastertransformer_backend repo, like this: [screenshot]
>
> and built the triton_with_ft image with
>
>     docker build --rm   \
>         --build-arg TRITON_VERSION=${CONTAINER_VERSION}   \
>         -t ${TRITON_DOCKER_IMAGE} \
>         -f docker/Dockerfile \
>         .

Let's focus on FT's C example first. Can you share how to reproduce the issue with the FT C example? Or do you not encounter the issue on FT itself, only on the backend?

@Lzhang-hub commented

Thank you very much. At present I have encountered the issue only when using the backend; I have not tried the FT C example yet. I will try the FT C example next and provide feedback.

@byshiue (Collaborator) commented May 1, 2023

The issue is fixed in MR #584 and merged into the main branch. Sorry for the late fix.
