RuntimeError: Can't enable access between nodes 1 and 0 #2066

Closed
EASTERNTIGER opened this issue Jul 31, 2024 · 4 comments
Labels: not a bug (some known limitation, but not a bug); triaged (issue has been triaged by maintainers); waiting for feedback

Comments


EASTERNTIGER commented Jul 31, 2024

Hi, I tried to convert a T5 model to TensorRT on a machine with 4 GPUs. In the python convert_checkpoint.py step I set tp_size=4 and pp_size=1, and the TensorRT engine was built successfully. However, when I ran the command mpirun --allow-run-as-root -np 4 python3 run.py, I got these errors:

image
When I set tp_size=1 and pp_size=1 in the python convert_checkpoint.py step, python3 run.py runs successfully.
How can I fix this problem? It seems to be related to the GPU settings, but I don't know how to change them.
I also found a similar issue:
image
But when I added --use_custom_all_reduce disable to trtllm-build, it reported unrecognized arguments:
image
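
Assuming "nodes 1 and 0" in the error refers to GPU indices, one way to narrow this down is to check whether the driver reports peer-to-peer access for every GPU pair. The following is only a minimal sketch and not part of TensorRT-LLM: `blocked_peer_pairs` is a hypothetical helper name, and the script assumes PyTorch is installed so that `torch.cuda.can_device_access_peer` can be queried.

```python
from itertools import permutations
from typing import Callable, List, Tuple


def blocked_peer_pairs(
    num_devices: int, can_access: Callable[[int, int], bool]
) -> List[Tuple[int, int]]:
    """Return ordered (src, dst) device pairs for which peer access is not reported."""
    return [
        (src, dst)
        for src, dst in permutations(range(num_devices), 2)
        if not can_access(src, dst)
    ]


if __name__ == "__main__":
    try:
        import torch  # assumption: PyTorch is available in the environment
    except ImportError:
        print("PyTorch is not installed; cannot query peer access.")
    else:
        count = torch.cuda.device_count()
        blocked = blocked_peer_pairs(count, torch.cuda.can_device_access_peer)
        if blocked:
            for src, dst in blocked:
                print(f"GPU {src} cannot access GPU {dst} directly")
        else:
            print(f"No blocked peer pairs among {count} visible GPU(s)")
```

If any pair is listed, the failure is likely tied to the hardware topology or driver configuration rather than to the converted engine itself.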

OptimusV5 commented

Same problem. It seems that this argument has been removed; see #2008.



yuxianq commented Aug 21, 2024

@EASTERNTIGER @OptimusV5 This bug is known and has been fixed in both the main branch and v0.12. You can validate the fix on the main branch now or wait for the v0.12 release.
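
To see whether an environment is already on a release line that should carry the fix, the installed package version can be compared against 0.12. This is a rough sketch under stated assumptions: `release_tuple` and `at_least` are hypothetical helper names, the distribution is assumed to be installed under the name `tensorrt_llm`, and a 0.12 development build may still predate the fix.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Extract the leading numeric components, e.g. '0.12.0.dev1' -> (0, 12, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(ver: str, minimum: Tuple[int, ...] = (0, 12)) -> bool:
    """True if the release part of `ver` is at or above `minimum`."""
    return release_tuple(ver)[: len(minimum)] >= minimum


if __name__ == "__main__":
    try:
        # assumption: the package is installed under this distribution name
        installed = version("tensorrt_llm")
    except PackageNotFoundError:
        print("tensorrt_llm is not installed in this environment.")
    else:
        status = "at or above" if at_least(installed) else "below"
        print(f"tensorrt_llm {installed} is {status} 0.12")
```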


lfr-0531 added the "triaged" label (issue has been triaged by maintainers) on Sep 2, 2024
Kefeng-Duan added the "not a bug" label (some known limitation, but not a bug) on Oct 11, 2024