Update TensorRT-LLM #667
Conversation
TLLM_CHECK(mTensorParallelism > 0);
TLLM_CHECK(mPipelineParallelism > 0);

TLLM_CHECK_WITH_INFO(static_cast<SizeType>(numDevices) >= tensorParallelism * pipelineParallelism,
There seems to be a mistake here. In a multi-node setup, it is the product of mGpusPerNode and the number of nodes that should be at least TP multiplied by PP; on a single node, the number of GPUs can be smaller than TP * PP, so this per-node check would reject valid configurations. @kaiyux
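For illustration, here is a minimal sketch of the cluster-wide condition described in the comment above. The names numNodes, gpusPerNode, and checkParallelismFitsCluster are assumptions for this sketch, not the actual members of the TensorRT-LLM C++ runtime, and assert stands in for TLLM_CHECK_WITH_INFO:

// Sketch only: illustrates the multi-node-aware check argued for above.
// numNodes and gpusPerNode are assumed names, not the runtime's real members.
#include <cassert>
#include <cstdint>

using SizeType = std::int32_t;

void checkParallelismFitsCluster(SizeType gpusPerNode, SizeType numNodes,
                                 SizeType tensorParallelism, SizeType pipelineParallelism)
{
    // A single node may hold fewer GPUs than TP * PP in a multi-node run,
    // so compare the total GPU count across all nodes, not the per-node count.
    SizeType const totalGpus = gpusPerNode * numNodes;
    assert(totalGpus >= tensorParallelism * pipelineParallelism); // stands in for TLLM_CHECK_WITH_INFO
}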
Thanks for the sharp catch. I will discuss this with the TensorRT-LLM engineers working on the C++ runtime and get back to you.
Hello, is there any progress on this?
Hi @leavelet, we have not fully tested multi-node support in TensorRT-LLM; it might work, but there is no guarantee. If you need to try it, we suggest removing the check locally so that it does not block you. Thanks very much for your interest in our work!
multi_block_mode when certain conditions are met (large TP and 32K sequence length)
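For context, a hedged sketch of the kind of gating such a condition might express. The 32K sequence length comes from the line above, but the "large TP" threshold and the helper shouldUseMultiBlockMode are assumptions for illustration, not TensorRT-LLM's actual API:

// Sketch only: the 32K sequence length is taken from the comment above; the
// "large TP" threshold and the helper name are assumptions for illustration.
#include <cstdint>

bool shouldUseMultiBlockMode(std::int32_t tensorParallelism, std::int32_t maxSequenceLength)
{
    constexpr std::int32_t kLargeTpThreshold = 8;    // assumed value for "large TP"
    constexpr std::int32_t kLongSequenceLen = 32768; // 32K sequence length
    return tensorParallelism >= kLargeTpThreshold && maxSequenceLength >= kLongSequenceLen;
}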