Update TensorRT-LLM #667

Merged
merged 2 commits into from Dec 15, 2023

Conversation

@kaiyux (Member) commented Dec 15, 2023:

  • Model Support
    • BART and mBART support in encoder-decoder models
    • Support FairSeq Neural Machine Translation (NMT) family
    • Mixtral-8x7B model support
      • Support weight loading for HuggingFace Mixtral model
  • Features
    • MPT - Int4 AWQ / SmoothQuant support
    • Support speculative decoding with prefilled KV cache
    • Support AWQ and GPTQ for QWEN
    • Support ReduceScatter plugin
  • Bug fixes
  • Performance
    • Optimize Hopper warp specialized kernels
    • Optimize AllReduce for parallel attention on Falcon and GPT-J
    • Enable split-k for weight-only cutlass kernel when SM>=75

// Validate the parallelism configuration:
TLLM_CHECK(mTensorParallelism > 0);
TLLM_CHECK(mPipelineParallelism > 0);

// numDevices here is the per-node GPU count (mGpusPerNode):
TLLM_CHECK_WITH_INFO(static_cast<SizeType>(numDevices) >= tensorParallelism * pipelineParallelism,

There seems to be a mistake here. In a multi-node setup, it is the product of mGpusPerNode and the number of nodes that should be at least TP multiplied by PP; this check, however, compares only the per-node GPU count, which can legitimately be smaller than the product of TP and PP.
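
For illustration, a minimal sketch of the multi-node-aware check this comment suggests. It is not the actual TensorRT-LLM implementation; the name worldSizeFits and the numNodes parameter are hypothetical, since the original check only sees the per-node device count:

#include <cassert>
#include <cstdio>

// Hypothetical sketch: validate that the cluster-wide GPU count covers
// the TP * PP world size, instead of comparing against one node only.
bool worldSizeFits(int gpusPerNode, int numNodes, int tensorParallelism, int pipelineParallelism)
{
    assert(gpusPerNode > 0 && numNodes > 0);
    assert(tensorParallelism > 0 && pipelineParallelism > 0);
    return gpusPerNode * numNodes >= tensorParallelism * pipelineParallelism;
}

int main()
{
    // Single node: 8 GPUs, TP=8, PP=1 -> fits (prints 1).
    std::printf("%d\n", worldSizeFits(8, 1, 8, 1));
    // Two nodes of 8 GPUs each, TP=8, PP=2 -> fits (prints 1), even though
    // a per-node-only check (8 >= 16) would reject this configuration.
    std::printf("%d\n", worldSizeFits(8, 2, 8, 2));
    return 0;
}
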
@kaiyux (Collaborator) replied:

Thanks for the sharp catch. I will discuss this with the TensorRT-LLM engineers working on the C++ runtime and get back to you later.

June commented:

Hello, is there any progress on this?

@kaiyux (Member, Author) replied:

Hi @leavelet, we have not fully tested multi-node support in TensorRT-LLM; it might work, but there is no guarantee. If you need to try it, we suggest removing the check locally so that it does not block you. Thanks very much for your interest in our work!
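
For readers who want the workaround, a hypothetical local patch sketch, assuming the check appears exactly as quoted above (the surrounding file and the elided message argument are not shown in this thread):

-TLLM_CHECK_WITH_INFO(static_cast<SizeType>(numDevices) >= tensorParallelism * pipelineParallelism,
-    /* info message */);
+// World-size check disabled locally to allow untested multi-node runs;
+// TP * PP configurations larger than one node are then unvalidated.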
