Update TensorRT-LLM #2297

Merged: 1 commit merged into main from preview/main on Oct 8, 2024


kaiyux (Member) commented on Oct 8, 2024:

  • Features
    • ReDrafter beam search logic is updated to match Apple's ReDrafter v1.1.
    • Draft-Target speculative decoding can now be done natively with TensorRT-LLM alone. The driver code is in examples/run.py, and the documentation is in examples/draft_target_model/README.md.
    • NVIDIA Volta GPU support is deprecated and will be removed in a future release.
  • API
    • Added logits processor support to the ModelRunnerCpp class.
    • Added isParticipant method to the C++ Executor API to check if the current process is a participant in the executor instance.
    • [BREAKING CHANGE] Removed builder_opt from build_config and from the trtllm-build command.
  • Bug fixes
  • Performance
    • Improved customAllReduce performance by using Lamport-style AllReduce + Norm fusion.
    • Static input tensors are now set once at the beginning instead of at every iteration. (This should be especially noticeable for RNN-based models, because the RNN state pointers are currently separate for each layer.)
    • In orchestrator mode, the draft model can now trigger a device memcpy over MPI to the target model's process. This reduces the latency between the end of draft-model generation and the beginning of target-model inference.
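
For readers new to Draft-Target speculative decoding, the general idea is that a small draft model proposes several tokens and the larger target model verifies them, keeping the longest agreeing prefix. The following is a minimal greedy sketch of that generic technique, not TensorRT-LLM's implementation: the function `speculative_step` and the "model as a next-token function" interface are hypothetical simplifications. See examples/draft_target_model/README.md for how the feature is actually driven.

```python
from typing import Callable, List, Sequence

# Hypothetical simplification: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """One greedy draft-target round; returns the tokens appended to prefix."""
    # 1. The draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model verifies each proposed token. A real engine scores all
    #    positions in one batched forward pass; this loop is only for clarity.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            # First mismatch: keep the target's own token and end the round.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token was accepted, so the target adds one bonus token.
    accepted.append(target(ctx))
    return accepted
```

Each round yields at least one target-quality token, and up to k + 1 when the draft agrees fully, which is where the speedup comes from. With `target = lambda c: len(c) % 5` and a draft that diverges after three tokens, `speculative_step([], draft, target, 4)` returns `[0, 1, 2, 3]`.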

DanBlanaru (Collaborator) left a comment:


lgtm

DanBlanaru merged commit 8681b3a into main on Oct 8, 2024.
DanBlanaru deleted the preview/main branch on Oct 8, 2024, 10:19.