Update TensorRT-LLM #2156

Merged: 2 commits, Aug 27, 2024

Shixiaowei02 (Collaborator) commented Aug 27, 2024

  • Model Support
  • Features
    • Add ReDrafter support for cuRAND and bfloat16
    • Introduce sparse mixer normalization mode for MoE models
    • Add support for QKV scaling in FP8 FMHA
  • API
    • [BREAKING CHANGE] Update LLM.generate arguments to include PromptInputs and tqdm
    • [BREAKING CHANGE] Remove context_fmha=disable flag when use_paged_context_fmha is enabled
  • Bug fixes
  • Memory optimization
  • Benchmark
    • Add acceptance rate for gptManagerBenchmark
  • Performance
  • Infra
  • Documentation

Shixiaowei02 requested a review from byshiue on August 27, 2024 09:13

byshiue (Collaborator) left a comment

LGTM

Shixiaowei02 merged commit b8fc663 into main on Aug 27, 2024
Shixiaowei02 deleted the preview/main branch on August 27, 2024 10:21

dwahaa commented Aug 29, 2024

@Shixiaowei02 TensorRT-LLM/examples/qwen/requirements.txt requires tensorrt_llm==0.13.0.dev2024082700, so where is the 0.13.0 branch?