Update TensorRT-LLM #2156

Shixiaowei02 · 2024-08-27T09:13:50Z

Model Support
Features
- Add redrafter support for curand and bfloat16
- Introduce sparse mixer normalization mode for MOE models
- Add support for QKV scaling in FP8 FMHA
API
- [BREAKING CHANGE] Update LLM.generate arguments to include PromptInputs and tqdm
- [BREAKING CHANGE] Remove context_fmha=disable flag when use_paged_context_fmha is enabled
Bug fixes
- Fix issue where temperature < 0.001 resulted in garbage output
- Avoid unnecessary conversion of logging arguments
- Fix transient workerPool hang
- Fix default factory for LoraConfig (#1323: in Python 3.11 with release 0.8.0, ValueError: mutable default <class 'datasets.utils.version.Version'> for field version is not allowed: use default_factory)
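
For context on the LoraConfig item above, here is a minimal sketch of the general dataclass pitfall that the quoted ValueError describes. It is not the actual TensorRT-LLM change: AdapterSettings, BrokenConfig, and FixedConfig are hypothetical names used only for illustration. Since Python 3.11, dataclasses rejects any unhashable object as a field default, which includes instances of ordinary (non-frozen) dataclasses, so such defaults have to go through default_factory.

```python
import sys
from dataclasses import dataclass, field


@dataclass
class AdapterSettings:
    """Hypothetical nested config. A plain dataclass is unhashable (__hash__ is None)."""
    rank: int = 8


def declare_with_shared_default():
    """Declare a dataclass whose field default is one shared AdapterSettings instance."""
    @dataclass
    class BrokenConfig:
        # Python 3.11+ raises ValueError here: "mutable default ... use default_factory".
        settings: AdapterSettings = AdapterSettings()
    return BrokenConfig


@dataclass
class FixedConfig:
    # default_factory builds a fresh object for every FixedConfig instance.
    settings: AdapterSettings = field(default_factory=AdapterSettings)
    target_modules: list = field(default_factory=list)


try:
    declare_with_shared_default()
    shared_default_rejected = False
except ValueError:
    shared_default_rejected = True

print("shared default rejected:", shared_default_rejected)
```

On interpreters older than 3.11 the shared-default declaration is accepted silently, which is why a problem of this kind can surface only after a Python upgrade. The default_factory form behaves the same on both sides of that boundary.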
Memory optimization
Benchmark
- Add acceptance rate for gptManagerBenchmark
Performance
Infra
Documentation

byshiue

LGTM

dwahaa · 2024-08-29T06:44:27Z

@Shixiaowei02
"TensorRT-LLM/examples/qwen/requirements.txt" requires tensorrt_llm==0.13.0.dev2024082700, so where is the 0.13.0 branch?

open source d461cf4c2e17afd6a88d2a63e85823dec7a3aab1

4056248

Shixiaowei02 requested a review from byshiue August 27, 2024 09:13

byshiue approved these changes Aug 27, 2024


update the news

956ec5c

Shixiaowei02 merged commit b8fc663 into main Aug 27, 2024

Shixiaowei02 deleted the preview/main branch August 27, 2024 10:21
