
Update TensorRT-LLM #1793

Merged · 1 commit · Jun 18, 2024

Conversation

Shixiaowei02
Collaborator

  • Model Support
    • Support Qwen1.5 MoE A2.7B
    • Support the Phi-3 vision multimodal model
  • Features
    • Encoder-Decoder C++ Runtime TP Support
    • In-flight batching for explicit draft tokens
    • Support local file for calibration
    • Add batched logits post processor
    • Add Hopper qgmma kernel to XQA JIT codepath
    • Enable TP+EP for MoE
    • Add lookahead decoding layer
  • API
    • [BREAKING CHANGE] Setup buffers for explicit draft tokens decoding
    • [BREAKING CHANGE] Replace all occurrences of max_output_len with max_seq_len
      • This affects trtllm-build and benchmark-related parameters
    • [BREAKING CHANGE] Remove GptSession Python bindings
    • [BREAKING CHANGE] Add runtime max batch size to gptManagerBenchmark
    • Support remaining executor API options in HLAPI
    • Support get_stats and aget_stats in the HL Executor when using multiple GPUs
    • Add iterLatencyMilliSec to stats and iteration log
  • Bug fixes
  • Memory optimization
    • Support stream reader to reduce peak memory when using weight streaming
  • Benchmark
  • Performance
    • Optimize the build time when XQA JIT is enabled
    • Reduce the number of streams when using the fused decoder
  • Infra
  • Documentation
    • Update documents about GEMM plugins
    • Polish enc-dec readme to reflect recent changes
    • Update Mixtral example docs to include Mixtral-8x22B instructions
    • Simplify the RecurrentGemma README
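The max_output_len → max_seq_len rename listed under API is the change most likely to break existing build scripts: instead of bounding generated tokens separately, a single sequence-length budget now covers input plus output. A minimal before/after sketch for trtllm-build (checkpoint and output paths are placeholders; adjust to your setup):

```shell
# Before this release: output length was capped by its own flag.
trtllm-build --checkpoint_dir ./ckpt \
             --output_dir ./engine \
             --max_input_len 1024 \
             --max_output_len 1024   # removed by this release

# After this release: a single max_seq_len bounds input + generated tokens.
trtllm-build --checkpoint_dir ./ckpt \
             --output_dir ./engine \
             --max_input_len 1024 \
             --max_seq_len 2048
```

The same rename applies to benchmark parameters (e.g. gptManagerBenchmark), so scripts passing the old name should be updated in both places.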
