Issues: NVIDIA/TensorRT-LLM
#783 · [Issue Template] Short one-line summary of the issue #270 · opened Jan 1, 2024 by juney-nvidia · Open
#2735 · DeepSeek-R1-Distill-Llama-70B int4 quantized version of the model generates garbage values · [bug] · opened Feb 1, 2025 by kelkarn
#2733 · LoRA error while building TensorRT-LLM engine for mllama · [bug] · opened Feb 1, 2025 by nbowon
#2730 · AttributeError in tensorrt_llm.logger: No attribute 'info' on TensorRT-LLM quantization script · [bug] · opened Jan 31, 2025 by ValeGian
#2729 · Unable to Install tensorrt_llm 17 Due to Flashinfer Git Clone Failure · [bug] · opened Jan 31, 2025 by ValeGian
#2727 · "Trying to remove block n by 0 that is not in hash map" spam in release 0.17 · [bug] · opened Jan 31, 2025 by aikitoria
#2723 · [TensorRT-LLM][ERROR] Encountered an error in forwardSync function: cannot create std::vector larger than max_size() · [bug] · opened Jan 29, 2025 by MahmoudAshraf97
#2721 · Whisper example not returning transcription in original language · [bug] · opened Jan 27, 2025 by haiderasad
#2720 · When to expect new development versions · [triaged] · opened Jan 24, 2025 by ttim
#2718 · How can I quantize a custom model with TensorRT-LLM? Do I need to write C++ code? Are there any examples? · [triaged] · opened Jan 24, 2025 by DelongYang666
#2717 · Input length limitation (8192) despite model supporting 32k context window · opened Jan 24, 2025 by HuangZhen02
#2714 · Are multimodal models supported by trtllm-serve? · [OpenAI API, triaged] · opened Jan 23, 2025 by xiaoyuzju
#2711 · How to compile DeepSeek-V3? · [Installation, triaged] · opened Jan 22, 2025 by zmtttt
#2710 · Support for Blackwell and Thor · [triaged] · opened Jan 21, 2025 by phantaurus
#2709 · Speculative decoding (draft-target model approach): issue with Triton Inference Server · [Investigating, triaged, Triton Backend] · opened Jan 21, 2025 by sivabreddy
#2708 · [bug] Encountered an error in forwardAsync function: Assertion failed: mNextBlocks.empty() · [Generic Runtime, Investigating, triaged] · opened Jan 21, 2025 by akhoroshev
#2706 · Convert NVILA with 0.16.0 · [bug, Investigating, LLM API/Workflow, triaged] · opened Jan 20, 2025 by dzy130120
#2705 · Compared with HF, TRT-LLM running Qwen's forward only accelerates the context phase; generation shows no speedup · [bug] · opened Jan 20, 2025 by nickole2018
#2704 · Support for int2/int3 quantization · [Investigating, Low Precision, triaged] · opened Jan 20, 2025 by ZHITENGLI
#2703 · Quantized model using AWQ and LoRA weights · [Investigating, Low Precision, triaged] · opened Jan 17, 2025 by shuyuan-wang
#2699 · Wrong outputs with FP8 kv_cache reuse · [bug, Investigating, KV-Cache Management, triaged] · opened Jan 16, 2025 by lishicheng1996