Issues: NVIDIA/TensorRT-LLM
[Issue Template] Short one-line summary of the issue #270
#783
opened Jan 1, 2024 by
juney-nvidia
Open
Issues list
Deepseek-v3 running on 2xH100 nodes getting poor performance
bug
Something isn't working
#2786
opened Feb 14, 2025 by
zymy-chen
2 of 4 tasks
The performance of Qwen1.5-7B based on the trtllm-bench test was very poor
bug
Something isn't working
#2785
opened Feb 14, 2025 by
ruru5697
3 of 4 tasks
Bug when loading an engine using LoRA through LLM API
bug
Something isn't working
Investigating
LLM API/Workflow
triaged
Issue has been triaged by maintainers
#2782
opened Feb 13, 2025 by
pei0033
2 of 4 tasks
GPU Utilization drops gradually over time using Executor API
bug
Something isn't working
#2778
opened Feb 12, 2025 by
MahmoudAshraf97
3 of 4 tasks
Inconsistent Batch Index Order in Decoupled Mode with trt-llm
bug
Something isn't working
#2777
opened Feb 12, 2025 by
Oldpan
2 of 4 tasks
DeepSeek-V3 fp8 tp32 failed to convert checkpoint
bug
Something isn't working
#2776
opened Feb 12, 2025 by
MtFitzRoy
2 of 4 tasks
Processing multiple concurrent requests with Qwen2-VL is slow. Inference seems to be queued.
#2775
opened Feb 12, 2025 by
zhaocc1106
Installation broken with 0.17.0.post1 with poetry due to git / flash infer dependency.
bug
Something isn't working
#2774
opened Feb 12, 2025 by
michaelfeil
1 of 4 tasks
Cannot create checkpoint for llama-3.2 (1B, 3B)
bug
Something isn't working
#2772
opened Feb 11, 2025 by
falkbene
3 of 4 tasks
TypeError: quantize_and_export() got an unexpected keyword argument 'cp_size'
#2771
opened Feb 11, 2025 by
yanduoduan
CUDA Illegal memory access for certain input sizes to Whisper
bug
Something isn't working
#2767
opened Feb 9, 2025 by
MahmoudAshraf97
2 of 4 tasks
Are there any plans to implement DualPipe parallelism from DeepSeek
#2765
opened Feb 7, 2025 by
ttim
Building from source does not work
bug
Something isn't working
Investigating
triaged
Issue has been triaged by maintainers
#2757
opened Feb 6, 2025 by
maximzubkov
2 of 4 tasks
Experimental pytorch workflow
question
Further information is requested
triaged
Issue has been triaged by maintainers
#2750
opened Feb 6, 2025 by
ttim
KV Cache quantization is not working with Whisper
bug
Something isn't working
Investigating
Low Precision
Issue about lower bit quantization, including int8, int4, fp8
triaged
Issue has been triaged by maintainers
#2748
opened Feb 5, 2025 by
MahmoudAshraf97
3 of 4 tasks
fp8_rowwise and kv cache
question
Further information is requested
triaged
Issue has been triaged by maintainers
#2740
opened Feb 3, 2025 by
vitipunto
DeepSeek-R1-Distill-Llama-70B int4 quantized version of the model generates garbage values
bug
Something isn't working
triaged
Issue has been triaged by maintainers
#2735
opened Feb 1, 2025 by
kelkarn
2 of 4 tasks
Lora error while building tensorrt llm engine for mllama
bug
Something isn't working
Investigating
Lora/P-tuning
triaged
Issue has been triaged by maintainers
#2733
opened Feb 1, 2025 by
nbowon
2 of 4 tasks