TensorRT-LLM v0.10 update #1734

Merged · 1 commit · Jun 5, 2024
2 changes: 2 additions & 0 deletions .gitattributes
@@ -1,2 +1,4 @@
 *.a filter=lfs diff=lfs merge=lfs -text
 *.lib filter=lfs diff=lfs merge=lfs -text
+*.so filter=lfs diff=lfs merge=lfs -text
+*.dll filter=lfs diff=lfs merge=lfs -text
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.yml
@@ -17,7 +17,7 @@ body:
 - Libraries
 - TensorRT-LLM branch or tag (e.g., main, v0.7.1)
 - TensorRT-LLM commit (if known)
-- Versions of TensorRT, AMMO, CUDA, cuBLAS, etc. used
+- Versions of TensorRT, Modelopt, CUDA, cuBLAS, etc. used
 - Container used (if running TensorRT-LLM in a container)
 - NVIDIA driver version
 - OS (Ubuntu 22.04, CentOS 7, Windows 10)
25 changes: 25 additions & 0 deletions .github/workflows/auto_close_inactive_issues.yml
@@ -0,0 +1,25 @@
+# Ref: https://docs.github.com/en/actions/managing-issues-and-pull-requests/closing-inactive-issues
+name: Close inactive issues
+on:
+  schedule:
+    - cron: "30 1 * * *"
+
+jobs:
+  stale:
+    runs-on: ubuntu-latest
+    permissions:
+      issues: write
+      pull-requests: write
+    steps:
+      - uses: actions/stale@v9
+        with:
+          days-before-issue-stale: 30
+          days-before-issue-close: 15
+          stale-issue-label: "stale"
+          exempt-issue-labels: ""
+          stale-issue-message: "This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days."
+          close-issue-message: "This issue was closed because it has been stalled for 15 days with no activity."
+          days-before-pr-stale: -1
+          days-before-pr-close: -1
+          repo-token: ${{ secrets.GITHUB_TOKEN }}
+          debug-only: false
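
The policy above is driven entirely by the `with:` parameters: an issue untouched for 30 days gets the `stale` label and is closed 15 days later unless someone removes the label or comments, while PRs are exempt (`-1`). As a hedged illustration only (not part of this PR; the helper name, the `REPO` constant, and the use of the `requests` package are assumptions), the sketch below dry-runs the same 30-day rule against the public GitHub REST API:

```python
import datetime as dt

import requests  # assumption: third-party HTTP client, `pip install requests`

REPO = "NVIDIA/TensorRT-LLM"   # assumed target repository
STALE_AFTER_DAYS = 30          # mirrors days-before-issue-stale above


def issues_that_would_go_stale(token: str | None = None) -> list[int]:
    """Return open issue numbers with no updates for 30+ days and no 'stale' label."""
    cutoff = dt.datetime.now(dt.timezone.utc) - dt.timedelta(days=STALE_AFTER_DAYS)
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    resp = requests.get(
        f"https://api.github.com/repos/{REPO}/issues",
        headers=headers,
        # oldest-updated first, so the stalest issues come back in the first page
        params={"state": "open", "sort": "updated", "direction": "asc", "per_page": 100},
        timeout=30,
    )
    resp.raise_for_status()
    stale = []
    for issue in resp.json():
        if "pull_request" in issue:  # the issues endpoint also returns PRs; skip them
            continue
        updated = dt.datetime.fromisoformat(issue["updated_at"].replace("Z", "+00:00"))
        if updated < cutoff and all(label["name"] != "stale" for label in issue["labels"]):
            stale.append(issue["number"])
    return stale


if __name__ == "__main__":
    print(issues_that_would_go_stale())  # inspects only the 100 least recently updated issues
```

Setting `debug-only: true` in the workflow itself achieves a similar dry run inside Actions, logging what would be labeled or closed without modifying any issue.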
4 changes: 3 additions & 1 deletion .gitignore
@@ -6,7 +6,6 @@ __pycache__/
 *.nsys-rep
 .VSCodeCounter
 build*/
-*.so
 *.egg-info/
 .coverage
 *.csv
@@ -29,11 +28,14 @@ config.json
 /*.svg
 cpp/cmake-build-*
 cpp/.ccache/
+tensorrt_llm/bin
 tensorrt_llm/libs
 tensorrt_llm/bindings.*.so
 tensorrt_llm/bindings.pyi
+tensorrt_llm/bindings/*.pyi
 *docs/cpp_docs*
 *docs/source/_cpp_gen*
+*.swp
 
 # Testing
 .coverage.*
2 changes: 1 addition & 1 deletion 3rdparty/cutlass
Submodule cutlass updated 548 files
210 changes: 0 additions & 210 deletions CHANGELOG.md

This file was deleted.

20 changes: 10 additions & 10 deletions README.md
@@ -6,9 +6,9 @@ TensorRT-LLM
 
 [![Documentation](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](https://nvidia.github.io/TensorRT-LLM/)
 [![python](https://img.shields.io/badge/python-3.10.12-green)](https://www.python.org/downloads/release/python-31012/)
-[![cuda](https://img.shields.io/badge/cuda-12.3-green)](https://developer.nvidia.com/cuda-downloads)
-[![trt](https://img.shields.io/badge/TRT-9.3-green)](https://developer.nvidia.com/tensorrt)
-[![version](https://img.shields.io/badge/release-0.9.0-green)](./setup.py)
+[![cuda](https://img.shields.io/badge/cuda-12.4.0-green)](https://developer.nvidia.com/cuda-downloads)
+[![trt](https://img.shields.io/badge/TRT-10.0.1-green)](https://developer.nvidia.com/tensorrt)
+[![version](https://img.shields.io/badge/release-0.10.0.dev-green)](./setup.py)
 [![license](https://img.shields.io/badge/license-Apache%202-blue)](./LICENSE)
 
 [Architecture](./docs/source/architecture/overview.md)   |   [Results](./docs/source/performance/perf-overview.md)   |   [Examples](./examples/)   |   [Documentation](./docs/source/)
@@ -57,11 +57,11 @@ like `GPTAttention` or `BertAttention`, can be found in the
 [models](./tensorrt_llm/models) module.
 
 TensorRT-LLM comes with several popular models pre-defined. They can easily be
-modified and extended to fit custom needs. Refer to the [Support Matrix](docs/source/reference/support-matrix.md) for a list of supported models.
+modified and extended to fit custom needs. Refer to the [Support Matrix](https://nvidia.github.io/TensorRT-LLM/reference/support-matrix.html) for a list of supported models.
 
 To maximize performance and reduce memory footprint, TensorRT-LLM allows the
 models to be executed using different quantization modes (refer to
-[`support matrix`](docs/source/reference/support-matrix.md#software)). TensorRT-LLM supports
+[`support matrix`](https://nvidia.github.io/TensorRT-LLM/reference/support-matrix.html#software)). TensorRT-LLM supports
 INT4 or INT8 weights (and FP16 activations; a.k.a. INT4/INT8 weight-only) as
 well as a complete implementation of the
 [SmoothQuant](https://arxiv.org/abs/2211.10438) technique.
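
To make the two ideas in the hunk above concrete, the sketch below is a minimal NumPy illustration (not TensorRT-LLM code; the function names are invented for this example). It shows INT8 weight-only quantization, where weights are stored as INT8 with per-channel scales while activations stay in FP16, and the SmoothQuant smoothing factor s_j = max|X_j|^alpha / max|W_j|^(1-alpha), which shifts quantization difficulty from activations to weights without changing the product X·W:

```python
import numpy as np


def quantize_weights_int8(w: np.ndarray):
    """Per-output-channel symmetric INT8 quantization (illustrative only)."""
    scale = np.maximum(np.abs(w).max(axis=0), 1e-8) / 127.0  # one FP scale per output channel
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def weight_only_matmul(x_fp16: np.ndarray, q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """INT8 weights are dequantized on the fly; activations remain FP16."""
    w_deq = q.astype(np.float16) * scale.astype(np.float16)
    return x_fp16 @ w_deq


def smoothquant_scales(x: np.ndarray, w: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """SmoothQuant: s_j = max|X_j|^alpha / max|W_j|^(1-alpha), per input channel j."""
    act_max = np.abs(x).max(axis=0).astype(np.float32)
    w_max = np.abs(w).max(axis=1).astype(np.float32)
    return act_max**alpha / np.maximum(w_max, 1e-5) ** (1.0 - alpha)


x = np.random.randn(4, 8).astype(np.float16)    # FP16 activations, shape (tokens, in_features)
w = np.random.randn(8, 16).astype(np.float32)   # weights, shape (in_features, out_features)

# Smooth first: (X / s) @ (diag(s) W) == X @ W, then quantize the smoothed weights.
s = smoothquant_scales(x, w)
x_smooth = (x / s).astype(np.float16)
q, scale = quantize_weights_int8(w * s[:, None])

y = weight_only_matmul(x_smooth, q, scale)
print(y.shape)  # (4, 16)
```

A real deployment would perform the dequantize-and-multiply inside fused GPU kernels; the NumPy version only illustrates the arithmetic.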
@@ -70,8 +70,8 @@ well as a complete implementation of the
 
 To get started with TensorRT-LLM, visit our documentation:
 
-- [Quick Start Guide](docs/source/quick-start-guide.md)
-- [Release Notes](docs/source/release-notes.md)
-- [Installation Guide for Linux](docs/source/installation/linux.md)
-- [Installation Guide for Windows](docs/source/installation/windows.md)
-- [Supported Hardware, Models, and other Software](docs/source/reference/support-matrix.md)
+- [Quick Start Guide](https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html)
+- [Release Notes](https://nvidia.github.io/TensorRT-LLM/release-notes.html)
+- [Installation Guide for Linux](https://nvidia.github.io/TensorRT-LLM/installation/linux.html)
+- [Installation Guide for Windows](https://nvidia.github.io/TensorRT-LLM/installation/windows.html)
+- [Supported Hardware, Models, and other Software](https://nvidia.github.io/TensorRT-LLM/reference/support-matrix.html)