From e4a711ce234d156cb595d389a285fe49852bb095 Mon Sep 17 00:00:00 2001
From: krishung5
Date: Wed, 7 Aug 2024 10:59:49 -0700
Subject: [PATCH 1/3] Add TRT-LLM backend to the doc

---
 README.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/README.md b/README.md
index b57d44e..2b3c4e9 100644
--- a/README.md
+++ b/README.md
@@ -123,6 +123,14 @@ to load and serve models. The
 [vllm_backend](https://github.com/triton-inference-server/vllm_backend)
 repo contains the documentation and source for the backend.
 
+**TensorRT-LLM**: The TensorRT-LLM backend allows you to serve
+[TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) models with Triton Server.
+Check out the
+[Triton TRT-LLM user guide](https://github.com/triton-inference-server/server/blob/main/docs/getting_started/trtllm_user_guide.md)
+for more information. The
+[tensorrtllm_backend](https://github.com/triton-inference-server/tensorrtllm_backend)
+repo contains the documentation and source for the backend.
+
 **Important Note!** Not all the above backends are supported on every
 platform supported by Triton. Look at the
 [Backend-Platform Support Matrix](docs/backend_platform_support_matrix.md)

From 1d36963cfe8204a4186a56fc99f462d078fa30a6 Mon Sep 17 00:00:00 2001
From: krishung5
Date: Wed, 7 Aug 2024 17:13:20 -0700
Subject: [PATCH 2/3] Add TRT-LLM backend to platform support matrix

---
 docs/backend_platform_support_matrix.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/backend_platform_support_matrix.md b/docs/backend_platform_support_matrix.md
index d00a73c..64522e6 100644
--- a/docs/backend_platform_support_matrix.md
+++ b/docs/backend_platform_support_matrix.md
@@ -1,5 +1,5 @@