
Conversation

@Chris113113 Chris113113 commented Nov 19, 2025

Adds a sample inference recipe for TRT-LLM on A4X.

This recipe makes a few changes relative to the existing A3-Ultra TRT-LLM recipe that are needed to support very large models such as DeepSeek:

  1. Moves from the C++ backend to the PyTorch backend by default.
  2. Allows manually setting kv_cache_free_gpu_mem_fraction.
  3. Plumbs llm_api_args through from the model config.
  4. Removes NCCL variables that are not needed for single-host serving, due to incompatibilities between NCCL_NET=gIB and TensorRT-LLM.
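As an illustration of changes (2) and (3), a launcher script might merge overrides from the model config into the keyword arguments passed to the TensorRT-LLM LLM API. This is a minimal sketch only; the function name, config keys, and default values below are hypothetical and not taken from the recipe:

```python
# Hypothetical sketch of plumbing model-config overrides into LLM API kwargs.
# Key names and defaults are illustrative, not the recipe's actual schema.
def build_llm_kwargs(model_config: dict) -> dict:
    kwargs = {
        "backend": "pytorch",                    # change (1): PyTorch backend by default
        "kv_cache_free_gpu_mem_fraction": 0.85,  # change (2): overridable default
    }
    # Change (2): honor an explicit kv-cache fraction if the config sets one.
    if "kv_cache_free_gpu_mem_fraction" in model_config:
        kwargs["kv_cache_free_gpu_mem_fraction"] = (
            model_config["kv_cache_free_gpu_mem_fraction"]
        )
    # Change (3): plumb arbitrary llm_api_args through last, so they take precedence.
    kwargs.update(model_config.get("llm_api_args", {}))
    return kwargs

# Example: a model config that lowers free-memory headroom and raises batch size.
cfg = {
    "kv_cache_free_gpu_mem_fraction": 0.9,
    "llm_api_args": {"max_batch_size": 64},
}
llm_kwargs = build_llm_kwargs(cfg)
```

Applying user-supplied `llm_api_args` last keeps the defaults as a fallback while letting the model config win on any conflict.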

@Chris113113 Chris113113 changed the title WIP recipe for a4x trtllm inference Adds a sample inferencing recipe for TRT-LLM on A4X Nov 19, 2025
@Chris113113 Chris113113 marked this pull request as ready for review November 19, 2025 19:53
@junjieqian junjieqian merged commit 0017ec3 into main Nov 19, 2025
1 check passed
lepan-google added a commit to lepan-google/gpu-recipes that referenced this pull request Dec 3, 2025

[A4X TensorRT Inference Benchmark] A4X DeepSeek R1 NVFP4 on TensorRT with GCSFuse storage

This change adds deployment configs and instructions for A4X DeepSeek R1 NVFP4 on TensorRT with GCSFuse storage. I followed the [previous training storage recipe PR](AI-Hypercomputer#37) and modified it based on the existing [CMCS recipe with HuggingFace](AI-Hypercomputer#50).

TESTED=unit tests
Chris113113 pushed a commit that referenced this pull request Dec 5, 2025
…with GCSFuse storage (#55)

* [A4X TensorRT Inference Benchmark] A4X DeepSeek R1 NVFP4 on TensorRT
with GCSFuse storage

This change adds deployment configs and instructions for A4X DeepSeek R1 NVFP4 on TensorRT with GCSFuse storage. I followed the [previous training storage recipe PR](#37) and modified it based on the existing [CMCS recipe with HuggingFace](#50).

TESTED=unit tests

* Fix readme

* Fix README

* Resolve comments

* Format the content table

* Format content tables

* Correct grammar issue in README

* Correct format
