
Conversation

@Chris113113 Chris113113 commented Nov 19, 2025

Adds a sample inference recipe for TRT-LLM on A4X.

This recipe makes a few changes relative to the existing A3-Ultra TRT-LLM recipe that are needed to support very large models such as DeepSeek:

  1. Moves from the C++ backend to the PyTorch backend by default.
  2. Allows manually setting kv_cache_free_gpu_mem_fraction.
  3. Plumbs llm_api_args through from the model config.
  4. Removes NCCL variables that are not needed for single-host serving, due to incompatibilities between NCCL_NET=gIB and TensorRT-LLM.
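As an illustration of changes (2) and (3), a launcher script might merge overrides from the model config into the keyword arguments passed to the TensorRT-LLM LLM API. This is a minimal sketch only; the function name, config keys, and default values below are hypothetical and not taken from the recipe:

```python
# Hypothetical sketch of plumbing model-config overrides into LLM API kwargs.
# Key names and defaults are illustrative, not the recipe's actual schema.
def build_llm_kwargs(model_config: dict) -> dict:
    kwargs = {
        "backend": "pytorch",                    # change (1): PyTorch backend by default
        "kv_cache_free_gpu_mem_fraction": 0.85,  # change (2): overridable default
    }
    # Change (2): honor an explicit kv-cache fraction if the config sets one.
    if "kv_cache_free_gpu_mem_fraction" in model_config:
        kwargs["kv_cache_free_gpu_mem_fraction"] = (
            model_config["kv_cache_free_gpu_mem_fraction"]
        )
    # Change (3): plumb arbitrary llm_api_args through last, so they take precedence.
    kwargs.update(model_config.get("llm_api_args", {}))
    return kwargs

# Example: a model config that lowers free-memory headroom and raises batch size.
cfg = {
    "kv_cache_free_gpu_mem_fraction": 0.9,
    "llm_api_args": {"max_batch_size": 64},
}
llm_kwargs = build_llm_kwargs(cfg)
```

Applying user-supplied `llm_api_args` last keeps the defaults as a fallback while letting the model config win on any conflict.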

@Chris113113 Chris113113 changed the title WIP recipe for a4x trtllm inference Adds a sample inferencing recipe for TRT-LLM on A4X Nov 19, 2025
@Chris113113 Chris113113 marked this pull request as ready for review November 19, 2025 19:53
@junjieqian junjieqian merged commit 0017ec3 into main Nov 19, 2025
1 check passed
lepan-google added a commit to lepan-google/gpu-recipes that referenced this pull request Dec 3, 2025

[A4X TensorRT Inference Benchmark] A4X DeepSeek R1 NVFP4 on TensorRT with GCSFuse storage

This change adds deployment configs and instructions for A4X DeepSeek R1 NVFP4 on TensorRT with GCSFuse storage. I followed the [previous training storage recipe PR](AI-Hypercomputer#37) and modified it based on the existing [CMCS recipe with HuggingFace](AI-Hypercomputer#50).

TESTED=unit tests
Chris113113 pushed a commit that referenced this pull request Dec 5, 2025
…with GCSFuse storage (#55)

* [A4X TensorRT Inference Benchmark] A4X DeepSeek R1 NVFP4 on TensorRT
with GCSFuse storage

This change adds deployment configs and instructions for A4X DeepSeek R1 NVFP4 on TensorRT with GCSFuse storage. I followed the [previous training storage recipe PR](#37) and modified it based on the existing [CMCS recipe with HuggingFace](#50).

TESTED=unit tests

* Fix readme

* Fix README

* Resolve comments

* Format the content table

* Format content tables

* Correct grammar issue in README

* Correct format
