
test: Enable GB200 torch compile multi gpu tests #6145


Merged: 4 commits merged into NVIDIA:main from improve-ci on Jul 21, 2025

Conversation

@yizhang-nv (Member) commented on Jul 17, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Improved SLURM test environment handling by ensuring the NVIDIA_IMEX_CHANNELS variable is properly set for both single-node and multi-node runs.
    • Adjusted test configuration for better resource management by modifying GPU memory fraction settings in specific tests.
    • Updated test skip conditions to reflect changes in device and MPI world size requirements, and addressed accuracy issues with targeted runtime skips.
  • Tests

    • Simplified test parameterization by removing device-specific skip marks.
    • Updated waiver list to re-enable a previously skipped test.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with the Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous; skipping tests without careful validation can break the top of tree.

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action also kills all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous; reusing a pipeline without careful validation can break the top of tree.


coderabbitai bot commented Jul 17, 2025

Walkthrough

The updates modify SLURM job environment variable handling in a Jenkins Groovy script, simplify test parameterization and skip logic in a PyTorch LLM API integration test file, adjust a test configuration parameter, and remove a skip entry from a test waiver list. No changes to public APIs or control flow were made.

Changes

| File(s) | Change Summary |
| --- | --- |
| jenkins/L0_Test.groovy | Added the NVIDIA_IMEX_CHANNELS=0 environment variable to Docker/SLURM runs; updated SLURM multi-node env handling. |
| tests/integration/defs/accuracy/test_llm_api_pytorch.py | Replaced device-specific skip marks in parameterization with [False, True]; adjusted skip decorators and config; added a runtime skip for the TRTLLM backend. |
| tests/integration/test_lists/waives.txt | Removed the skip entry for TestDeepSeekR1::test_nvfp4_multi_gpus[throughput_tp8]. |

Sequence Diagram(s)

sequenceDiagram
    participant Jenkins
    participant SLURM
    participant Docker

    Jenkins->>SLURM: Submit job with NVIDIA_IMEX_CHANNELS=0 (env or --container-env)
    SLURM->>Docker: Launch container with NVIDIA_IMEX_CHANNELS=0
    Docker-->>SLURM: Run tests
    SLURM-->>Jenkins: Return results
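To make the propagation concrete, here is a minimal Python sketch of the mechanism, assuming a pyxis/enroot-style srun; the command and payload are illustrative, not the actual invocation generated by jenkins/L0_Test.groovy:

```python
# Sketch only: export the variable, then ask srun to copy it from the parent
# environment into the container via --container-env.
import os
import subprocess

env = dict(os.environ, NVIDIA_IMEX_CHANNELS="0")  # the explicit export before srun
subprocess.run(
    [
        "srun",
        "--container-env=NVIDIA_IMEX_CHANNELS",  # copies the value from the parent env
        "bash", "-c", "echo IMEX=$NVIDIA_IMEX_CHANNELS",
    ],
    env=env,
    check=True,
)
```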

Estimated code review effort

2 (10–30 minutes)

Possibly related PRs

  • fix: Fix DeepSeek R1 CI #6129: Also changes the free_gpu_memory_fraction parameter in TestDeepSeekR1.test_nvfp4_multi_gpus, indicating both PRs alter the same configuration in the same test.

Suggested reviewers

  • chzblych
  • niukuo

Poem

In Jenkins' warren, scripts now grow,
With SLURM and Docker set to go.
Tests hop lighter, skips removed,
Configs tweaked and flows improved.
No more waivers, all in line—
A rabbit’s work, efficient and fine! 🐇✨


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9998a3 and 3778211.

📒 Files selected for processing (1)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (14 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (4)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (4)

87-87: LGTM! Enabling torch_compile tests on GB200 devices aligns with PR objectives.

The removal of skip_device_contain_gb200 marks from the torch_compile parameter across multiple test methods successfully enables torch compile functionality on GB200 devices, which is the main goal of this PR.

Also applies to: 103-103, 131-131, 154-154, 637-637, 671-671, 714-714, 859-859, 1055-1055, 1097-1097
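For illustration, a hedged sketch of what this parameterization change looks like; the decorator and mark names follow the review text, and the exact fixtures in test_llm_api_pytorch.py may differ:

```python
import pytest

# Before (sketch): the True case carried a device-specific skip mark.
# @pytest.mark.parametrize("torch_compile", [
#     False,
#     pytest.param(True, marks=skip_device_contain_gb200),
# ])

# After (sketch): both values run everywhere, including GB200.
@pytest.mark.parametrize("torch_compile", [False, True])
def test_bfloat16(torch_compile):
    ...  # build the LLM with torch.compile enabled or disabled accordingly
```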


1333-1333: Good memory optimization to prevent OOM issues.

Reducing the KV cache memory fraction from 0.80 to 0.70 is a prudent adjustment that likely prevents out-of-memory issues during multi-GPU testing, especially with torch_compile now enabled on GB200 devices.
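A minimal sketch of the setting in question, assuming the LLM API's KvCacheConfig (the import path is illustrative):

```python
from tensorrt_llm.llmapi import KvCacheConfig  # import path assumed for illustration

# Lowered from 0.80 to 0.70 to leave extra headroom for torch.compile
# workspaces and multi-GPU buffers on GB200.
kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.70)
```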


1811-1811: Improved skip condition for distributed testing.

Changing from skip_less_device(8) to skip_less_mpi_world_size(8) is more semantically correct for multi-GPU tests, as it accurately reflects the distributed computing environment requirements rather than just physical device count.
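As a sketch of the intent behind the new mark, assuming an mpi4py-based world-size probe (the real helper in the test defs may be implemented differently):

```python
import pytest
from mpi4py import MPI  # assumed transport for world-size discovery

def skip_less_mpi_world_size(n: int):
    """Skip when fewer than n MPI ranks are available (sketch of the helper's intent)."""
    world = MPI.COMM_WORLD.Get_size()
    return pytest.mark.skipif(world < n, reason=f"needs MPI world size >= {n}")

# Replaces skip_less_device(8): 2 nodes x 4 GPUs now qualifies, even though
# no single node exposes 8 devices.
@skip_less_mpi_world_size(8)
def test_nvfp4_multi_gpus():
    ...
```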


1820-1823: Appropriate defensive skip for known accuracy issues.

Adding a runtime skip for the TRTLLM MOE backend when accuracy issues are present is a good defensive practice. This prevents false test failures while the underlying issue (referenced in the bug report) is being resolved.
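A minimal sketch of such a guard; the backend name comes from the review, while the parameter name and skip message are illustrative:

```python
import pytest

def test_nvfp4_multi_gpus(moe_backend: str = "TRTLLM"):
    # Bail out at runtime for the known-bad combination instead of failing
    # an accuracy assertion while the referenced bug is being fixed.
    if moe_backend == "TRTLLM":
        pytest.skip("TRTLLM MOE backend currently has a known accuracy issue")
    ...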



@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4"

@coderabbitai (bot) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
jenkins/L0_Test.groovy (1)

351-359: Pass the value inline to --container-env to avoid relying on a prior export.

--container-env accepts both NAME (copy from parent env) and NAME=value.
You’re exporting the variable later in the launch script, so this will work, but making the value explicit here is slightly safer and self-contained (e.g. if the export line ever gets removed).

-            "--container-env=NVIDIA_IMEX_CHANNELS"
+            "--container-env=NVIDIA_IMEX_CHANNELS=0"

This also removes one implicit dependency and makes the generated SLURM command clearer when debugging.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 58d22a7 and f546a9d.

📒 Files selected for processing (2)
  • jenkins/L0_Test.groovy (2 hunks)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (12 hunks)
🔇 Additional comments (6)
jenkins/L0_Test.groovy (1)

378-381: Env var correctly exported before srun.

The explicit export NVIDIA_IMEX_CHANNELS=0 ensures the variable is available to SLURM and propagated into the container runtime. Good catch.

tests/integration/defs/accuracy/test_llm_api_pytorch.py (5)

87-87: LGTM: Torch compile parameterization enabled.

The removal of device-specific skip conditions for the torch_compile parameter is consistent with the PR objective to enable GB200 torch compile multi-GPU tests.


103-103: LGTM: Multi-GPU torch compile testing enabled.

The parameterization change enables torch compile testing on 4-GPU configurations, which directly supports the PR objective of enabling GB200 torch compile multi-GPU tests.


131-131: LGTM: FP8 torch compile testing enabled.

The removal of device-specific skip conditions for both single-GPU and multi-GPU FP8 tests enables comprehensive torch compile testing across different quantization scenarios.

Also applies to: 154-154


637-637: LGTM: Comprehensive torch compile testing enabled for DeepSeekV3Lite.

The consistent removal of device-specific skip conditions across all test methods (bfloat16, fp8_block_scales, and nvfp4) enables comprehensive torch compile testing for both single-GPU and multi-GPU configurations. The existing skip logic for unsupported combinations (e.g., PP with torch.compile) is properly maintained.

Also applies to: 671-671, 714-714, 859-859, 1055-1055, 1097-1097


1333-1336: LGTM: Explicit torch compile configuration for multi-GPU testing.

The explicit configuration of TorchCompileConfig with enable_fullgraph=True enables torch compile in full graph mode for this multi-GPU test. This approach differs from the parameterized approach in other tests but is appropriate for ensuring consistent torch compile behavior in this specific test scenario.
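Sketch of the explicit configuration described here; the class and field names are taken from the review, while the import paths and surrounding constructor arguments are assumptions:

```python
from tensorrt_llm import LLM                        # import paths assumed
from tensorrt_llm.llmapi import TorchCompileConfig  # for illustration only

# Full-graph compilation is forced on here rather than parameterized,
# so this test always exercises the compiled path.
llm = LLM(
    model="<model-path>",  # placeholder
    torch_compile_config=TorchCompileConfig(enable_fullgraph=True),
)
```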

@tensorrt-cicd (Collaborator):

PR_Github #12206 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12206 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9063 (Partly Tested) completed with status: 'FAILURE'

@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4"

@tensorrt-cicd (Collaborator):

PR_Github #12213 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12213 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9069 (Partly Tested) completed with status: 'FAILURE'

@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4"

@tensorrt-cicd (Collaborator):

PR_Github #12221 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12221 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9075 (Partly Tested) completed with status: 'FAILURE'

@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4"

@tensorrt-cicd (Collaborator):

PR_Github #12225 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12225 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9079 (Partly Tested) completed with status: 'FAILURE'

@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4"

@coderabbitai (bot) left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 53e33b8 and 46a7cf5.

📒 Files selected for processing (51)
  • cpp/CMakeLists.txt (2 hunks)
  • cpp/include/tensorrt_llm/batch_manager/runtimeBuffers.h (1 hunks)
  • cpp/tensorrt_llm/batch_manager/runtimeBuffers.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/CMakeLists.txt (2 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/buffers.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/bindings.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/common/bindTypes.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/common/customCasters.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/executor/bindings.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/executor/bindings.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/executor/executor.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/executor/executor.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/executor/executorConfig.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/executor/request.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/executor/request.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/runtime/bindings.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/runtime/bindings.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/runtime/moeBindings.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/runtime/moeBindings.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/userbuffers/bindings.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/userbuffers/bindings.h (0 hunks)
  • cpp/tensorrt_llm/pybind/bindings.cpp (1 hunks)
  • cpp/tensorrt_llm/pybind/executor/bindings.cpp (1 hunks)
  • cpp/tensorrt_llm/pybind/executor/executorConfig.cpp (1 hunks)
  • examples/models/core/llama/summarize_long.py (1 hunks)
  • examples/models/core/qwen2audio/run.py (1 hunks)
  • examples/models/core/qwenvl/run.py (1 hunks)
  • jenkins/Build.groovy (0 hunks)
  • jenkins/L0_Test.groovy (2 hunks)
  • tensorrt_llm/builder.py (1 hunks)
  • tensorrt_llm/commands/build.py (1 hunks)
  • tensorrt_llm/runtime/model_runner.py (1 hunks)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (12 hunks)
  • tests/integration/test_lists/test-db/l0_a10.yml (0 hunks)
  • tests/integration/test_lists/waives.txt (0 hunks)
  • tests/unittest/bindings/test_bindings_ut.py (0 hunks)
  • tests/unittest/bindings/test_executor_bindings.py (1 hunks)
💤 Files with no reviewable changes (34)
  • tests/integration/test_lists/waives.txt
  • cpp/tensorrt_llm/nanobind/executor/request.h
  • cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h
  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.h
  • cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h
  • cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h
  • cpp/tensorrt_llm/nanobind/userbuffers/bindings.h
  • cpp/tensorrt_llm/nanobind/runtime/moeBindings.h
  • cpp/tensorrt_llm/nanobind/runtime/bindings.h
  • cpp/tensorrt_llm/nanobind/batch_manager/buffers.h
  • cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h
  • tests/unittest/bindings/test_bindings_ut.py
  • cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp
  • cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp
  • cpp/tensorrt_llm/nanobind/executor/bindings.h
  • tests/integration/test_lists/test-db/l0_a10.yml
  • jenkins/Build.groovy
  • cpp/tensorrt_llm/nanobind/userbuffers/bindings.cpp
  • cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.h
  • cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp
  • cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp
  • cpp/tensorrt_llm/nanobind/runtime/bindings.cpp
  • cpp/tensorrt_llm/nanobind/executor/executorConfig.h
  • cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp
  • cpp/tensorrt_llm/nanobind/executor/executor.cpp
  • cpp/tensorrt_llm/nanobind/runtime/moeBindings.cpp
  • cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp
  • cpp/tensorrt_llm/nanobind/common/bindTypes.h
  • cpp/tensorrt_llm/nanobind/executor/bindings.cpp
  • cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp
  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp
  • cpp/tensorrt_llm/nanobind/common/customCasters.h
  • cpp/tensorrt_llm/nanobind/executor/request.cpp
  • cpp/tensorrt_llm/nanobind/executor/executor.h
✅ Files skipped from review due to trivial changes (1)
  • tests/unittest/bindings/test_executor_bindings.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
  • jenkins/L0_Test.groovy
🧰 Additional context used
🧬 Code Graph Analysis (2)
examples/models/core/llama/summarize_long.py (1)
tensorrt_llm/runtime/generation.py (1)
  • kv_cache_type (1169-1170)
examples/models/core/qwenvl/run.py (1)
tensorrt_llm/runtime/generation.py (1)
  • kv_cache_type (1169-1170)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (18)
cpp/include/tensorrt_llm/batch_manager/runtimeBuffers.h (1)

171-171: Good improvement: Switched to exclusive ownership model.

The change from std::shared_ptr to std::unique_ptr for transformerBuffers is appropriate and beneficial. This aligns with the nanobind removal and provides:

  • More explicit ownership semantics (exclusive vs shared)
  • Better performance (no reference counting overhead)
  • Clearer intent that this buffer is exclusively owned by the RuntimeBuffers instance

Since the class already has deleted copy operations, shared ownership was unnecessary.

cpp/tensorrt_llm/batch_manager/runtimeBuffers.cpp (1)

87-88: Consistent implementation change for unique ownership.

The change from std::make_shared to std::make_unique correctly implements the ownership model change from the header. The constructor parameters and initialization logic remain unchanged, ensuring functional consistency while improving ownership semantics.

cpp/CMakeLists.txt (2)

201-204: LGTM: Simplified pybind11 inclusion logic

The removal of OR BUILD_DEEP_EP condition correctly aligns with the nanobind removal effort. The logic now cleanly restricts pybind11 inclusion to only when BINDING_TYPE is explicitly set to "pybind".


220-222: LGTM: Consistent pybind11 include directory logic

The change maintains consistency with the subdirectory inclusion logic above, ensuring pybind11 includes are only added when BINDING_TYPE equals "pybind".

tensorrt_llm/runtime/model_runner.py (1)

89-89: No action needed: KVCacheType initializes from strings via its pybind constructor

The KVCacheType enum in Python is bound with `.def(py::init(&tr::ModelConfig::KVCacheTypeFromString))`, so calling `KVCacheType(builder_config.get('kv_cache_type'))` correctly parses the string into the enum. This matches existing patterns in builder.py and across the codebase.
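For illustration, a short usage sketch of the two styles (import path and enum member name assumed):

```python
from tensorrt_llm.bindings import KVCacheType  # import path assumed

# Old factory style (removed):
#   kv_cache_type = KVCacheType.from_string("paged")

# New constructor style, routed through KVCacheTypeFromString by the binding:
kv_cache_type = KVCacheType("paged")
assert kv_cache_type == KVCacheType.PAGED  # member name assumed
```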

examples/models/core/qwenvl/run.py (1)

121-121: Consistent KVCacheType instantiation pattern

The change from KVCacheType.from_string() to KVCacheType() is consistent with the pattern seen in other files. This maintains uniformity across the codebase.

examples/models/core/llama/summarize_long.py (1)

100-100: Consistent KVCacheType constructor usage

The change follows the same pattern as other files in the codebase, replacing the factory method with direct constructor usage. This maintains consistency across the entire codebase.

cpp/tensorrt_llm/pybind/bindings.cpp (1)

173-173: API improvement: Constructor-based enum instantiation

The change from a from_string method to a constructor-based approach using py::init is a good improvement. This makes the API more Pythonic and consistent with standard enum usage patterns.

// Before: KVCacheType.from_string("paged")
// After:  KVCacheType("paged")

This change aligns well with the corresponding Python-side updates across the codebase.

examples/models/core/qwen2audio/run.py (1)

125-125: Consistent API update for KVCacheType instantiation

The change from KVCacheType.from_string() to direct constructor call KVCacheType() is consistent with the new constructor-based API. The functionality remains identical while following the updated enum instantiation pattern.

tensorrt_llm/commands/build.py (1)

134-134: Simplified argument parsing with direct enum usage

The change from using a custom enum_type wrapper to directly using KVCacheType as the argument type is a good simplification. This leverages the enum's built-in string parsing capabilities through its constructor, reducing complexity while maintaining the same functionality.
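A hedged sketch of the simplified argument wiring (flag name from the build CLI; the previous enum_type helper is omitted, and the import path is assumed):

```python
import argparse
from tensorrt_llm.bindings import KVCacheType  # import path assumed

parser = argparse.ArgumentParser()
# KVCacheType("paged") parses strings directly, so the enum class itself can
# serve as the argparse type callable; no custom wrapper is needed.
parser.add_argument("--kv_cache_type", type=KVCacheType, default=None)

args = parser.parse_args(["--kv_cache_type", "paged"])
print(args.kv_cache_type)  # KVCacheType.PAGED (member name assumed)
```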

tensorrt_llm/builder.py (1)

596-596: Approved: KVCacheType String Conversion via Constructor

The change to use KVCacheType(config.pop('kv_cache_type')) is consistent with existing patterns elsewhere in the codebase (e.g., runtime/model_runner.py and various example scripts), demonstrating that the constructor already handles string inputs correctly. No further updates are needed.

cpp/tensorrt_llm/nanobind/CMakeLists.txt (4)

6-6: Source file list has been drastically reduced

The source files have been reduced to only ipcNvlsMemory.cu and bindings.cpp. Given that the nanobind module is being simplified/removed, ensure that these are the only required sources and that no critical functionality is being lost.


19-21: Simplified library linking

The removal of conditional NVSHMEM libraries and CUDA_NVML_LIB aligns with the reduced functionality. The retained libraries (SHARED_TARGET, Python3_LIBRARIES, TORCH_LIBRARIES, torch_python) appear sufficient for the remaining functionality.


24-24: Updated compile definition to nanobind-specific macro

The change from PYBIND11_DETAILED_ERROR_MESSAGES=1 to NB_DETAILED_ERROR_MESSAGES=1 correctly reflects the use of nanobind instead of pybind11 for this module.


31-31: Simplified runtime path configuration

The removal of CUDA stubs directory from rpath is consistent with the reduced dependencies. The remaining paths for local libs and NCCL should be sufficient.

cpp/tensorrt_llm/pybind/executor/bindings.cpp (1)

247-247: Simplified binding leverages pybind11's native type support

The direct exposure of getLatestEvents with optional timeout is cleaner and more maintainable than the previous lambda wrapper approach. This change correctly utilizes pybind11's built-in support for std::optional and chrono types.
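From the Python side, pybind11's built-in casters are what make this work: None maps to an empty std::optional and datetime.timedelta maps to a std::chrono duration. A hedged usage sketch with a stand-in object, since the Python-side method name is assumed:

```python
import datetime
from typing import Optional

class _EventManagerStub:
    """Stand-in for the bound C++ object; getLatestEvents is the real binding."""
    def get_latest_events(self, timeout: Optional[datetime.timedelta] = None):
        return []  # the binding returns the latest KV-cache events

event_manager = _EventManagerStub()
# pybind11 casts datetime.timedelta -> std::chrono duration, None -> empty optional.
events = event_manager.get_latest_events(datetime.timedelta(milliseconds=100))
events_now = event_manager.get_latest_events(None)
```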

cpp/tensorrt_llm/nanobind/bindings.cpp (2)

2-2: Copyright year updated to 2024

The copyright year has been changed from 2025 to 2024, which appears to be a correction since 2025 is in the future.


24-28: Minimal nanobind module retained

The module has been stripped of all functionality but still defines a basic nanobind module with documentation. Consider whether this placeholder module is necessary or if it should be completely removed as part of the nanobind deprecation.

@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4"

@tensorrt-cicd (Collaborator):

PR_Github #12232 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12232 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9084 (Partly Tested) completed with status: 'FAILURE'

@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4" --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #12247 [ run ] triggered by Bot

@yizhang-nv force-pushed the improve-ci branch 2 times, most recently from 06338d4 to 33dfd64 on July 18, 2025 at 03:06
@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4" --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #12265 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12247 [ run ] completed with state ABORTED

@tensorrt-cicd (Collaborator):

PR_Github #12265 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9106 (Partly Tested) completed with status: 'FAILURE'

@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4" --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #12384 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12378 [ run ] completed with state ABORTED

@tensorrt-cicd (Collaborator):

PR_Github #12384 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9203 (Partly Tested) completed with status: 'FAILURE'

@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4" --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #12396 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12396 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9215 (Partly Tested) completed with status: 'SUCCESS'

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4" --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #12432 [ run ] triggered by Bot

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

Update test_llm_api_pytorch.py

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
…y_moe_trtllm]

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
@yizhang-nv force-pushed the improve-ci branch 2 times, most recently from 20ca107 to 3778211 on July 21, 2025 at 11:13
@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4" --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #12443 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12432 [ run ] completed with state ABORTED

@tensorrt-cicd (Collaborator):

PR_Github #12443 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9253 (Partly Tested) completed with status: 'SUCCESS'

@yizhang-nv (Member Author):

/bot skip --comment "Related tests are passed"

@yizhang-nv enabled auto-merge (squash) on July 21, 2025 at 13:57
@tensorrt-cicd (Collaborator):

PR_Github #12445 [ skip ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12445 [ skip ] completed with state SUCCESS
Skipping testing for commit 3778211

@yizhang-nv merged commit f9b0a91 into NVIDIA:main on Jul 21, 2025
3 checks passed
timlee0212 pushed a commit to timlee0212/TensorRT-LLM that referenced this pull request Jul 21, 2025
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>