
test: Enable GB200 torch compile multi gpu tests #6145


Merged: 4 commits merged into NVIDIA:main from improve-ci on Jul 21, 2025

Conversation

@yizhang-nv (Member) commented on Jul 17, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Improved SLURM test environment handling by ensuring the NVIDIA_IMEX_CHANNELS variable is properly set for both single-node and multi-node runs.
    • Adjusted test configuration for better resource management by modifying GPU memory fraction settings in specific tests.
    • Updated test skip conditions to reflect changes in device and MPI world size requirements, and addressed accuracy issues with targeted runtime skips.
  • Tests

    • Simplified test parameterization by removing device-specific skip marks.
    • Updated waiver list to re-enable a previously skipped test.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with the Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous; skipping tests without careful validation can break the top of tree.

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action also kills all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous; reusing a pipeline without careful validation can break the top of tree.


coderabbitai bot commented Jul 17, 2025

Walkthrough

The updates modify SLURM job environment variable handling in a Jenkins Groovy script, simplify test parameterization and skip logic in a PyTorch LLM API integration test file, adjust a test configuration parameter, and remove a skip entry from a test waiver list. No changes to public APIs or control flow were made.

Changes

| File(s) | Change Summary |
| --- | --- |
| jenkins/L0_Test.groovy | Added the NVIDIA_IMEX_CHANNELS=0 environment variable to Docker/SLURM runs; updated SLURM multi-node env handling. |
| tests/integration/defs/accuracy/test_llm_api_pytorch.py | Replaced device-specific skip marks in parameterization with [False, True]; adjusted skip decorators and config; added a runtime skip for the TRTLLM backend. |
| tests/integration/test_lists/waives.txt | Removed the skip entry for TestDeepSeekR1::test_nvfp4_multi_gpus[throughput_tp8]. |

Sequence Diagram(s)

sequenceDiagram
    participant Jenkins
    participant SLURM
    participant Docker

    Jenkins->>SLURM: Submit job with NVIDIA_IMEX_CHANNELS=0 (env or --container-env)
    SLURM->>Docker: Launch container with NVIDIA_IMEX_CHANNELS=0
    Docker-->>SLURM: Run tests
    SLURM-->>Jenkins: Return results
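To make the propagation concrete, here is a minimal Python sketch of the mechanism, assuming a pyxis/enroot-style srun; the command and payload are illustrative, not the actual invocation generated by jenkins/L0_Test.groovy:

```python
# Sketch only: export the variable, then ask srun to copy it from the parent
# environment into the container via --container-env.
import os
import subprocess

env = dict(os.environ, NVIDIA_IMEX_CHANNELS="0")  # the explicit export before srun
subprocess.run(
    [
        "srun",
        "--container-env=NVIDIA_IMEX_CHANNELS",  # copies the value from the parent env
        "bash", "-c", "echo IMEX=$NVIDIA_IMEX_CHANNELS",
    ],
    env=env,
    check=True,
)
```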

Estimated code review effort

2 (10–30 minutes)

Possibly related PRs

  • fix: Fix DeepSeek R1 CI #6129: Also changes the free_gpu_memory_fraction parameter in TestDeepSeekR1.test_nvfp4_multi_gpus, indicating both PRs alter the same configuration in the same test.

Suggested reviewers

  • chzblych
  • niukuo

Poem

In Jenkins' warren, scripts now grow,
With SLURM and Docker set to go.
Tests hop lighter, skips removed,
Configs tweaked and flows improved.
No more waivers, all in line—
A rabbit’s work, efficient and fine! 🐇✨


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9998a3 and 3778211.

📒 Files selected for processing (1)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (14 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (4)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (4)

87-87: LGTM! Enabling torch_compile tests on GB200 devices aligns with PR objectives.

The removal of skip_device_contain_gb200 marks from the torch_compile parameter across multiple test methods successfully enables torch compile functionality on GB200 devices, which is the main goal of this PR.

Also applies to: 103-103, 131-131, 154-154, 637-637, 671-671, 714-714, 859-859, 1055-1055, 1097-1097
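For illustration, a hedged sketch of what this parameterization change looks like; the decorator and mark names follow the review text, and the exact fixtures in test_llm_api_pytorch.py may differ:

```python
import pytest

# Before (sketch): the True case carried a device-specific skip mark.
# @pytest.mark.parametrize("torch_compile", [
#     False,
#     pytest.param(True, marks=skip_device_contain_gb200),
# ])

# After (sketch): both values run everywhere, including GB200.
@pytest.mark.parametrize("torch_compile", [False, True])
def test_bfloat16(torch_compile):
    ...  # build the LLM with torch.compile enabled or disabled accordingly
```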


1333-1333: Good memory optimization to prevent OOM issues.

Reducing the KV cache memory fraction from 0.80 to 0.70 is a prudent adjustment that likely prevents out-of-memory issues during multi-GPU testing, especially with torch_compile now enabled on GB200 devices.
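A minimal sketch of the setting in question, assuming the LLM API's KvCacheConfig (the import path is illustrative):

```python
from tensorrt_llm.llmapi import KvCacheConfig  # import path assumed for illustration

# Lowered from 0.80 to 0.70 to leave extra headroom for torch.compile
# workspaces and multi-GPU buffers on GB200.
kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.70)
```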


1811-1811: Improved skip condition for distributed testing.

Changing from skip_less_device(8) to skip_less_mpi_world_size(8) is more semantically correct for multi-GPU tests, as it accurately reflects the distributed computing environment requirements rather than just physical device count.
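As a sketch of the intent behind the new mark, assuming an mpi4py-based world-size probe (the real helper in the test defs may be implemented differently):

```python
import pytest
from mpi4py import MPI  # assumed transport for world-size discovery

def skip_less_mpi_world_size(n: int):
    """Skip when fewer than n MPI ranks are available (sketch of the helper's intent)."""
    world = MPI.COMM_WORLD.Get_size()
    return pytest.mark.skipif(world < n, reason=f"needs MPI world size >= {n}")

# Replaces skip_less_device(8): 2 nodes x 4 GPUs now qualifies, even though
# no single node exposes 8 devices.
@skip_less_mpi_world_size(8)
def test_nvfp4_multi_gpus():
    ...
```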


1820-1823: Appropriate defensive skip for known accuracy issues.

Adding a runtime skip for the TRTLLM MOE backend when accuracy issues are present is a good defensive practice. This prevents false test failures while the underlying issue (referenced in the bug report) is being resolved.
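A minimal sketch of such a guard; the backend name comes from the review, while the parameter name and skip message are illustrative:

```python
import pytest

def test_nvfp4_multi_gpus(moe_backend: str = "TRTLLM"):
    # Bail out at runtime for the known-bad combination instead of failing
    # an accuracy assertion while the referenced bug is being fixed.
    if moe_backend == "TRTLLM":
        pytest.skip("TRTLLM MOE backend currently has a known accuracy issue")
    ...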



@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4"

@coderabbitai (bot) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
jenkins/L0_Test.groovy (1)

351-359: Pass the value inline to --container-env to avoid relying on a prior export.

--container-env accepts both NAME (copy from parent env) and NAME=value.
You’re exporting the variable later in the launch script, so this will work, but making the value explicit here is slightly safer and self-contained (e.g. if the export line ever gets removed).

-            "--container-env=NVIDIA_IMEX_CHANNELS"
+            "--container-env=NVIDIA_IMEX_CHANNELS=0"

This also removes one implicit dependency and makes the generated SLURM command clearer when debugging.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 58d22a7 and f546a9d.

📒 Files selected for processing (2)
  • jenkins/L0_Test.groovy (2 hunks)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (12 hunks)
🔇 Additional comments (6)
jenkins/L0_Test.groovy (1)

378-381: Env var correctly exported before srun.

The explicit export NVIDIA_IMEX_CHANNELS=0 ensures the variable is available to SLURM and propagated into the container runtime. Good catch.

tests/integration/defs/accuracy/test_llm_api_pytorch.py (5)

87-87: LGTM: Torch compile parameterization enabled.

The removal of device-specific skip conditions for the torch_compile parameter is consistent with the PR objective to enable GB200 torch compile multi-GPU tests.


103-103: LGTM: Multi-GPU torch compile testing enabled.

The parameterization change enables torch compile testing on 4-GPU configurations, which directly supports the PR objective of enabling GB200 torch compile multi-GPU tests.


131-131: LGTM: FP8 torch compile testing enabled.

The removal of device-specific skip conditions for both single-GPU and multi-GPU FP8 tests enables comprehensive torch compile testing across different quantization scenarios.

Also applies to: 154-154


637-637: LGTM: Comprehensive torch compile testing enabled for DeepSeekV3Lite.

The consistent removal of device-specific skip conditions across all test methods (bfloat16, fp8_block_scales, and nvfp4) enables comprehensive torch compile testing for both single-GPU and multi-GPU configurations. The existing skip logic for unsupported combinations (e.g., PP with torch.compile) is properly maintained.

Also applies to: 671-671, 714-714, 859-859, 1055-1055, 1097-1097


1333-1336: LGTM: Explicit torch compile configuration for multi-GPU testing.

The explicit configuration of TorchCompileConfig with enable_fullgraph=True enables torch compile in full graph mode for this multi-GPU test. This approach differs from the parameterized approach in other tests but is appropriate for ensuring consistent torch compile behavior in this specific test scenario.
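Sketch of the explicit configuration described here; the class and field names are taken from the review, while the import paths and surrounding constructor arguments are assumptions:

```python
from tensorrt_llm import LLM                        # import paths assumed
from tensorrt_llm.llmapi import TorchCompileConfig  # for illustration only

# Full-graph compilation is forced on here rather than parameterized,
# so this test always exercises the compiled path.
llm = LLM(
    model="<model-path>",  # placeholder
    torch_compile_config=TorchCompileConfig(enable_fullgraph=True),
)
```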

@tensorrt-cicd (Collaborator):

PR_Github #12206 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12206 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9063 (Partly Tested) completed with status: 'FAILURE'

@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4"

@tensorrt-cicd (Collaborator):

PR_Github #12213 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12213 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9069 (Partly Tested) completed with status: 'FAILURE'

@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4"

@tensorrt-cicd (Collaborator):

PR_Github #12221 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12221 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9075 (Partly Tested) completed with status: 'FAILURE'

@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4"

@tensorrt-cicd (Collaborator):

PR_Github #12225 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12225 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9079 (Partly Tested) completed with status: 'FAILURE'

@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4"

@coderabbitai (bot) left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 53e33b8 and 46a7cf5.

📒 Files selected for processing (51)
  • cpp/CMakeLists.txt (2 hunks)
  • cpp/include/tensorrt_llm/batch_manager/runtimeBuffers.h (1 hunks)
  • cpp/tensorrt_llm/batch_manager/runtimeBuffers.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/CMakeLists.txt (2 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/buffers.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/bindings.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/common/bindTypes.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/common/customCasters.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/executor/bindings.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/executor/bindings.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/executor/executor.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/executor/executor.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/executor/executorConfig.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/executor/request.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/executor/request.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/runtime/bindings.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/runtime/bindings.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/runtime/moeBindings.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/runtime/moeBindings.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/userbuffers/bindings.cpp (0 hunks)
  • cpp/tensorrt_llm/nanobind/userbuffers/bindings.h (0 hunks)
  • cpp/tensorrt_llm/pybind/bindings.cpp (1 hunks)
  • cpp/tensorrt_llm/pybind/executor/bindings.cpp (1 hunks)
  • cpp/tensorrt_llm/pybind/executor/executorConfig.cpp (1 hunks)
  • examples/models/core/llama/summarize_long.py (1 hunks)
  • examples/models/core/qwen2audio/run.py (1 hunks)
  • examples/models/core/qwenvl/run.py (1 hunks)
  • jenkins/Build.groovy (0 hunks)
  • jenkins/L0_Test.groovy (2 hunks)
  • tensorrt_llm/builder.py (1 hunks)
  • tensorrt_llm/commands/build.py (1 hunks)
  • tensorrt_llm/runtime/model_runner.py (1 hunks)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (12 hunks)
  • tests/integration/test_lists/test-db/l0_a10.yml (0 hunks)
  • tests/integration/test_lists/waives.txt (0 hunks)
  • tests/unittest/bindings/test_bindings_ut.py (0 hunks)
  • tests/unittest/bindings/test_executor_bindings.py (1 hunks)
💤 Files with no reviewable changes (34)
  • tests/integration/test_lists/waives.txt
  • cpp/tensorrt_llm/nanobind/executor/request.h
  • cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h
  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.h
  • cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h
  • cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h
  • cpp/tensorrt_llm/nanobind/userbuffers/bindings.h
  • cpp/tensorrt_llm/nanobind/runtime/moeBindings.h
  • cpp/tensorrt_llm/nanobind/runtime/bindings.h
  • cpp/tensorrt_llm/nanobind/batch_manager/buffers.h
  • cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h
  • tests/unittest/bindings/test_bindings_ut.py
  • cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp
  • cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp
  • cpp/tensorrt_llm/nanobind/executor/bindings.h
  • tests/integration/test_lists/test-db/l0_a10.yml
  • jenkins/Build.groovy
  • cpp/tensorrt_llm/nanobind/userbuffers/bindings.cpp
  • cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.h
  • cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp
  • cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp
  • cpp/tensorrt_llm/nanobind/runtime/bindings.cpp
  • cpp/tensorrt_llm/nanobind/executor/executorConfig.h
  • cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp
  • cpp/tensorrt_llm/nanobind/executor/executor.cpp
  • cpp/tensorrt_llm/nanobind/runtime/moeBindings.cpp
  • cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp
  • cpp/tensorrt_llm/nanobind/common/bindTypes.h
  • cpp/tensorrt_llm/nanobind/executor/bindings.cpp
  • cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp
  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp
  • cpp/tensorrt_llm/nanobind/common/customCasters.h
  • cpp/tensorrt_llm/nanobind/executor/request.cpp
  • cpp/tensorrt_llm/nanobind/executor/executor.h
✅ Files skipped from review due to trivial changes (1)
  • tests/unittest/bindings/test_executor_bindings.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
  • jenkins/L0_Test.groovy
🧰 Additional context used
🧬 Code Graph Analysis (2)
examples/models/core/llama/summarize_long.py (1)
tensorrt_llm/runtime/generation.py (1)
  • kv_cache_type (1169-1170)
examples/models/core/qwenvl/run.py (1)
tensorrt_llm/runtime/generation.py (1)
  • kv_cache_type (1169-1170)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (18)
cpp/include/tensorrt_llm/batch_manager/runtimeBuffers.h (1)

171-171: Good improvement: Switched to exclusive ownership model.

The change from std::shared_ptr to std::unique_ptr for transformerBuffers is appropriate and beneficial. This aligns with the nanobind removal and provides:

  • More explicit ownership semantics (exclusive vs shared)
  • Better performance (no reference counting overhead)
  • Clearer intent that this buffer is exclusively owned by the RuntimeBuffers instance

Since the class already has deleted copy operations, shared ownership was unnecessary.

cpp/tensorrt_llm/batch_manager/runtimeBuffers.cpp (1)

87-88: Consistent implementation change for unique ownership.

The change from std::make_shared to std::make_unique correctly implements the ownership model change from the header. The constructor parameters and initialization logic remain unchanged, ensuring functional consistency while improving ownership semantics.

cpp/CMakeLists.txt (2)

201-204: LGTM: Simplified pybind11 inclusion logic

The removal of OR BUILD_DEEP_EP condition correctly aligns with the nanobind removal effort. The logic now cleanly restricts pybind11 inclusion to only when BINDING_TYPE is explicitly set to "pybind".


220-222: LGTM: Consistent pybind11 include directory logic

The change maintains consistency with the subdirectory inclusion logic above, ensuring pybind11 includes are only added when BINDING_TYPE equals "pybind".

tensorrt_llm/runtime/model_runner.py (1)

89-89: No action needed: KVCacheType initializes from strings via its pybind constructor

The KVCacheType enum in Python is bound with `.def(py::init(&tr::ModelConfig::KVCacheTypeFromString))`, so calling `KVCacheType(builder_config.get('kv_cache_type'))` correctly parses the string into the enum. This matches existing patterns in builder.py and across the codebase.
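For illustration, a short usage sketch of the two styles (import path and enum member name assumed):

```python
from tensorrt_llm.bindings import KVCacheType  # import path assumed

# Old factory style (removed):
#   kv_cache_type = KVCacheType.from_string("paged")

# New constructor style, routed through KVCacheTypeFromString by the binding:
kv_cache_type = KVCacheType("paged")
assert kv_cache_type == KVCacheType.PAGED  # member name assumed
```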

examples/models/core/qwenvl/run.py (1)

121-121: Consistent KVCacheType instantiation pattern

The change from KVCacheType.from_string() to KVCacheType() is consistent with the pattern seen in other files. This maintains uniformity across the codebase.

examples/models/core/llama/summarize_long.py (1)

100-100: Consistent KVCacheType constructor usage

The change follows the same pattern as other files in the codebase, replacing the factory method with direct constructor usage. This maintains consistency across the entire codebase.

cpp/tensorrt_llm/pybind/bindings.cpp (1)

173-173: API improvement: Constructor-based enum instantiation

The change from a from_string method to a constructor-based approach using py::init is a good improvement. This makes the API more Pythonic and consistent with standard enum usage patterns.

// Before: KVCacheType.from_string("paged")
// After:  KVCacheType("paged")

This change aligns well with the corresponding Python-side updates across the codebase.

examples/models/core/qwen2audio/run.py (1)

125-125: Consistent API update for KVCacheType instantiation

The change from KVCacheType.from_string() to direct constructor call KVCacheType() is consistent with the new constructor-based API. The functionality remains identical while following the updated enum instantiation pattern.

tensorrt_llm/commands/build.py (1)

134-134: Simplified argument parsing with direct enum usage

The change from using a custom enum_type wrapper to directly using KVCacheType as the argument type is a good simplification. This leverages the enum's built-in string parsing capabilities through its constructor, reducing complexity while maintaining the same functionality.
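A hedged sketch of the simplified argument wiring (flag name from the build CLI; the previous enum_type helper is omitted, and the import path is assumed):

```python
import argparse
from tensorrt_llm.bindings import KVCacheType  # import path assumed

parser = argparse.ArgumentParser()
# KVCacheType("paged") parses strings directly, so the enum class itself can
# serve as the argparse type callable; no custom wrapper is needed.
parser.add_argument("--kv_cache_type", type=KVCacheType, default=None)

args = parser.parse_args(["--kv_cache_type", "paged"])
print(args.kv_cache_type)  # KVCacheType.PAGED (member name assumed)
```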

tensorrt_llm/builder.py (1)

596-596: Approved: KVCacheType String Conversion via Constructor

The change to use KVCacheType(config.pop('kv_cache_type')) is consistent with existing patterns elsewhere in the codebase (e.g., runtime/model_runner.py and various example scripts), demonstrating that the constructor already handles string inputs correctly. No further updates are needed.

cpp/tensorrt_llm/nanobind/CMakeLists.txt (4)

6-6: Source file list has been drastically reduced

The source files have been reduced to only ipcNvlsMemory.cu and bindings.cpp. Given that the nanobind module is being simplified/removed, ensure that these are the only required sources and that no critical functionality is being lost.


19-21: Simplified library linking

The removal of conditional NVSHMEM libraries and CUDA_NVML_LIB aligns with the reduced functionality. The retained libraries (SHARED_TARGET, Python3_LIBRARIES, TORCH_LIBRARIES, torch_python) appear sufficient for the remaining functionality.


24-24: Updated compile definition to nanobind-specific macro

The change from PYBIND11_DETAILED_ERROR_MESSAGES=1 to NB_DETAILED_ERROR_MESSAGES=1 correctly reflects the use of nanobind instead of pybind11 for this module.


31-31: Simplified runtime path configuration

The removal of CUDA stubs directory from rpath is consistent with the reduced dependencies. The remaining paths for local libs and NCCL should be sufficient.

cpp/tensorrt_llm/pybind/executor/bindings.cpp (1)

247-247: Simplified binding leverages pybind11's native type support

The direct exposure of getLatestEvents with optional timeout is cleaner and more maintainable than the previous lambda wrapper approach. This change correctly utilizes pybind11's built-in support for std::optional and chrono types.
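From the Python side, pybind11's built-in casters are what make this work: None maps to an empty std::optional and datetime.timedelta maps to a std::chrono duration. A hedged usage sketch with a stand-in object, since the Python-side method name is assumed:

```python
import datetime
from typing import Optional

class _EventManagerStub:
    """Stand-in for the bound C++ object; getLatestEvents is the real binding."""
    def get_latest_events(self, timeout: Optional[datetime.timedelta] = None):
        return []  # the binding returns the latest KV-cache events

event_manager = _EventManagerStub()
# pybind11 casts datetime.timedelta -> std::chrono duration, None -> empty optional.
events = event_manager.get_latest_events(datetime.timedelta(milliseconds=100))
events_now = event_manager.get_latest_events(None)
```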

cpp/tensorrt_llm/nanobind/bindings.cpp (2)

2-2: Copyright year updated to 2024

The copyright year has been changed from 2025 to 2024, which appears to be a correction since 2025 is in the future.


24-28: Minimal nanobind module retained

The module has been stripped of all functionality but still defines a basic nanobind module with documentation. Consider whether this placeholder module is necessary or if it should be completely removed as part of the nanobind deprecation.

@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4"

@tensorrt-cicd (Collaborator):

PR_Github #12232 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12232 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9084 (Partly Tested) completed with status: 'FAILURE'

@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4" --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #12247 [ run ] triggered by Bot

@yizhang-nv force-pushed the improve-ci branch 2 times, most recently from 06338d4 to 33dfd64 on July 18, 2025 at 03:06
@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4" --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #12265 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12247 [ run ] completed with state ABORTED

@tensorrt-cicd (Collaborator):

PR_Github #12265 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9106 (Partly Tested) completed with status: 'FAILURE'

@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4" --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #12384 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12378 [ run ] completed with state ABORTED

@tensorrt-cicd (Collaborator):

PR_Github #12384 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9203 (Partly Tested) completed with status: 'FAILURE'

@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4" --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #12396 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12396 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9215 (Partly Tested) completed with status: 'SUCCESS'

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4" --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #12432 [ run ] triggered by Bot

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

Update test_llm_api_pytorch.py

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
…y_moe_trtllm]

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
@yizhang-nv force-pushed the improve-ci branch 2 times, most recently from 20ca107 to 3778211 on July 21, 2025 at 11:13
@yizhang-nv (Member Author):

/bot run --stage-list "GB200-4_GPUs-PyTorch-1,GB200-4_GPUs-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Post-Merge-4" --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #12443 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12432 [ run ] completed with state ABORTED

@tensorrt-cicd (Collaborator):

PR_Github #12443 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9253 (Partly Tested) completed with status: 'SUCCESS'

@yizhang-nv (Member Author):

/bot skip --comment "Related tests are passed"

@yizhang-nv enabled auto-merge (squash) on July 21, 2025 at 13:57
@tensorrt-cicd (Collaborator):

PR_Github #12445 [ skip ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #12445 [ skip ] completed with state SUCCESS
Skipping testing for commit 3778211

@yizhang-nv merged commit f9b0a91 into NVIDIA:main on Jul 21, 2025
3 checks passed
timlee0212 pushed a commit to timlee0212/TensorRT-LLM that referenced this pull request Jul 21, 2025
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>