
feat: Add TensorRT support for GNNs #4016

Merged

Conversation


@benjaminhuth benjaminhuth commented Jan 9, 2025

Cannot be compiled currently in the CI

--- END COMMIT MESSAGE ---


Summary by CodeRabbit

Release Notes

  • New Features

    • Added TensorRT support for edge classification in the ExaTrkX plugin.
    • Introduced a new TensorRT-based edge classifier for machine learning tasks.
  • Infrastructure

    • Added a new build job for TensorRT-enabled components.
  • Technical Improvements

    • Enhanced plugin with GPU-accelerated inference capabilities.
    • Integrated TensorRT runtime and execution context for efficient processing.
    • Improved build flexibility by removing previous dependency checks in the configuration.
    • Introduced a CMake module for finding and configuring TensorRT.
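A minimal construction sketch, based only on details visible in this thread (the Config fields modelPath, cut, and deviceID, the header path, and the (config, logger) constructor pattern used in the Python bindings); the Logger.hpp include, the logging level, and the default values are assumptions, not confirmed API:

#include <memory>

#include "Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp"
#include "Acts/Utilities/Logger.hpp"

std::shared_ptr<Acts::TensorRTEdgeClassifier> makeTensorRTClassifier() {
  Acts::TensorRTEdgeClassifier::Config cfg;
  cfg.modelPath = "gnn.engine";  // serialized TensorRT engine (assumed file name)
  cfg.cut = 0.21f;               // edge-score threshold applied after the sigmoid
  cfg.deviceID = 0;              // CUDA device index

  return std::make_shared<Acts::TensorRTEdgeClassifier>(
      cfg,
      Acts::getDefaultLogger("TensorRTEdgeClassifier", Acts::Logging::INFO));
}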


coderabbitai bot commented Jan 9, 2025

Walkthrough

A new job, build_gnn_tensorrt, has been added to .gitlab-ci.yml to build the TensorRT-powered edge classification components. The newly introduced TensorRTEdgeClassifier enables GPU-accelerated inference in track finding. The changes span multiple files, including Python bindings, CMake modifications, and the core implementation files that support this functionality.

Changes

  • .gitlab-ci.yml: Added build_gnn_tensorrt job with TensorRT Docker image and dependency configuration
  • Examples/Python/src/ExaTrkXTrackFinding.cpp: Introduced TensorRTEdgeClassifier class and Python bindings
  • Plugins/ExaTrkX/CMakeLists.txt: Added TensorRT package discovery and library linking
  • Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp: Defined TensorRTEdgeClassifier header with configuration and interface
  • Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp: Implemented TensorRTEdgeClassifier with TensorRT inference logic
  • CMakeLists.txt: Removed conditional check for ACTS_EXATRKX_ENABLE_ONNX or ACTS_EXATRKX_ENABLE_TORCH
  • cmake/FindTensorRT.cmake: Introduced module for locating and configuring TensorRT


Suggested Reviewers

  • paulgessinger
  • stephenswat

Poem

In circuits of silicon bright, 🖥️
TensorRT dances with neural might, 🤖
Edges classified with grace,
GPU's computational embrace,
Track finding's quantum delight! 🚀


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c2fd773 and 8468919.

📒 Files selected for processing (1)
  • .gitlab-ci.yml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .gitlab-ci.yml
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: CI Bridge / build_gnn_tensorrt
  • GitHub Check: macos


@github-actions github-actions bot added Component - Examples Affects the Examples module Component - Plugins Affects one or more Plugins labels Jan 9, 2025
@github-actions github-actions bot added this to the next milestone Jan 9, 2025

github-actions bot commented Jan 9, 2025

📊: Physics performance monitoring for 72a3c7f

Full contents

physmon summary

@benjaminhuth benjaminhuth marked this pull request as ready for review January 15, 2025 13:53

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (4)
Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1)

98-100: Use ACTS logging instead of std::cout, prefer you should.

For consistency within the codebase, replace std::cout with ACTS logging macros.

Apply this diff to use the logging framework:

 ~TimePrinter() {
-  std::cout << name << ": " << milliseconds(t0, t1) << std::endl;
+  ACTS_INFO(name << ": " << milliseconds(t0, t1) << " ms");
 }
Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp (2)

38-41: Destructor to be marked override, consider you should.

Since the base class has a virtual destructor, marking the destructor in the derived class with override good practice it is.

Apply this diff for clarity:

-  ~TensorRTEdgeClassifier();
+  ~TensorRTEdgeClassifier() override;

49-58: Member variables' initialization order, ensure you must.

Initialize member variables in the order they are declared to avoid warnings.

Ensure that m_cfg is initialized before m_trtLogger, as declared.
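A self-contained illustration of the rule behind this comment, with placeholder types rather than the plugin's actual declarations:

#include <utility>

// Members are constructed in declaration order, not initializer-list order,
// so keeping the two consistent avoids -Wreorder warnings.
struct Config { int deviceID = 0; };
struct TrtLogger { explicit TrtLogger(const Config& /*cfg*/) {} };

struct Classifier {
  Config m_cfg;           // declared first, therefore constructed first
  TrtLogger m_trtLogger;  // may safely depend on m_cfg

  explicit Classifier(Config cfg)
      : m_cfg(std::move(cfg)), m_trtLogger(m_cfg) {}  // order matches declarations
};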

Examples/Python/src/ExaTrkXTrackFinding.cpp (1)

110-128: Logger name, more specific make you should.

For clarity and consistency, use a distinct logger name for TensorRTEdgeClassifier.

Apply this diff to specify the logger name:

                       return std::make_shared<Alg>(
-                          c, getDefaultLogger("EdgeClassifier", lvl));
+                          c, getDefaultLogger("TensorRTEdgeClassifier", lvl));
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0e65bc and f5a819a.

📒 Files selected for processing (5)
  • .gitlab-ci.yml (1 hunks)
  • Examples/Python/src/ExaTrkXTrackFinding.cpp (2 hunks)
  • Plugins/ExaTrkX/CMakeLists.txt (1 hunks)
  • Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp (1 hunks)
  • Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: CI Bridge / build_gnn_tensorrt
  • GitHub Check: linux_physmon
  • GitHub Check: linux_examples_test
  • GitHub Check: missing_includes
  • GitHub Check: linux_ubuntu_extra (ubuntu2204_clang, 20)
  • GitHub Check: build_debug
🔇 Additional comments (3)
Examples/Python/src/ExaTrkXTrackFinding.cpp (1)

126-126: Missing configuration member useEdgeFeatures, verify you should.

Inconsistent the configuration is with other classifiers. Include useEdgeFeatures if required.

Ensure that all necessary configuration options are included.

Plugins/ExaTrkX/CMakeLists.txt (2)

Line range hint 1-38: Well-structured, this CMake configuration is!

Follow consistent patterns for different backends, it does. Proper organization and clarity, I sense.


26-38: Version constraints for TensorRT, specify we must!

Hmmmm, missing version constraints for TensorRT package, I see. Dangerous this can be, yes. Compatibility issues, it may cause.

Apply this change, you should:

-    find_package(TensorRT REQUIRED)
+    find_package(TensorRT 8.6 REQUIRED)

Resolved (outdated) review threads: Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (5), .gitlab-ci.yml (1)
@github-actions github-actions bot added the Infrastructure Changes to build tools, continuous integration, ... label Jan 16, 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
cmake/FindTensorRT.cmake (4)

1-28: Hmmmm, missing minimum version requirements, I sense.

Document the minimum required TensorRT version for this module, you should. Help users avoid compatibility issues, it will.


30-34: Validate components list, we must.

Check for invalid component names in TensorRT_FIND_COMPONENTS, wise it would be. Prevent configuration errors early, this will.

 if(NOT TensorRT_FIND_COMPONENTS)
     set(TensorRT_FIND_COMPONENTS nvinfer nvinfer_plugin nvonnxparser nvparsers)
 endif()
+set(_valid_components nvinfer nvinfer_plugin nvonnxparser nvparsers)
+foreach(component IN LISTS TensorRT_FIND_COMPONENTS)
+    if(NOT component IN_LIST _valid_components)
+        message(FATAL_ERROR "Invalid TensorRT component specified: ${component}")
+    endif()
+endforeach()
 set(TensorRT_LIBRARIES)

44-53: More helpful error message, provide we should.

Include the searched paths in the error message, helpful it would be. Guide users to correct configuration, this will.

     if(TensorRT_FIND_REQUIRED)
         message(
             FATAL_ERROR
-            "Fail to find TensorRT, please set TensorRT_ROOT. Include path not found."
+            "Failed to find TensorRT header NvInfer.h. Searched in:\n"
+            "  - ${TensorRT_ROOT}/include\n"
+            "  - $ENV{TensorRT_ROOT}/include\n"
+            "Please set TensorRT_ROOT to the installation directory."
         )
     endif()

171-174: Process components in parallel, consider we should.

For large component lists, parallel processing could speed up configuration. Optional it is, but beneficial for large projects.

+if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.19")
+    cmake_policy(SET CMP0114 NEW)
+    foreach(component IN LISTS TensorRT_FIND_COMPONENTS)
+        cmake_language(DEFER DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
+            CALL _find_trt_component ${component})
+    endforeach()
+else()
     foreach(component IN LISTS TensorRT_FIND_COMPONENTS)
         _find_trt_component(${component})
     endforeach()
+endif()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b983ba9 and d70d26f.

📒 Files selected for processing (2)
  • Examples/Python/src/ExaTrkXTrackFinding.cpp (2 hunks)
  • cmake/FindTensorRT.cmake (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • Examples/Python/src/ExaTrkXTrackFinding.cpp
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: linux_ubuntu_extra (ubuntu2204_clang, 20)
  • GitHub Check: linux_ubuntu_extra (ubuntu2204, 20)
  • GitHub Check: macos
  • GitHub Check: linux_ubuntu
  • GitHub Check: unused_files
  • GitHub Check: missing_includes
  • GitHub Check: build_debug
  • GitHub Check: CI Bridge / build_gnn_tensorrt
🔇 Additional comments (2)
cmake/FindTensorRT.cmake (2)

176-182: Well implemented, the package handling is.

Following CMake best practices, you are. Approve this section, I do.


56-114: Handle malformed version strings, we must.

Verify that version strings are properly extracted, essential it is. Add error handling for malformed version strings, we should.

Resolved review thread: cmake/FindTensorRT.cmake

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
.gitlab-ci.yml (1)

189-224: ⚠️ Potential issue

Align with existing ExaTrkX jobs and add missing configurations, you must!

Similar to the past review, improvements needed there are:

  1. CUDA architectures with other ExaTrkX jobs, align we must
  2. Artifacts for downstream jobs, configure we should
  3. Testing stage for TensorRT functionality, define we must
  4. Torch disabled while TensorRT enabled, verify this approach we should

Apply these changes, you should:

 build_gnn_tensorrt:
   stage: build
   image: ghcr.io/acts-project/ubuntu2404_tensorrt:74
   variables:
     DEPENDENCY_URL: https://acts.web.cern.ch/ACTS/ci/ubuntu-24.04/deps.$DEPENDENCY_TAG.tar.zst
+    TORCH_CUDA_ARCH_LIST: "8.0 8.6 8.9 9.0"

   cache:
     key: ccache-${CI_JOB_NAME}-${CI_COMMIT_REF_SLUG}-${CCACHE_KEY_SUFFIX}
     fallback_keys:
       - ccache-${CI_JOB_NAME}-${CI_DEFAULT_BRANCH}-${CCACHE_KEY_SUFFIX}
     when: always
     paths:
       - ${CCACHE_DIR}

+  artifacts:
+    paths:
+      - build/
+    exclude:
+      - build/**/*.o
+    expire_in: 6 hours

   tags:
     - docker-gpu-nvidia

   script:
     - git clone $CLONE_URL src
     - cd src
     - git checkout $HEAD_SHA
     - source CI/dependencies.sh
     - cd ..
     - mkdir build
     - >
       cmake -B build -S src
       -DACTS_BUILD_PLUGIN_EXATRKX=ON
-      -DACTS_EXATRKX_ENABLE_TORCH=OFF
       -DACTS_EXATRKX_ENABLE_CUDA=ON
       -DACTS_EXATRKX_ENABLE_TENSORRT=ON
       -DPython_EXECUTABLE=$(which python3)
       -DCMAKE_CUDA_ARCHITECTURES="75;86"

     - ccache -z
     - cmake --build build -- -j6
     - ccache -s

+test_gnn_tensorrt:
+  stage: test
+  needs:
+    - build_gnn_tensorrt
+  image: ghcr.io/acts-project/ubuntu2404_tensorrt:74
+  tags:
+    - docker-gpu-nvidia
+  script:
+    - apt-get update -y
+    - git clone $CLONE_URL src
+    - cd src
+    - git checkout $HEAD_SHA
+    - source CI/dependencies.sh
+    - cd ..
+    - ctest --test-dir build -R TensorRT
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d70d26f and 0d18424.

📒 Files selected for processing (1)
  • .gitlab-ci.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: merge-sentinel
  • GitHub Check: CI Bridge / build_gnn_tensorrt
  • GitHub Check: unused_files
  • GitHub Check: macos
  • GitHub Check: linux_ubuntu_extra (ubuntu2204_clang, 20)
  • GitHub Check: missing_includes
  • GitHub Check: build_debug
  • GitHub Check: linux_ubuntu
  • GitHub Check: docs
🔇 Additional comments (1)
.gitlab-ci.yml (1)

191-191: Verify the image tag, we must!

Different from other jobs using tag '63', this one uses tag '74'. Intentional this change is?

✅ Verification successful

Valid and intentional, this image tag is!

Created recently, the image tag '74' has been. Tagged as 'latest', it is, hmmmm. Trust in the Force and the wisdom of the maintainers, we must.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the image exists and compare its creation date with other images
gh api \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  /orgs/acts-project/packages/container/ubuntu2404_tensorrt/versions | jq '.[] | select(.metadata.container.tags[] | contains("74"))'

Length of output: 2322

Resolved (outdated) review thread: .gitlab-ci.yml

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp (1)

49-49: ⚠️ Potential issue

Return a proper torch::Device object, you must.

Return device type directly, you do. A proper torch::Device instance, needed it is.

Apply this diff to return the correct device:

-  torch::Device device() const override { return torch::kCUDA; };
+  torch::Device device() const override { return torch::Device(torch::kCUDA); };
🧹 Nitpick comments (1)
Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp (1)

29-38: Document the configuration parameters, you must.

Mysterious, these magic numbers are. Documentation for cut = 0.21 and validation for deviceID, needed they are. Add comments explaining:

  • The meaning and impact of the cut threshold
  • Valid ranges for deviceID
  • Purpose of each configuration parameter
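A sketch of what such documentation could look like; the member names follow the comment above, while the defaults and exact wording are assumptions:

#include <string>

struct Config {
  /// Path to the serialized TensorRT engine file.
  std::string modelPath;
  /// Score threshold applied after the sigmoid; edges scoring below it are rejected.
  float cut = 0.21f;
  /// CUDA device index used for inference; must be non-negative and refer to a visible GPU.
  int deviceID = 0;
};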
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6337bb and bc06d42.

📒 Files selected for processing (1)
  • Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: linux_examples_test
  • GitHub Check: linux_physmon
  • GitHub Check: linux_ubuntu_extra (ubuntu2204_clang, 20)
  • GitHub Check: macos
  • GitHub Check: build_debug
  • GitHub Check: CI Bridge / build_gnn_tensorrt
  • GitHub Check: missing_includes
🔇 Additional comments (2)
Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp (2)

61-62: Wise use of mutex for thread safety, I sense.

Protected well, the execution contexts are. Thread-safe design, you have achieved.


18-23: Verify TensorRT headers in CI environment, we must.

Failed to compile in CI, the PR has. Check if TensorRT headers are available in CI environment, we should.

✅ Verification successful

Expected behavior in the Force, this is. Worry, you must not.

In specialized container with TensorRT, the CI runs. Missing headers in standard paths, natural this is. When building locally, set TensorRT_ROOT to your installation path, you must. In CI environment, the dedicated container ghcr.io/acts-project/ubuntu2404_tensorrt:74 handles this automatically, it does.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for TensorRT headers in system paths
find /usr/include -name "NvInfer.h" 2>/dev/null || echo "TensorRT headers not found in /usr/include"
find /usr/local/include -name "NvInfer.h" 2>/dev/null || echo "TensorRT headers not found in /usr/local/include"

# Check if TensorRT is mentioned in package configuration
if command -v pkg-config &> /dev/null; then
    pkg-config --list-all | grep -i tensorrt || echo "No TensorRT package config found"
fi

Length of output: 303


Script:

#!/bin/bash
# Check CMake configuration for TensorRT
fd CMakeLists.txt --exec grep -l "TensorRT" {} \;
fd "*.cmake" --exec grep -l "TensorRT" {} \;

# Check documentation for TensorRT requirements
fd "README|INSTALL" --extension md --exec grep -i -A 5 -B 5 "tensorrt" {} \;

# Check environment variable usage in build files
rg "TENSORRT|TensorRT" --type cmake

Length of output: 6096


Script:

#!/bin/bash
# Check build configuration options
fd "BuildOptions.cmake|CMakeCache.txt" --exec grep -A 5 -B 5 "EXATRKX.*TENSORRT" {} \;

# Look for any build documentation
rg -i "tensorrt|cuda" docs/ || echo "No direct mentions in docs/"

# Check CI configuration for TensorRT setup
fd "\.gitlab-ci\.yml|\.github/workflows" --exec grep -A 10 -B 10 -i "tensorrt" {} \;

Length of output: 2251


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1)

37-37: Const correctness in log method, improve we must.

The log method should mark msg parameter as const char* const, it should.

-  void log(Severity severity, const char *msg) noexcept override {
+  void log(Severity severity, const char* const msg) noexcept override {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 91d9864 and a36cab6.

📒 Files selected for processing (2)
  • Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp (1 hunks)
  • Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp
🧰 Additional context used
📓 Learnings (1)
Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1)
Learnt from: benjaminhuth
PR: acts-project/acts#4016
File: Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp:44-46
Timestamp: 2025-01-21T10:16:57.820Z
Learning: In Acts project, when a derived class implements a virtual method, the documentation can be found in the base class and doesn't need to be repeated. For example, TensorRTEdgeClassifier's operator() documentation is in EdgeClassificationBase.
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: CI Bridge / build_gnn_tensorrt
  • GitHub Check: merge-sentinel
🔇 Additional comments (1)
Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1)

181-183: ⚠️ Potential issue

Fix undefined variable and improve memory management, we must.

Variable 'outputMem' undefined it is, and memory management improved it must be.

-  auto scores = torch::from_blob(
-      outputMem, edgeIndex.size(1), 1, [](void *ptr) { cudaFree(ptr); },
-      torch::TensorOptions().device(torch::kCUDA).dtype(torch::kFloat32));
+  // scores tensor already created and used for inference
+  // no need to create new tensor from blob

Likely invalid or redundant comment.

Resolved (outdated) review thread: Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1)

197-203: Structured performance monitoring system, implement we should.

Instead of scattered timing measurements, a dedicated performance monitoring system, beneficial it would be. Consider these improvements:

  • Structured timing data collection
  • Memory usage tracking at key points
  • Performance metrics aggregation

Example implementation structure:

struct PerformanceMetrics {
    double deviceTransferTime;
    double inferenceTime;
    double postProcessingTime;
    size_t peakMemoryUsage;
    // Add more metrics as needed
};

// Add to class members
std::vector<PerformanceMetrics> m_metrics;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a36cab6 and 7243544.

📒 Files selected for processing (1)
  • Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1 hunks)
🧰 Additional context used
📓 Learnings (1)
Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1)
Learnt from: benjaminhuth
PR: acts-project/acts#4016
File: Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp:44-46
Timestamp: 2025-01-21T10:16:57.820Z
Learning: In Acts project, when a derived class implements a virtual method, the documentation can be found in the base class and doesn't need to be repeated. For example, TensorRTEdgeClassifier's operator() documentation is in EdgeClassificationBase.
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: merge-sentinel
  • GitHub Check: CI Bridge / build_gnn_tensorrt
  • GitHub Check: macos
🔇 Additional comments (4)
Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (4)

30-57: Well implemented, the logger class is!

Clean and proper mapping of TensorRT severity levels to Acts logging levels, I see. RAII principles with unique_ptr, you follow.


104-108: Check cudaMemGetInfo status, you must.

Handle errors from CUDA memory information retrieval, we should.
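A minimal sketch of the suggested check, not the PR's code; where it lives and the use of an exception are assumptions:

#include <cuda_runtime.h>

#include <cstddef>
#include <sstream>
#include <stdexcept>

void reportFreeDeviceMemory() {
  std::size_t freeBytes = 0;
  std::size_t totalBytes = 0;
  cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
  if (err != cudaSuccess) {
    std::ostringstream msg;
    msg << "cudaMemGetInfo failed: " << cudaGetErrorString(err);
    throw std::runtime_error(msg.str());
  }
  // freeBytes / totalBytes could now be reported through the Acts logger.
}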


124-133: Validate input tensors before device transfer, we must.

Check tensor validity and dimensions before device transfer, essential it is.


139-145: Prevent deadlock in context acquisition, we must.

Infinite loop without timeout or condition variable, dangerous it is.
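One way to avoid the unbounded wait, sketched with placeholder types rather than the PR's implementation; ExecutionContext stands in for nvinfer1::IExecutionContext, and the pool structure and timeout policy are assumptions:

#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <utility>
#include <vector>

struct ExecutionContext {};  // stand-in for nvinfer1::IExecutionContext

class ContextPool {
 public:
  explicit ContextPool(std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
      m_contexts.push_back(std::make_unique<ExecutionContext>());
    }
  }

  // Block until a context is free or the timeout expires, instead of spinning.
  std::unique_ptr<ExecutionContext> acquire(std::chrono::milliseconds timeout) {
    std::unique_lock lock(m_mutex);
    if (!m_cv.wait_for(lock, timeout, [&] { return !m_contexts.empty(); })) {
      throw std::runtime_error("timed out waiting for a free execution context");
    }
    auto ctx = std::move(m_contexts.back());
    m_contexts.pop_back();
    return ctx;
  }

  void release(std::unique_ptr<ExecutionContext> ctx) {
    {
      std::lock_guard lock(m_mutex);
      m_contexts.push_back(std::move(ctx));
    }
    m_cv.notify_one();  // wake one waiter now that a context is available
  }

 private:
  std::mutex m_mutex;
  std::condition_variable m_cv;
  std::vector<std::unique_ptr<ExecutionContext>> m_contexts;
};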

andiwand previously approved these changes Jan 28, 2025

@andiwand andiwand left a comment

Looks fine to me. Left a few minor comments and questions.

Resolved review threads: .gitlab-ci.yml (2, one outdated), CMakeLists.txt (1)
Co-authored-by: Andreas Stefl <stefl.andreas@gmail.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (2)

73-74: Validate model file size before allocation, you must!

Zero or extremely large file size, dangerous it could be. Add validation, we should.

   std::size_t fsize =
       std::filesystem::file_size(std::filesystem::path(m_cfg.modelPath));
+  if (fsize == 0 || fsize > 1024 * 1024 * 1024) {  // Max 1GB
+    throw std::runtime_error("Invalid model file size: " + std::to_string(fsize));
+  }

197-203: Structure timing data for monitoring, we should!

Debug logs alone, sufficient they are not. Structure timing data for metrics collection, we must.

+  struct TimingInfo {
+    double anycast_ms;
+    double setup_ms;
+    double inference_ms;
+    double postprocess_ms;
+  };
+  TimingInfo timing{
+    milliseconds(t0, t1),
+    milliseconds(t1, t2),
+    milliseconds(t2, t3),
+    milliseconds(t3, t4)
+  };
   ACTS_DEBUG("Time anycast:  " << timing.anycast_ms);
   ACTS_DEBUG("Time alloc, set shape " << timing.setup_ms);
   ACTS_DEBUG("Time inference:       " << timing.inference_ms);
   ACTS_DEBUG("Time sigmoid and cut: " << timing.postprocess_ms);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0e6dc9 and c2fd773.

📒 Files selected for processing (2)
  • .gitlab-ci.yml (1 hunks)
  • Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1 hunks)
🧰 Additional context used
📓 Learnings (1)
Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1)
Learnt from: benjaminhuth
PR: acts-project/acts#4016
File: Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp:44-46
Timestamp: 2025-01-21T10:16:57.820Z
Learning: In Acts project, when a derived class implements a virtual method, the documentation can be found in the base class and doesn't need to be repeated. For example, TensorRTEdgeClassifier's operator() documentation is in EdgeClassificationBase.
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: merge-sentinel
  • GitHub Check: CI Bridge / build_gnn_tensorrt
  • GitHub Check: missing_includes
  • GitHub Check: build_debug
🔇 Additional comments (4)
Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (3)

124-133: Input tensor validation, previously discussed it was!

Input validation suggestion from previous review, still valid it remains.


139-145: Deadlock prevention, discussed before it was!

Timeout implementation suggestion from previous review, still relevant it remains.


167-173: Proper cleanup after inference failure, discussed it was!

Cleanup implementation suggestion from previous review, still applicable it remains.

.gitlab-ci.yml (1)

189-219: Complete CI configuration, previously discussed it was!

Missing elements in CI configuration, flagged before they were. Previous suggestions for build command, testing stage, artifacts, and CUDA architectures, still valid they remain.

@kodiakhq kodiakhq bot merged commit cc7d6a9 into acts-project:main Jan 30, 2025
43 checks passed
@acts-project-service
Collaborator

🔴 Athena integration test results [cc7d6a9]

Build job with this PR failed!

Please investigate the build job for the pipeline!

@acts-project-service acts-project-service added the Breaks Athena build This PR breaks the Athena build label Jan 30, 2025
paulgessinger pushed a commit to paulgessinger/acts that referenced this pull request Jan 30, 2025
Cannot be compiled currently in the CI
@paulgessinger paulgessinger modified the milestones: next, v39.0.0 Feb 5, 2025