pd: support dpa3 dynamic shape for pd backend #4828
Conversation
1a5fe62 to dd70e99
## Walkthrough
This change standardizes tensor dimension handling and indexing conventions across several modules, primarily affecting how edge and angle indices are structured and used. It introduces explicit reshaping, modifies the shape and construction of index tensors, and clarifies broadcasting logic, without altering core algorithms or public interfaces.
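To make the new convention concrete, here is a minimal, self-contained Paddle sketch of the index layout the walkthrough describes. The index values are made up; only the `[2, n_edge]` / `[3, n_angle]`, stacked-along-axis-0 layout reflects this change.

```python
import paddle

# hypothetical edges: (center atom, neighbor atom) pairs
src = paddle.to_tensor([0, 0, 1, 2, 3])
dst = paddle.to_tensor([1, 2, 0, 3, 2])

# old layout: concatenate into [n_edge, 2]; new layout: stack into [2, n_edge]
old_edge_index = paddle.concat([src.unsqueeze(-1), dst.unsqueeze(-1)], axis=-1)
new_edge_index = paddle.stack([src, dst], axis=0)

# column-wise access under the old layout matches row-wise access under the new one
assert bool((old_edge_index[:, 0] == new_edge_index[0]).all())

# angle indices follow the same idea with three rows: [3, n_angle]
angle_index = paddle.stack([src, dst, dst], axis=0)
assert list(angle_index.shape) == [3, src.shape[0]]
```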
## Changes
| File(s) | Change Summary |
|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
| deepmd/pd/model/descriptor/repflow_layer.py | Standardized tensor dimension handling and reshaping; changed edge/angle index slicing from column to row-based; clarified broadcasting; minor formatting improvements. |
| deepmd/pd/model/descriptor/repflows.py | Changed initialization of edge/angle index tensors to new shapes; updated owner argument to match new indexing; conditional node embedding extraction based on local mapping. |
| deepmd/pd/model/network/utils.py | Modified `aggregate` to compute `bin_count` only when needed; changed output tensor initialization; added assertion; updated `get_graph_index` to stack indices along axis 0 and added `use_loc_mapping` parameter. |
| source/tests/pd/model/test_dynamic_sel.py | Added test class validating descriptor consistency between dynamic selection enabled and disabled modes with multiple parameter combinations and precision settings. |
## Sequence Diagram(s)
```mermaid
sequenceDiagram
participant User
participant RepFlowLayer
participant Utils
User->>RepFlowLayer: forward(nlist, nlist_mask, ...)
RepFlowLayer->>Utils: get_graph_index(nlist, nlist_mask, ..., use_loc_mapping)
Utils-->>RepFlowLayer: edge_index [2, n_edge], angle_index [3, n_angle]
RepFlowLayer->>RepFlowLayer: _cal_hg_dynamic(..., owner=edge_index[0], ...)
RepFlowLayer->>Utils: aggregate(data, owners, average, num_owner)
Utils-->>RepFlowLayer: aggregated tensor
RepFlowLayer-->>User: output
```

Possibly related PRs
Suggested reviewers
Actionable comments posted: 0
🧹 Nitpick comments (1)
deepmd/pd/model/descriptor/repflow_layer.py (1)
752-752: Remove unused variable assignment.
The variable `nall` is assigned but never used in this scope. Consider removing this assignment (`nall = node_ebd_ext.shape[1]`) to clean up the code.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- deepmd/pd/model/descriptor/repflow_layer.py (9 hunks)
- deepmd/pd/model/descriptor/repflows.py (2 hunks)
- deepmd/pd/model/network/utils.py (5 hunks)
🧰 Additional context used
🧠 Learnings (2)
deepmd/pd/model/descriptor/repflows.py (2)
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4160
File: deepmd/dpmodel/utils/env_mat.py:52-64
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Negative indices in `nlist` are properly handled by masking later in the computation, so they do not cause issues in indexing operations.
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4160
File: deepmd/dpmodel/utils/env_mat.py:52-64
Timestamp: 2024-09-24T01:59:37.973Z
Learning: Negative indices in `nlist` are properly handled by masking later in the computation, so they do not cause issues in indexing operations.
deepmd/pd/model/network/utils.py (2)
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4160
File: deepmd/dpmodel/utils/env_mat.py:52-64
Timestamp: 2024-09-24T01:59:37.973Z
Learning: Negative indices in `nlist` are properly handled by masking later in the computation, so they do not cause issues in indexing operations.
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4160
File: deepmd/dpmodel/utils/env_mat.py:52-64
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Negative indices in `nlist` are properly handled by masking later in the computation, so they do not cause issues in indexing operations.
🧬 Code Graph Analysis (1)
deepmd/pd/model/network/utils.py (1)
source/tests/consistent/descriptor/test_dpa3.py (1)
`data` (78-134)
🪛 Ruff (0.11.9)
deepmd/pd/model/descriptor/repflow_layer.py
752-752: Local variable nall is assigned to but never used
Remove assignment to unused variable nall
(F841)
🪛 Flake8 (7.2.0)
deepmd/pd/model/descriptor/repflow_layer.py
[error] 752-752: local variable 'nall' is assigned to but never used
(F841)
⏰ Context from checks skipped due to timeout of 90000ms (21)
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Test C++ (false)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Test C++ (true)
- GitHub Check: Analyze (c-cpp)
🔇 Additional comments (13)
deepmd/pd/model/descriptor/repflows.py (2)
518-519: LGTM: Tensor shape initialization updated to match new indexing conventions.
The initialization of `edge_index` and `angle_index` has been correctly updated from shape `[1, 3]` to `[2, 1]` and `[3, 1]` respectively, which aligns with the new tensor dimension conventions where these tensors are structured as `[2, n_edge]` and `[3, n_angle]`.
570-570: LGTM: Indexing updated to match new tensor shape convention.
The change from `edge_index[:, 0]` (column-wise indexing) to `edge_index[0]` (row-wise indexing) is consistent with the new tensor shape where `edge_index` has dimensions `[2, n_edge]` instead of `[n_edge, 2]`.

deepmd/pd/model/network/utils.py (5)
32-43: LGTM: Optimized bin_count computation for better performance.
This optimization computes `bin_count` only when needed (`num_owner` is `None` or averaging is requested), which can improve performance in cases where bincount computation is expensive and unnecessary.
46-50: LGTM: Improved tensor initialization and assertion for safety.
The output tensor initialization now consistently uses `num_owner` for the first dimension, and the assertion ensures `bin_count` is not `None` before division when averaging is requested, preventing potential runtime errors.
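The following is a minimal sketch, assuming only standard Paddle ops (`paddle.bincount`, `paddle.index_add`), of the behaviour these two comments describe; it is illustrative and not the actual `aggregate` in deepmd/pd/model/network/utils.py.

```python
import paddle

def aggregate(data, owners, average=True, num_owner=None):
    """Sum (or average) rows of `data` into buckets given by the 1-D `owners` index."""
    bin_count = None
    # compute the per-owner counts only when they are actually needed
    if num_owner is None or average:
        bin_count = paddle.bincount(owners, minlength=num_owner or 0)
        bin_count = paddle.where(bin_count == 0, paddle.ones_like(bin_count), bin_count)
        if num_owner is None:
            num_owner = bin_count.shape[0]
    # the output is always allocated with num_owner rows
    output = paddle.zeros([num_owner, data.shape[1]], dtype=data.dtype)
    output = paddle.index_add(output, owners, 0, data)
    if average:
        assert bin_count is not None  # guard before dividing
        output = output / bin_count.unsqueeze(-1).astype(data.dtype)
    return output
```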
59-59: LGTM: Added `use_loc_mapping` parameter for flexible index calculation.
The new parameter provides control over how `frame_shift` is computed, allowing the function to work with different indexing schemes based on whether local mapping is used.
109-111: LGTM: Frame shift calculation adapted for different mapping modes.
The conditional logic correctly adjusts the frame shift calculation based on `use_loc_mapping`, using either `nall` or `nloc` as the multiplier, which maintains proper indexing behavior across different execution modes.
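As a toy illustration of that conditional (sizes and names here are hypothetical, not taken from the reviewed code):

```python
import paddle

nframes, nloc, nall = 2, 4, 6  # hypothetical frame count, local atoms, extended atoms
use_loc_mapping = True

# offset added to the atom indices of frame i when flattening all frames into one axis
mult = nloc if use_loc_mapping else nall
frame_shift = paddle.arange(nframes, dtype="int64") * mult
print(frame_shift.tolist())  # [0, 4] with local mapping, [0, 6] without
```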
140-143: LGTM: Tensor stacking changes align with new indexing conventions.
The change from concatenation to stacking transforms the output tensor shapes from `[n_edge, 2]` and `[n_angle, 3]` to `[2, n_edge]` and `[3, n_angle]` respectively, which standardizes the indexing convention across the codebase.

deepmd/pd/model/descriptor/repflow_layer.py (6)
375-377: LGTM: Improved code clarity with explicit tensor operations.
The change from ellipsis-based indexing to explicit `.unsqueeze()` calls makes the tensor operations more explicit and easier to understand, improving code readability.
588-592: LGTM: Standardized tensor reshaping with list arguments.
Using list arguments for `reshape()` instead of tuple unpacking is more explicit and consistent with modern tensor operation practices.
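A trivial example of the pattern (variable names are only illustrative):

```python
import paddle

nf, nloc, dim = 2, 3, 8
node_update = paddle.randn([nf, nloc, dim])

# the target shape is passed as one explicit list rather than unpacked positional sizes
flat = node_update.reshape([nf * nloc, node_update.shape[-1]])
assert list(flat.shape) == [nf * nloc, dim]
```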
703-704: LGTM: Documentation updated to reflect new tensor shapes.
The function signature documentation correctly reflects the new tensor shapes `2 x n_edge` and `3 x n_angle`, maintaining consistency with the implementation changes.
764-769: LGTM: Updated tensor indexing to match new shape conventions.
The change from column-wise indexing (e.g., `edge_index[:, 0]`) to row-wise indexing (e.g., `edge_index[0]`) correctly adapts to the new tensor shapes where `edge_index` has dimensions `[2, n_edge]` and `angle_index` has dimensions `[3, n_angle]`.
900-900: LGTM: Consistent use of explicit reshaping.
The change to use list arguments in `reshape()` maintains consistency with the other tensor operations in the codebase.
1046-1049: LGTM: Explicit tensor operations improve readability.
The use of explicit `.unsqueeze()` calls instead of implicit broadcasting makes the tensor operations more transparent and easier to debug.
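For instance (a made-up tensor pair, not the reviewed code): multiplying per-edge embeddings by a per-edge switch value with an explicitly added axis.

```python
import paddle

n_edge, dim = 5, 4
edge_ebd = paddle.randn([n_edge, dim])   # hypothetical per-edge embeddings
sw = paddle.rand([n_edge])               # hypothetical switch-function values

# make the broadcast axis explicit instead of relying on ellipsis indexing
gated = edge_ebd * sw.unsqueeze(-1)      # shape [n_edge, dim]
assert list(gated.shape) == [n_edge, dim]
```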
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## devel #4828 +/- ##
==========================================
+ Coverage 84.57% 84.71% +0.14%
==========================================
Files 699 699
Lines 68070 68074 +4
Branches 3541 3541
==========================================
+ Hits 57567 57666 +99
+ Misses 9369 9273 -96
- Partials 1134 1135 +1

☔ View full report in Codecov by Sentry.
Actionable comments posted: 0
🧹 Nitpick comments (1)
deepmd/pd/model/descriptor/repflow_layer.py (1)
768-768: Remove unused variable assignment.
The variable `nall` is assigned but never used in the function. This creates unnecessary overhead and can be confusing.
Apply this diff to remove the unused variable: `- nall = node_ebd_ext.shape[1]`
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- deepmd/pd/model/descriptor/repflow_layer.py (9 hunks)
- deepmd/pd/model/descriptor/repflows.py (3 hunks)
- deepmd/pd/model/network/utils.py (5 hunks)
- source/tests/pd/model/test_dynamic_sel.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- deepmd/pd/model/descriptor/repflows.py
- deepmd/pd/model/network/utils.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-09-19T04:25:12.408Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
source/tests/pd/model/test_dynamic_sel.py (2)
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
Learnt from: njzjz
PR: deepmodeling/deepmd-kit#4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-09-19T04:25:12.408Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
🪛 Ruff (0.11.9)
deepmd/pd/model/descriptor/repflow_layer.py
768-768: Local variable nall is assigned to but never used
Remove assignment to unused variable nall
(F841)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Test C++ (true)
- GitHub Check: Test C++ (false)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
🔇 Additional comments (12)
source/tests/pd/model/test_dynamic_sel.py (1)
1-156: Well-structured test for dynamic selection consistency validation.
The test comprehensively validates that enabling dynamic selection (`use_dynamic_sel=True`) produces numerically equivalent results to the standard implementation. The parameter combinations cover key configuration options, and the tolerance settings are appropriately configured for different precision types.
The test methodology is sound (see the sketch after this list):
- Creates two identical descriptor instances with different dynamic selection settings
- Uses the same random seed for reproducibility
- Compares outputs using appropriate numerical tolerances
- Tests multiple precision types and parameter combinations
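A condensed sketch of that methodology; the factory `make_descriptor` and its arguments are assumptions for illustration, while the actual test lives in source/tests/pd/model/test_dynamic_sel.py.

```python
import numpy as np

def check_dynamic_sel_consistency(make_descriptor, coords, atype, box, rtol, atol):
    # two otherwise-identical descriptors, differing only in dynamic selection
    d_ref = make_descriptor(use_dynamic_sel=False, seed=42)
    d_dyn = make_descriptor(use_dynamic_sel=True, seed=42)
    out_ref = d_ref(coords, atype, box)
    out_dyn = d_dyn(coords, atype, box)
    # compare with tolerances chosen per precision (e.g. float32 vs float64)
    np.testing.assert_allclose(out_ref, out_dyn, rtol=rtol, atol=atol)
```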
deepmd/pd/model/descriptor/repflow_layer.py (11)
391-391: Improved tensor broadcasting clarity.
The change from ellipsis-based indexing to explicit `.unsqueeze()` calls improves code readability and makes the broadcasting operation more explicit and clear.
605-608: Improved tensor reshaping with explicit list argument.
The change to use a list argument `[nf * nloc, sub_node_update.shape[-1]]` instead of implicit tuple unpacking improves code clarity and follows best practices for tensor reshaping operations.
687-690: Consistent tensor reshaping improvement.
Good consistency with the previous change, using explicit list arguments for reshape operations.
696-699: Consistent tensor reshaping pattern.
Maintains the same improved pattern of using explicit list arguments for tensor reshaping operations.
719-720: Updated parameter documentation to reflect new tensor shapes.
The documentation correctly reflects the change from column-wise indexing `(n_edge x 2)` to row-wise indexing `(2 x n_edge)` for `edge_index` and `(3 x n_angle)` for `angle_index`.
745-756: Comprehensive documentation update for new indexing convention.
The parameter documentation has been thoroughly updated to reflect the new tensor shapes and indexing patterns. This maintains consistency with the implementation changes.
767-767: Extract shape information for dynamic selection logic.
The unpacking of `nb, nloc, nnei` from `nlist.shape` provides the necessary dimensions for the dynamic selection logic that follows.
775-777: Proper n_edge computation for dynamic selection.
The logic correctly sets `n_edge = None` when dynamic selection is disabled and `n_edge = h2.shape[0]` when enabled. This aligns with the different data structures used in each mode.
780-785: Updated indexing to match new tensor shapes.
The changes from `edge_index[:, 0]` to `edge_index[0]` and similar updates for `angle_index` correctly reflect the new tensor shapes where the first dimension represents the index type rather than the second dimension.
916-916: Consistent reshape operation improvement.
The change to use a list argument in the reshape operation maintains consistency with the earlier improvements in the file.
1062-1065: Explicit tensor operations for switch function application.
The replacement of implicit broadcasting with explicit `.unsqueeze()` calls makes the tensor operations more readable and explicit, improving code maintainability.
support running `input_torch_dynamic.json` with paddle backend (including CINN)

TODO list:
- `masked_select_double_grad` in eager mode PaddlePaddle/Paddle#73601
- `masked_select_grad` for dynamic shape PaddlePaddle/Paddle#73622
- `index_add_grad` for static decomposition PaddlePaddle/Paddle#73737
- `index_put_grad` for static decomposition PaddlePaddle/Paddle#73747
Summary by CodeRabbit
Bug Fixes
Refactor
New Features
Tests