fix(pt): optimize graph memory usage #4006

iProzd · 2024-07-23T07:44:44Z

- Remove the atomic virial graph.
- Remove the force graph during inference.

After this change, LAMMPS memory usage drops by 50% for dpa1 (attn_layer=0) and by 80% for dpa2 (layer=12).
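To illustrate why dropping the derivative graph at inference time can save memory, here is a minimal PyTorch sketch. It is not the DeePMD-kit code: `toy_energy` and `force` are hypothetical names, and the toy energy only stands in for a real model. It shows that `torch.autograd.grad` with `create_graph=False` returns a force detached from the autograd graph, while `create_graph=True` keeps the graph so that a loss on the forces could be backpropagated.

```python
import torch


def toy_energy(coord: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a model: a scalar energy from coordinates."""
    return (coord**2).sum()


def force(coord: torch.Tensor, create_graph: bool) -> torch.Tensor:
    """Return -dE/dcoord, optionally keeping the graph of the derivative."""
    energy = toy_energy(coord)
    (grad,) = torch.autograd.grad(energy, coord, create_graph=create_graph)
    return -grad


coord = torch.randn(4, 3, requires_grad=True)

# Training-like call: the force stays differentiable, so a force loss can be
# backpropagated, at the cost of keeping the graph in memory.
f_train = force(coord, create_graph=True)

# Inference-like call: the force is a plain tensor and the graph is freed.
f_infer = force(coord, create_graph=False)

print(f_train.requires_grad, f_infer.requires_grad)
```

Both calls return the same values; only the bookkeeping differs, which is why tying `create_graph` to the training state does not change inference results.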

Summary by CodeRabbit

New Features
- Introduced a new inference parameter to key model functions, enhancing flexibility for inference scenarios during model execution.
- Added functionality to output a mapping array to a CSV file, improving data handling capabilities.
Bug Fixes
- Improved the behavior of the model during inference versus training, potentially impacting downstream processing based on the output.

coderabbitai · 2024-07-23T07:44:56Z

Walkthrough

The changes add a new `inference` parameter to several `forward_lower` and related functions across the model files. The parameter lets these functions handle inference explicitly; in most of them it defaults to `True`. The existing logic flow is otherwise unchanged.

Changes

| Files | Change Summary |
| --- | --- |
| `deepmd/pt/model/model/dipole_model.py`, `deepmd/pt/model/model/dos_model.py`, `deepmd/pt/model/model/dp_zbl_model.py`, `deepmd/pt/model/model/ener_model.py`, `deepmd/pt/model/model/polar_model.py` | Added `inference=True` parameter to `forward_lower` functions, enhancing inference capabilities. |
| `deepmd/pt/model/model/make_model.py` | Introduced `inference=False` and `create_graph=self.training` parameters to `forward_common_lower`, distinguishing inference modes. |
| `deepmd/pt/model/model/spin_model.py` | Added `inference` parameters to both `forward_lower` (default `True`) and `forward_common_lower` (default `False`). |
| `deepmd/pt/model/model/transform_output.py` | Added `inference=False` parameter to multiple functions for flexibility in gradient computations. |
| `deepmd/pt/entrypoints/main.py` | Modified the `freeze` function to set the model to evaluation mode before scripting with Torch. |
| `source/lmp/pair_deepmd.cpp` | Enhanced `PairDeepMD` class with new mapping functionality for atom tags and CSV output. |
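The defaults listed in the table can be sketched in plain Python. The two functions below are hypothetical stubs that only mirror the parameter names and defaults from the table. They are not the DeePMD-kit implementations, and forwarding the flag from the wrapper in this way is an assumption. The sketch shows that a caller which omits `inference` still works and simply picks up the default.

```python
import inspect


def forward_common_lower(coord, do_atomic_virial=False, inference=False):
    """Stub: report which options a real implementation would act on."""
    return {"do_atomic_virial": do_atomic_virial, "inference": inference}


def forward_lower(coord, do_atomic_virial=False, inference=True):
    """Stub wrapper: forwards its own default (True) to the common routine."""
    return forward_common_lower(
        coord, do_atomic_virial=do_atomic_virial, inference=inference
    )


# A call written before the parameter existed remains valid.
old_style = forward_lower([0.0, 0.0, 0.0])
explicit = forward_lower([0.0, 0.0, 0.0], inference=False)

# Read the declared defaults back from the signatures.
defaults = {
    fn.__name__: inspect.signature(fn).parameters["inference"].default
    for fn in (forward_lower, forward_common_lower)
}
print(old_style, explicit, defaults)
```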

Possibly related issues

[BUG] lmp raise "assert mapping is not None" with dpa2 model (#3428): The changes may help address issues related to inference handling that could lead to assertions failing when the model is loaded and run.
[BUG] Error with Pytorch models (#3738): The introduction of the inference parameter and related modifications might improve model compatibility and prevent assertion errors during computations.

coderabbitai

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (8)

deepmd/pt/model/model/dp_zbl_model.py (1)
114-115: Ensure all calls to forward_lower include the inference parameter.

The forward_lower function in deepmd/pt/model/model/dp_zbl_model.py has been updated to include the inference parameter. However, many calls to this function in the codebase do not include this parameter, which could lead to runtime errors or unexpected behavior.

Files and lines to update:

source/tests/pt/model/test_dp_model.py

Lines: 0, 1, 2

source/tests/pt/model/test_forward_lower.py

Lines: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

source/tests/pt/model/test_ener_spin_model.py

Lines: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

source/tests/universal/common/cases/model/utils.py

Lines: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

source/tests/universal/pt/backend.py

Lines: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

Please update these calls to include the inference parameter to ensure consistency and correct functionality.

Analysis chain

Verify the function usage in the codebase.

The code changes are approved.

However, ensure that all function calls to forward_lower match the new signature.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify all function calls to `forward_lower` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'forward_lower'
Length of output: 68

Script:
#!/bin/bash
# Description: Verify all function calls to `forward_lower` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 $'forward_lower'
Length of output: 8436

Script:
#!/bin/bash
# Description: Verify the function definition of `forward_lower` in `deepmd/pt/model/model/dp_zbl_model.py` includes the `inference` parameter
# and check if all calls to this function in the codebase include this parameter.

# Test: Search for the function definition of `forward_lower` in the specified file.
ast-grep --lang python --pattern $'def forward_lower($$$):\n    $$$' deepmd/pt/model/model/dp_zbl_model.py

# Test: Search for all calls to `forward_lower` in the codebase and check if they include the `inference` parameter.
rg --type py 'forward_lower' -A 5
Length of output: 11331

deepmd/pt/model/model/transform_output.py (3)
125-126: Update function call to match new signature

The function call to take_deriv in deepmd/pt/model/model/transform_output.py does not include the new inference parameter. Please update the function call to match the new signature.
deepmd/pt/model/model/transform_output.py:
dr, dc = take_deriv(
    vv,
    model_ret[kk_redu],
    vdef,
    coord_ext,
    do_virial=vdef.c_differentiable,
    inference=False  # Add this line
)
Analysis chain

Verify the function usage in the codebase.

The code changes are approved.

However, ensure that all function calls to take_deriv match the new signature.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify all function calls to `take_deriv` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'take_deriv'
Length of output: 65

Script:
#!/bin/bash
# Description: Verify all function calls to `take_deriv` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 $'take_deriv'
Length of output: 896

167-168: Update function calls to match new signature

The function fit_output_to_model_output has a new parameter inference set to False by default. However, the following function calls do not include this new parameter and need to be updated:

deepmd/pt/model/model/make_model.py

deepmd/dpmodel/model/make_model.py

Please update these function calls to include the inference parameter to match the new signature.

Analysis chain

Verify the function usage in the codebase.

The code changes are approved.

However, ensure that all function calls to fit_output_to_model_output match the new signature.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify all function calls to `fit_output_to_model_output` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'fit_output_to_model_output'
Length of output: 81

Script:
#!/bin/bash
# Description: Verify all function calls to `fit_output_to_model_output` match the new signature.

# Search for the function definition to understand its context.
ast-grep --lang python --pattern $'def fit_output_to_model_output($_, $_, $_, $_, $_, $_, $_, $_, $_, inference=False):\n    $$$'

# Search for the function usage in the codebase.
rg 'fit_output_to_model_output'
Length of output: 626

Script:
#!/bin/bash
# Description: Verify all function calls to `fit_output_to_model_output` match the new signature.

# Check the function calls in the identified files.
rg 'fit_output_to_model_output' deepmd/pt/model/model/make_model.py -A 3
rg 'fit_output_to_model_output' deepmd/dpmodel/model/make_model.py -A 3
Length of output: 590

75-76: Update all calls to task_deriv_one to match the new signature.

The task_deriv_one function has been updated to include a new inference parameter, but not all calls to this function reflect this change. Ensure that all calls to task_deriv_one include the inference parameter to maintain consistency and avoid potential issues.

deepmd/pt/model/model/transform_output.py: Update the call to task_deriv_one to include the inference parameter.

Analysis chain

Verify the function usage in the codebase.

The code changes are approved.

However, ensure that all function calls to task_deriv_one match the new signature.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify all function calls to `task_deriv_one` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'task_deriv_one'
Length of output: 69

Script:
#!/bin/bash
# Description: Verify all function calls to `task_deriv_one` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg 'task_deriv_one' --glob '*.py' -A 5
Length of output: 895

Script:
#!/bin/bash
# Description: Verify all function calls and definitions of `task_deriv_one` match the new signature.

# Test: Search for the function definition and usage. Expect: Only occurrences of the new signature.
rg 'def task_deriv_one' --glob '*.py' -A 5
rg 'task_deriv_one(' --glob '*.py' -A 5
Length of output: 594

Script:
#!/bin/bash
# Description: Verify all function calls to `task_deriv_one` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg 'task_deriv_one\(' --glob '*.py' -A 5
Length of output: 899

deepmd/pt/model/model/make_model.py (2)
243-244: Update the docstring for clarity.

The docstring should be updated to match the new parameter name and its purpose.
- inference
-     Whether only perform inference rather than undergoing training.
+ inference
+     Whether to perform inference only, rather than undergoing training.

218-218: Ensure all calls to forward_common_lower include the new inference parameter.

The following function calls to forward_common_lower need to be updated to match the new signature:

source/tests/pt/model/test_ener_spin_model.py

source/tests/pt/model/test_dp_model.py

deepmd/pt/model/model/spin_model.py

deepmd/pt/model/model/polar_model.py

deepmd/pt/model/model/make_model.py

deepmd/pt/model/model/ener_model.py

deepmd/pt/model/model/dp_zbl_model.py

deepmd/pt/model/model/dos_model.py

deepmd/pt/model/model/dipole_model.py

Analysis chain

LGTM! But verify the function usage in the codebase.

The addition of the inference parameter is approved.

However, ensure that all function calls to forward_common_lower match the new signature.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify all function calls to `forward_common_lower` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'forward_common_lower'
Length of output: 75

Script:
#!/bin/bash
# Description: Verify all function calls to `forward_common_lower` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg -A 5 'forward_common_lower'
Length of output: 8451

deepmd/pt/model/model/spin_model.py (2)
608-608: Verify and update forward_lower function calls to match the new signature.

The addition of the inference parameter to forward_lower is approved. However, some function calls do not include this parameter. Please update the following instances to ensure they match the new signature:

source/tests/universal/common/cases/model/utils.py

Line: ret_lower.append(module.forward_lower(**input_dict_lower))

source/tests/pt/model/test_dp_model.py

Line: ret2 = md0.forward_lower(*args, do_atomic_virial=True)

Line: ret0 = md0.forward_lower(*args)

Line: ret1 = md1.forward_lower(*args)

Line: ret0 = md0.forward_lower(*args, do_atomic_virial=True)

Line: ret1 = md1.forward_lower(*args, do_atomic_virial=True)

source/api_cc/src/DeepPotPT.cc

Line: .run_method("forward_lower", coord_wrapped_Tensor, atype_Tensor, firstneigh_tensor, optional_tensor, fparam_tensor, aparam_tensor, do_atom_virial_tensor, comm_dict)

Line: .run_method("forward_lower", coord_wrapped_Tensor, atype_Tensor, firstneigh_tensor, optional_tensor, fparam_tensor, aparam_tensor, do_atom_virial_tensor)

Please ensure that all calls to forward_lower include the inference parameter set to True.

Analysis chain

LGTM! But verify the function usage in the codebase.

The addition of the inference parameter is approved.

However, ensure that all function calls to forward_lower match the new signature.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify all function calls to `forward_lower` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'forward_lower'
Length of output: 68

Script:
#!/bin/bash
# Description: Verify all function calls to `forward_lower` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg -A 5 'forward_lower'
Length of output: 9418

462-462: Ensure all calls to forward_common_lower include the inference parameter.

The inference parameter is not found in the function calls to forward_common_lower in the following files:

source/tests/pt/model/test_ener_spin_model.py

source/tests/pt/model/test_dp_model.py

deepmd/pt/model/model/spin_model.py

deepmd/pt/model/model/polar_model.py

deepmd/pt/model/model/make_model.py

deepmd/pt/model/model/ener_model.py

deepmd/pt/model/model/dp_zbl_model.py

deepmd/pt/model/model/dos_model.py

deepmd/pt/model/model/dipole_model.py

Please update these calls to match the new function signature.

Analysis chain

LGTM! But verify the function usage in the codebase.

The addition of the inference parameter is approved.

However, ensure that all function calls to forward_common_lower match the new signature.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify all function calls to `forward_common_lower` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'forward_common_lower'
Length of output: 75

Script:
#!/bin/bash
# Description: Verify all function calls to `forward_common_lower` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 $'forward_common_lower'
Length of output: 8461

Script:
#!/bin/bash
# Description: Check if the `inference` parameter is included in the function calls to `forward_common_lower`.

# Test: Search for the function calls and check for the `inference` parameter.
rg --type py 'forward_common_lower' -A 5 | grep -B 5 'inference'
Length of output: 62

codecov · 2024-07-23T08:02:12Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.85%. Comparing base (a6ea2c1) to head (0f44a60).
Report is 106 commits behind head on devel.

Additional details and impacted files

@@           Coverage Diff           @@
##            devel    #4006   +/-   ##
=======================================
  Coverage   82.84%   82.85%           
=======================================
  Files         522      522           
  Lines       50920    50922    +2     
  Branches     3015     3015           
=======================================
+ Hits        42186    42189    +3     
  Misses       7796     7796           
+ Partials      938      937    -1
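The figures in the diff can be cross-checked with a few lines of arithmetic. This is a minimal sketch using only the numbers shown above; the rule that the percentage is truncated rather than rounded to two decimals is an assumption chosen because it reproduces the reported 82.84% and 82.85%, and `coverage_percent` is a hypothetical helper name.

```python
import math

# Figures copied from the Codecov diff above.
base = {"hits": 42186, "misses": 7796, "partials": 938, "lines": 50920}
head = {"hits": 42189, "misses": 7796, "partials": 937, "lines": 50922}


def coverage_percent(report: dict) -> float:
    """Hits over lines as a percentage, truncated (assumption) to two decimals."""
    return math.floor(report["hits"] / report["lines"] * 10_000) / 100


for name, report in (("base", base), ("head", head)):
    # Every counted line is a hit, a miss, or a partial.
    total = report["hits"] + report["misses"] + report["partials"]
    print(name, total == report["lines"], coverage_percent(report))
```

The head gains two lines and three hits while losing one partial, which is consistent with the `+2`, `+3`, and `-1` columns.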

fix(pt): optimize graph memory usage

e4c6f8f

iProzd requested review from njzjz and wanghan-iapcm July 23, 2024 07:44

github-actions bot added the Python label Jul 23, 2024

iProzd requested a review from CaRoLZhangxy July 23, 2024 07:48

Merge branch 'devel' into fix_create_graph

b6e4878

coderabbitai bot reviewed Jul 23, 2024

njzjz requested changes Jul 23, 2024

deepmd/pt/model/model/make_model.py (outdated, resolved)

deepmd/pt/model/model/transform_output.py (resolved)

deepmd/pt/model/model/transform_output.py (resolved)

deepmd/pt/model/model/transform_output.py (outdated, resolved)

use self.training

9942583

github-actions bot added the LAMMPS label Jul 23, 2024

Update pair_deepmd.cpp

0f44a60

iProzd requested a review from njzjz July 23, 2024 09:32

wanghan-iapcm approved these changes Jul 23, 2024

njzjz approved these changes Jul 23, 2024

njzjz linked an issue Jul 23, 2024 that may be closed by this pull request

[BUG] CUDA out of memory, when only 1600 atoms, using the pytorch model with spin #3969

Closed

iProzd added this pull request to the merge queue Jul 24, 2024

Merged via the queue into deepmodeling:devel with commit 7f9300d Jul 24, 2024
60 checks passed

iProzd deleted the fix_create_graph branch July 24, 2024 08:51

iProzd added a commit to iProzd/deepmd-kit that referenced this pull request Jul 25, 2024

fix(2024Q1): optimize graph memory (copy deepmodeling#4006)

c07a56f

iProzd added a commit that referenced this pull request Jul 25, 2024

fix(2024Q1): optimize graph memory (copy #4006) (#4020)

c09a1f7

njzjz mentioned this pull request Jul 26, 2024

[BUG] CUDA out of memory, when only 1600 atoms, using the pytorch model with spin #3969

Closed
