-
Notifications
You must be signed in to change notification settings - Fork 525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: add documentation for installation requirements of DPA-2 #4178
Conversation
Fix deepmodeling#4161. Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
📝 WalkthroughWalkthroughThe changes involve modifications to the Changes
Assessment against linked issues
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (5)
doc/model/dpa2.md (2)
11-19
: Excellent addition of installation requirements!The new section on installation requirements is a valuable addition to the documentation. It directly addresses the issues mentioned in the PR objectives, particularly the NotImplementedError related to the customized PyTorch OP library. The information is clear, concise, and provides actionable instructions for users.
Consider adding information about the following environment variables mentioned in the related issue:
OMP_NUM_THREADS
DP_INTRA_OP_PARALLELISM_THREADS
DP_INTER_OP_PARALLELISM_THREADS
This additional information could help users avoid potential issues related to these variables.
11-19
: Overall, excellent documentation update!This addition to the documentation effectively addresses the installation requirements for the DPA-2 model, which was the primary objective of this PR. The new section provides clear instructions and recommendations that should help users avoid the errors encountered in issue #4161.
To further enhance the documentation:
- Consider adding a troubleshooting section that addresses common errors, including those mentioned in the issue (e.g., missing
librdmacm.so.1
, TensorFlow and PyTorch related errors).- It might be helpful to provide a brief explanation of why the customized OP library is necessary, to give users a better understanding of its importance.
- If possible, include version compatibility information (e.g., which versions of DeepMD-kit, PyTorch, and LAMMPS are known to work together).
These additions would make the documentation even more robust and user-friendly.
deepmd/pt/model/descriptor/repformers.py (3)
Line range hint
186-186
: Document theold_impl
parameter in the constructor's docstring.The parameter
old_impl
is missing from the docstring of the__init__
method in theDescrptBlockRepformers
class. Please add documentation for this parameter to explain its purpose and usage.Apply this diff to add the missing documentation:
ln_eps : float, optional The epsilon value for layer normalization. seed : int, optional Random seed for parameter initialization. + old_impl : bool, optional + Whether to use the old implementation of `RepformerLayer`. use_sqrt_nnei : bool, optional Whether to use the square root of the number of neighbors for symmetrization_op normalization instead of using the number of neighbors directly.
Line range hint
290-298
: Add documentation for thecomm_dict
parameter inforward
method.The
comm_dict
parameter has been introduced to handle communication in distributed settings. To improve clarity, please document this parameter in the method's docstring, including its expected structure and usage.Consider adding the following to the
forward
method's docstring:): + r""" + Forward pass of the descriptor block. + + Parameters + ---------- + nlist : torch.Tensor + Neighbor list tensor. + extended_coord : torch.Tensor + Extended coordinates tensor. + extended_atype : torch.Tensor + Extended atomic types tensor. + extended_atype_embd : Optional[torch.Tensor], optional + Extended atomic type embeddings, by default None. + mapping : Optional[torch.Tensor], optional + Mapping tensor for gathering data, by default None. + comm_dict : Optional[Dict[str, torch.Tensor]], optional + Communication dictionary containing tensors for distributed computation, by default None. + + """
Line range hint
366-376
: Handle potential exceptions when accessingcomm_dict
keys.The code assumes that certain keys exist in
comm_dict
. While assertions are used, consider adding exception handling or more descriptive error messages to aid debugging if a key is missing.For example, you might replace assertions with explicit error messages:
- assert "send_list" in comm_dict + if "send_list" not in comm_dict: + raise KeyError("comm_dict is missing the 'send_list' key.")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- deepmd/pt/model/descriptor/repformers.py (1 hunks)
- doc/model/dpa2.md (1 hunks)
🔇 Additional comments (3)
deepmd/pt/model/descriptor/repformers.py (3)
63-64
: Enhanced error message inborder_op
function.The updated error message provides additional context by directing users to the DPA-2 documentation. This improvement helps users understand the reason behind the
NotImplementedError
and how to resolve it.
Line range hint
197-232
: Conditional instantiation of Repformer layers based onold_impl
.The code correctly adds support for using either the new
RepformerLayer
or the old implementationRepformerLayerOld
based on theold_impl
flag. This enhancement increases flexibility, allowing users to choose between implementations as needed.
Line range hint
351-376
: Ensure correct handling of tensor dimensions in communication operations.When
comm_dict
is notNone
, the code performs padding ong1
and usestorch.ops.deepmd.border_op
. Please verify that tensor dimensions are correctly managed and that padding aligns with the expected shapes, especially considering the squeeze and unsqueeze operations.Run the following script to check tensor dimensions after padding and the output of
border_op
:
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## devel #4178 +/- ##
=======================================
Coverage 83.46% 83.46%
=======================================
Files 537 537
Lines 52168 52168
Branches 3046 3046
=======================================
Hits 43543 43543
Misses 7678 7678
Partials 947 947 ☔ View full report in Codecov by Sentry. |
Fix #4161.
Summary by CodeRabbit
New Features
Improvements
border_op
function for better user guidance.DescrptBlockRepformers
class.forward
method.compute_input_stats
method.