feat(pt): add universal test for loss #4354
📝 Walkthrough
The pull request introduces modifications to various files, primarily focusing on changes to precision handling in the […]
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (14)
source/tests/universal/pt/loss/test_loss.py (1)
30-38: Consider improving readability of the parameterization.
While the parameterization is well-structured, consider extracting the parameter mapping to a separate function or using a more descriptive variable name for better readability.
+def create_loss_parameters():
+    return (
+        *[(param_func, EnergyStdLoss) for param_func in LossParamEnergyList],
+        *[(param_func, EnergySpinLoss) for param_func in LossParamEnergySpinList],
+        *[(param_func, DOSLoss) for param_func in LossParamDosList],
+        *[(param_func, TensorLoss) for param_func in LossParamTensorList],
+        *[(param_func, PropertyLoss) for param_func in LossParamPropertyList],
+    )
-@parameterized(
-    (
-        *[(param_func, EnergyStdLoss) for param_func in LossParamEnergyList],
-        *[(param_func, EnergySpinLoss) for param_func in LossParamEnergySpinList],
-        *[(param_func, DOSLoss) for param_func in LossParamDosList],
-        *[(param_func, TensorLoss) for param_func in LossParamTensorList],
-        *[(param_func, PropertyLoss) for param_func in LossParamPropertyList],
-    )
-)
+@parameterized(create_loss_parameters())
source/tests/universal/common/cases/loss/utils.py (3)
9-11: Consider simplifying the deep relative import path.
The deep relative import path `from .....seed import GLOBAL_SEED` could be made more maintainable by using an absolute import path.
30-60: Make hardcoded values configurable and document special cases.
The method uses hardcoded values and includes special case handling that should be documented and made configurable.
Consider:
- Making `natoms` configurable through a class attribute
- Documenting the special case handling of 'atom_ener' to 'atom_energy' renaming
- Adding type hints and docstring
+    @property
+    def default_natoms(self) -> int:
+        """Number of atoms to use in tests. Override if needed."""
+        return 5
-    def test_forward(self):
+    def test_forward(self) -> None:
+        """Test the forward pass of the loss module.
+
+        Tests that the module correctly processes input data and returns
+        expected loss values. Handles special cases like renaming
+        'atom_ener' to 'atom_energy' for compatibility.
+        """
         module = self.forward_wrapper(self.module)
         label_requirement = self.module.label_requirement
         label_dict = {item.key: item for item in label_requirement}
         label_keys = sorted(label_dict.keys())
-        natoms = 5
+        natoms = self.default_natoms
62-75: Add comprehensive docstring to utility function.
The function would benefit from detailed documentation explaining its parameters and return value.
Consider adding:
 def fake_input_one_frame(data_item: DataRequirementItem, natoms=5) -> np.ndarray:
+    """Generate random test data based on DataRequirementItem properties.
+
+    Args:
+        data_item: Specification of the data requirements including ndof,
+            atomic flag, and repeat count
+        natoms: Number of atoms to generate data for (default: 5)
+
+    Returns:
+        np.ndarray: Random data matching the specifications with shape:
+            - [1, natoms, ndof] if atomic=True
+            - [1, ndof] if atomic=False
+            Repeated according to repeat count if specified.
+    """
source/tests/universal/pt/backend.py (3)
86-87: Consider adding error handling for invalid input types.
The conversion logic looks good, but consider adding error handling for unsupported input types to provide better error messages during testing.
 def forward_wrapper(self, module, on_cpu=False):
     def create_wrapper_method(method):
         def wrapper_method(self, *args, **kwargs):
             # convert to torch tensor
-            args = [_to_torch_tensor(arg) for arg in args]
-            kwargs = {k: _to_torch_tensor(v) for k, v in kwargs.items()}
+            try:
+                args = [_to_torch_tensor(arg) for arg in args]
+                kwargs = {k: _to_torch_tensor(v) for k, v in kwargs.items()}
+            except Exception as e:
+                raise ValueError(f"Failed to convert inputs to torch tensors: {e}")
Also applies to: 100-104
117-124: Add type hints and docstrings to improve maintainability.
The utility functions would benefit from type hints and docstrings to improve code maintainability and IDE support.
-def _to_torch_tensor(xx):
+def _to_torch_tensor(xx: Any) -> Union[torch.Tensor, Dict[str, torch.Tensor], Callable]:
+    """Convert input to PyTorch tensor, handling dictionaries and callables.
+
+    Args:
+        xx: Input to convert, can be tensor, array, dictionary, or callable
+
+    Returns:
+        Converted tensor, dictionary of tensors, or wrapped callable
+    """
     if isinstance(xx, dict):
Also applies to: 134-140
117-148: Consider reducing code duplication in conversion functions.
The torch and numpy conversion functions are nearly identical. Consider refactoring to reduce duplication using a factory function or class.
+def _create_converter(to_func, convert_callable_func):
+    def converter(xx):
+        if isinstance(xx, dict):
+            return {kk: to_func(xx[kk]) for kk in xx}
+        elif callable(xx):
+            return convert_callable_func(xx)
+        else:
+            return to_func(xx)
+    return converter
+
+def _create_callable_converter(converter_func):
+    def convert(func):
+        def wrapper(*args, **kwargs):
+            return converter_func(func(*args, **kwargs))
+        return wrapper
+    return convert
+
+_to_torch_tensor = _create_converter(
+    to_torch_tensor,
+    _create_callable_converter(_to_torch_tensor)
+)
+
+_to_numpy_array = _create_converter(
+    to_numpy_array,
+    _create_callable_converter(_to_numpy_array)
+)
source/tests/universal/dpmodel/loss/test_loss.py (5)
6-8: Consider restructuring the package imports.
The deep relative import path (`....consistent.common`) suggests a nested package structure that might benefit from reorganization for better maintainability.
Consider moving commonly used utilities like `parameterize_func` to a more accessible location or creating a dedicated testing utilities package.
11-37: Consider parameterizing the limit ratio.
The limit preferences are hardcoded to half of the start preferences (e.g., `limit_pref_e = pref_e / 2`). This might limit the flexibility of tests.
Consider adding a `limit_ratio` parameter to make the relationship between start and limit preferences configurable:
 def LossParamEnergy(
     starter_learning_rate=1.0,
     pref_e=1.0,
     pref_f=1.0,
     pref_v=1.0,
     pref_ae=1.0,
+    limit_ratio=0.5,
 ):
     # ...
     input_dict = {
         "key_to_pref_map": key_to_pref_map,
         "starter_learning_rate": starter_learning_rate,
         "start_pref_e": pref_e,
-        "limit_pref_e": pref_e / 2,
+        "limit_pref_e": pref_e * limit_ratio,
         # ... similar changes for other limit preferences
     }
102-124: Parameterize hardcoded values.
The function contains hardcoded values that might need to be configurable for different test scenarios:
- `numb_dos` is set to 2
- CDF preferences are set to 0.0
Consider making these values configurable:
 def LossParamDos(
     starter_learning_rate=1.0,
     pref_dos=1.0,
     pref_ados=1.0,
+    numb_dos=2,
+    pref_cdf=0.0,
+    pref_acdf=0.0,
 ):
     # ...
     input_dict = {
         "key_to_pref_map": key_to_pref_map,
         "starter_learning_rate": starter_learning_rate,
-        "numb_dos": 2,
+        "numb_dos": numb_dos,
         "start_pref_dos": pref_dos,
         "limit_pref_dos": pref_dos / 2,
         "start_pref_ados": pref_ados,
         "limit_pref_ados": pref_ados / 2,
-        "start_pref_cdf": 0.0,
-        "limit_pref_cdf": 0.0,
-        "start_pref_acdf": 0.0,
-        "limit_pref_acdf": 0.0,
+        "start_pref_cdf": pref_cdf,
+        "limit_pref_cdf": pref_cdf / 2,
+        "start_pref_acdf": pref_acdf,
+        "limit_pref_acdf": pref_acdf / 2,
     }
149-167: Parameterize tensor configuration.
The function contains hardcoded values that should be configurable for different test scenarios:
- tensor_name is set to "test_tensor"
- tensor_size is set to 2
Consider making these values configurable:
 def LossParamTensor(
     pref=1.0,
     pref_atomic=1.0,
+    tensor_name="test_tensor",
+    tensor_size=2,
 ):
-    tensor_name = "test_tensor"
     key_to_pref_map = {
         tensor_name: pref,
         f"atomic_{tensor_name}": pref_atomic,
     }
     input_dict = {
         "key_to_pref_map": key_to_pref_map,
         "tensor_name": tensor_name,
-        "tensor_size": 2,
+        "tensor_size": tensor_size,
         "label_name": tensor_name,
         "pref": pref,
         "pref_atomic": pref_atomic,
     }
1-203: Consider comprehensive refactoring for better maintainability and test coverage.
The file would benefit from several architectural improvements:
- Extract common parameter generation logic into a base utility
- Create a configuration class to manage test parameters
- Add docstrings and type hints for better code documentation
- Consider property-based testing for more comprehensive coverage
Would you like me to provide a detailed example of these improvements?
deepmd/pt/loss/dos.py (2)
154-157: LGTM! Consider adding explicit dimension size checks.
The change to `torch.cumsum` is correct and follows PyTorch's best practices. The reshaping and cumulative sum operations are properly implemented for local tensor calculations.
Consider adding explicit dimension checks before the reshape operation to catch potential shape mismatches early:
+        expected_size = [-1, natoms, self.numb_dos]
+        if model_pred["atom_dos"].shape[-1] != self.numb_dos:
+            raise ValueError(f"Expected atom_dos to have size {expected_size}, got {model_pred['atom_dos'].shape}")
         local_tensor_pred_cdf = torch.cumsum(
             model_pred["atom_dos"].reshape([-1, natoms, self.numb_dos]), dim=-1
         )
202-205: LGTM! Consider adding explicit dimension size checks.
The change to `torch.cumsum` is correct and follows PyTorch's best practices. The reshaping and cumulative sum operations are properly implemented for global tensor calculations.
Consider adding explicit dimension checks before the reshape operation to catch potential shape mismatches early:
+        expected_size = [-1, self.numb_dos]
+        if model_pred["dos"].shape[-1] != self.numb_dos:
+            raise ValueError(f"Expected dos to have size {expected_size}, got {model_pred['dos'].shape}")
         global_tensor_pred_cdf = torch.cumsum(
             model_pred["dos"].reshape([-1, self.numb_dos]), dim=-1
         )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (10)
- deepmd/dpmodel/common.py (2 hunks)
- deepmd/pt/loss/dos.py (2 hunks)
- source/tests/universal/common/cases/loss/__init__.py (1 hunks)
- source/tests/universal/common/cases/loss/loss.py (1 hunks)
- source/tests/universal/common/cases/loss/utils.py (1 hunks)
- source/tests/universal/dpmodel/loss/__init__.py (1 hunks)
- source/tests/universal/dpmodel/loss/test_loss.py (1 hunks)
- source/tests/universal/pt/backend.py (3 hunks)
- source/tests/universal/pt/loss/__init__.py (1 hunks)
- source/tests/universal/pt/loss/test_loss.py (1 hunks)
✅ Files skipped from review due to trivial changes (3)
- source/tests/universal/common/cases/loss/__init__.py
- source/tests/universal/dpmodel/loss/__init__.py
- source/tests/universal/pt/loss/__init__.py
🧰 Additional context used
🪛 Ruff
source/tests/universal/common/cases/loss/utils.py
21-21: Local variable `module` is assigned to but never used. Remove assignment to unused variable `module` (F841)
🪛 GitHub Check: CodeQL
source/tests/universal/common/cases/loss/utils.py
[notice] 21-21: Unused local variable
Variable module is not used.
🔇 Additional comments (8)
source/tests/universal/common/cases/loss/loss.py (2)
1-6: LGTM! Clean structure with proper licensing and imports.
The code follows best practices with clear license header and well-structured imports.
9-11: Consider removing redundant setUp method and verify test coverage.
- The current `setUp` implementation is redundant as it only calls the parent's method without additional setup. Python will automatically call the parent's `setUp` if not overridden.
- The class appears to have no test methods. If this is intended to be a concrete test class rather than a base class, test methods should be added.
Let's verify if this is intended to be a base class or if test methods are defined elsewhere:
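As a side illustration of the first point (separate from the verification mentioned above), the following sketch shows that `unittest` finds an inherited `setUp` without any forwarding override. All class names here are hypothetical and unrelated to the PR's test classes.

```python
import io
import unittest


class BaseCase(unittest.TestCase):
    def setUp(self):
        self.prepared = True


class RedundantCase(BaseCase):
    def setUp(self):
        # Only forwards to the parent: deleting this method changes nothing.
        BaseCase.setUp(self)

    def test_prepared(self):
        self.assertTrue(self.prepared)


class LeanCase(BaseCase):
    # No setUp here: BaseCase.setUp is resolved through normal inheritance.
    def test_prepared(self):
        self.assertTrue(self.prepared)


def run_cases():
    """Run both cases and return the unittest result object."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite(
        loader.loadTestsFromTestCase(case) for case in (RedundantCase, LeanCase)
    )
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)
```

Both cases pass, which is why the forwarding override can be dropped unless it is kept deliberately as an extension point.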
source/tests/universal/pt/loss/test_loss.py (2)
1-27: LGTM! Well-organized imports with clear separation of concerns.
The imports are logically grouped and follow good practices:
- Standard library imports
- Main loss classes
- Test utilities and base classes
- Test parameters
39-44: LGTM! Well-structured test class with proper initialization.
The test class follows good practices:
- Appropriate use of multiple inheritance
- Clean setup method
- Clear parameter extraction
deepmd/dpmodel/common.py (2)
Line range hint 32-39: LGTM on dictionary updates
The synchronization between `PRECISION_DICT` and `RESERVED_PRECISON_DICT` is maintained correctly, and the assertion check validates this relationship. The change to use `np.bool_` is more consistent with NumPy's type system.
Also applies to: 48-51
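The kind of consistency check described here can be sketched as follows. This is an assumption-laden illustration, not the code in `deepmd/dpmodel/common.py`: Python built-in types stand in for the NumPy dtypes, and the names `PRECISION`, `RESERVED_PRECISION`, and `check_in_sync` are hypothetical.

```python
# Hypothetical stand-ins: Python built-ins replace the NumPy dtypes.
PRECISION = {
    "float": float,
    "int": int,
    "bool": bool,
    "default": float,  # alias that maps to an already-listed type
}

# Reverse lookup from type to its canonical name (aliases are not listed).
RESERVED_PRECISION = {
    float: "float",
    int: "int",
    bool: "bool",
}


def check_in_sync(forward, reverse):
    """Raise AssertionError unless both tables cover exactly the same types."""
    assert set(forward.values()) == set(reverse.keys()), "precision tables diverged"
    for dtype, name in reverse.items():
        # Every canonical name must map back to the type it was listed under.
        assert forward[name] is dtype, f"{name!r} does not round-trip"


check_in_sync(PRECISION, RESERVED_PRECISION)
```

A check of this shape fails as soon as a type is swapped in only one of the two tables, which is why a change such as `bool` to `np.bool_` has to be applied to both dictionaries together.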
32-32: Verify the impact on existing model files
The change from `bool` to `np.bool_` in both dictionaries could affect the loading of existing model files that were saved with the old boolean type.
Let's check for any existing model files that might be affected:
Also applies to: 48-48
source/tests/universal/pt/backend.py (1)
Line range hint 1-148: Overall changes look good!
The refactoring of tensor conversion logic improves code organization and maintainability while maintaining backward compatibility. The test infrastructure properly handles different module variants and edge cases.
deepmd/pt/loss/dos.py (1)
Line range hint 154-205: Verify complete replacement of deprecated operations.
Let's ensure all instances of deprecated operations have been replaced throughout the codebase.
✅ Verification successful
Deprecated operations fully replaced.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining instances of deprecated torch.cusum
# and other commonly deprecated PyTorch operations in the codebase
echo "Checking for deprecated torch.cusum usage..."
rg "torch\.cusum|torch\.[^(]*cusum"
echo "Checking for other common deprecated PyTorch operations..."
rg "torch\.(np|th|is_tensor|set_flush_denormal|get_flush_denormal|sigmoid_|tanh_|absolute|absolute_|angle|angle_|arccos|arccos_|arccosh|arccosh_|arcsin|arcsin_|arcsinh|arcsinh_|arctan|arctan_|arctanh|arctanh_|fix|fix_)"
Length of output: 709
Co-authored-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu> Signed-off-by: Duo <50307526+iProzd@users.noreply.github.com>
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (4)
source/tests/universal/pt/loss/test_loss.py (2)
30-39: Add class docstring to document test coverage.
The test class effectively combines multiple loss types using parameterization, but adding a docstring would help document:
- The purpose of combining these specific loss types
- What aspects of each loss type are being tested
- The relationship between the parameter lists and loss classes
Example docstring:
"""Test suite for PyTorch loss functions.

Covers the following loss implementations:
- EnergyStdLoss: Standard energy loss
- EnergySpinLoss: Energy loss with spin components
- DOSLoss: Density of states loss
- TensorLoss: Generic tensor-based loss
- PropertyLoss: Property prediction loss

Each loss type is tested with parameters from its corresponding parameter list.
"""
40-46: Enhance type safety in setUp method.
Consider adding type hints and assertions to make the setup more robust:
-    def setUp(self):
-        (LossParam, Loss) = self.param[0]
+    def setUp(self) -> None:
+        (LossParam, Loss) = self.param[0]
+        assert callable(LossParam), "LossParam must be callable"
+        assert isinstance(Loss, type), "Loss must be a class"
         LossTest.setUp(self)
         self.module_class = Loss
         self.input_dict = LossParam()
+        assert isinstance(self.input_dict, dict), "LossParam must return a dict"
         self.key_to_pref_map = self.input_dict.pop("key_to_pref_map")
         self.module = Loss(**self.input_dict)
source/tests/universal/common/cases/loss/utils.py (2)
58-64: Improve code clarity with descriptive variables.
The module call could be more readable:
+        # Empty dict for training prefix
+        training_prefix = {}
+        learning_rate = 1.0
         _, loss, more_loss = module(
-            {},
+            training_prefix,
             fake_model,
             labels,
             natoms,
-            1.0,
+            learning_rate,
         )
67-79: Add docstring to document the fake_input utility function.
The function would benefit from documentation explaining its parameters and return value.
 def fake_input(data_item: DataRequirementItem, natoms=5, nframes=2) -> np.ndarray:
+    """Generate fake input data for testing based on DataRequirementItem properties.
+
+    Args:
+        data_item: Specification of the required data format and properties
+        natoms: Number of atoms in the system (default: 5)
+        nframes: Number of frames to generate (default: 2)
+
+    Returns:
+        np.ndarray: Random data matching the specified requirements
+    """
     ndof = data_item.ndof
     atomic = data_item.atomic
     repeat = data_item.repeat
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- source/tests/universal/common/cases/loss/utils.py (1 hunks)
- source/tests/universal/pt/loss/test_loss.py (1 hunks)
🧰 Additional context used
🪛 Ruff
source/tests/universal/common/cases/loss/utils.py
21-21: Local variable `module` is assigned to but never used. Remove assignment to unused variable `module` (F841)
🔇 Additional comments (3)
source/tests/universal/pt/loss/test_loss.py (1)
1-28: LGTM! Well-organized imports
The imports are properly structured with clear grouping and explicit relative paths.
source/tests/universal/common/cases/loss/utils.py (2)
1-11: LGTM! Clean import structure
The imports are well-organized and all are utilized in the implementation.
14-19: Required attributes should be defined in setUp method
The empty `setUp` method should enforce implementation of required attributes in derived classes.
Summary by CodeRabbit
Release Notes
New Features
- […] `LossTest` class for enhanced testing of loss functions.
- […] `test_loss.py` file.
Bug Fixes
- […] `DOSLoss` class to ensure accurate cumulative sum calculations.
Documentation
Chores
- […] `PTTestCase` class for improved handling of tensors and arrays.