feat: support zero-count elements in type_map for sort_atom_names #912
base: devel
Conversation
📝 Walkthrough
Sequence Diagram(s)
sequenceDiagram
participant Caller
participant sort_atom_names
participant Validator
participant Reorderer
participant Remapper
Caller->>sort_atom_names: call(atom_names, atom_numbs, atom_types, type_map?)
alt type_map provided
sort_atom_names->>Validator: ensure active names ⊂ type_map
Validator-->>sort_atom_names: ok / raise ValueError
alt ok
sort_atom_names->>Reorderer: build ordering from type_map (preserve counts, new->0)
Reorderer-->>sort_atom_names: new_order, old_to_new_map
sort_atom_names->>Remapper: remap atom_types using old_to_new_map
Remapper-->>sort_atom_names: atom_types updated (in-place)
else raise
Validator-->>Caller: ValueError (missing active types)
end
else no type_map
sort_atom_names->>Reorderer: compute alphabetical permutation
Reorderer-->>sort_atom_names: perm, inverse_perm
sort_atom_names->>Remapper: remap atom_types using inverse_perm
Remapper-->>sort_atom_names: atom_types updated
end
sort_atom_names-->>Caller: atom_names, atom_numbs, atom_types updated in-place
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
dpdata/utils.py (1)
65-69: Clarify docstring about zero-count handling.
Spell out that type_map becomes the authoritative ordering, new names get 0 counts, and names not in type_map are dropped.
@@
-    alphabetical order. If type_map is given, atom_names will be set to type_map,
-    and zero-count elements are kept.
+    alphabetical order. If type_map is given, atom_names is set exactly to
+    type_map; names present in type_map but absent in data are added with
+    atom_numbs = 0, and names absent from type_map are dropped.
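The semantics described in this suggestion can be sketched as follows. This is a minimal illustration under stated assumptions, not dpdata's implementation: `reorder_by_type_map` is a hypothetical standalone helper, and it assumes (as the walkthrough diagram indicates) that only zero-count names may be dropped, while names that still have atoms must appear in type_map.

```python
def reorder_by_type_map(atom_names, atom_numbs, atom_types, type_map):
    """Hypothetical helper for illustration; not part of dpdata."""
    # Names that actually have atoms must all appear in type_map.
    active = [name for name, count in zip(atom_names, atom_numbs) if count > 0]
    missing = sorted(set(active) - set(type_map))
    if missing:
        raise ValueError("active atom types missing from type_map", missing)

    counts = dict(zip(atom_names, atom_numbs))
    new_index = {name: i for i, name in enumerate(type_map)}

    # type_map is authoritative: names new to the data get a count of 0,
    # and zero-count names that are absent from type_map are dropped.
    new_numbs = [counts.get(name, 0) for name in type_map]
    # Every index in atom_types refers to an active name, so the lookup is safe.
    new_types = [new_index[atom_names[t]] for t in atom_types]
    return list(type_map), new_numbs, new_types


names, numbs, types = reorder_by_type_map(
    ["H", "O", "Cl"], [2, 1, 0], [0, 0, 1], ["O", "H", "N"]
)
print(names, numbs, types)
```

Here "N" is added with a count of 0, and "Cl" disappears because it has no atoms and is not listed in type_map.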
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dpdata/utils.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
dpdata/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
dpdata/**/*.py: Python code in the main package must pass ruff linting (ruff check dpdata/)
Format Python code using ruff formatter (ruff format dpdata/) before committing
Files:
dpdata/utils.py
**/*
📄 CodeRabbit inference engine (AGENTS.md)
**/*: Remove trailing whitespace in files (pre-commit trailing-whitespace hook)
Ensure files end with a newline at EOF (end-of-file-fixer)
Files:
dpdata/utils.py
🧬 Code graph analysis (1)
dpdata/utils.py (1)
dpdata/system.py (2)
append (467-521), append (1499-1514)
🪛 Ruff (0.14.1)
dpdata/utils.py
86-86: Avoid specifying long messages outside the exception class
(TRY003)
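For context on this warning, two common ways of raising with details are sketched below: a dedicated exception class that builds its own message, and a plain ValueError carrying the details as a second argument. `MissingAtomTypesError`, `check_with_class`, and `check_with_args` are hypothetical names chosen for illustration, and whether a particular call shape satisfies the linter should be confirmed by running ruff check rather than assumed from this sketch.

```python
class MissingAtomTypesError(ValueError):
    """Hypothetical exception; the message is built inside the class."""

    def __init__(self, missing):
        self.missing = list(missing)
        super().__init__(f"active atom types missing from type_map: {self.missing}")


def check_with_class(active, type_map):
    missing = sorted(set(active) - set(type_map))
    if missing:
        raise MissingAtomTypesError(missing)


def check_with_args(active, type_map):
    missing = sorted(set(active) - set(type_map))
    if missing:
        # The details travel as a second positional argument and end up in .args.
        raise ValueError("missing active types", missing)


try:
    check_with_args(["H", "Cl"], ["H"])
except ValueError as exc:
    print(exc.args)  # a 2-tuple: the message and the list of missing names
    print(str(exc))  # with several args, str() renders the whole tuple
```

Note that with more than one argument, str() on the exception shows the tuple form of args, which is less readable than a single formatted message.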
🔇 Additional comments (2)
dpdata/utils.py (2)
121-132: Alphabetical path remap is correct and stable.
The inverse-permutation update of atom_types is precise; stable argsort avoids perturbations.
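The inverse-permutation idea mentioned here can be illustrated with a small NumPy sketch. The variable names and sample data are assumptions made for the example and are not taken from dpdata/utils.py.

```python
import numpy as np

atom_names = ["O", "H", "C"]
atom_numbs = [1, 2, 1]
atom_types = np.array([0, 1, 1, 2])  # indices into atom_names

# perm[new_position] = old_index; a stable sort keeps equal keys in order.
perm = np.argsort(atom_names, kind="stable")

# inverse[old_index] = new_position
inverse = np.empty_like(perm)
inverse[perm] = np.arange(len(perm))

sorted_names = [atom_names[i] for i in perm]
sorted_numbs = [atom_numbs[i] for i in perm]
sorted_types = inverse[atom_types]  # fancy indexing remaps every atom at once

print(sorted_names, sorted_numbs, sorted_types.tolist())
```

Each atom keeps its element: an atom that pointed at "O" (old index 0) now points at index 2, which is where "O" sits after sorting.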
78-118: I need to verify several aspects of this review before finalizing it. Let me check the ruff TRY003 rule and the validity of the proposed changes.
Now let me check whether any code depends on the exception message format, and also verify the vectorization logic.
Now let me examine the codebase to see if there's any code handling or expecting the ValueError message format.
Good, I've confirmed that NumPy fancy indexing with an integer array (like mapping[old_types]) is a valid and efficient vectorization pattern. Now let me run a more detailed analysis of the code to identify critical issues.
Perfect. The exception instance has an args attribute that stores the arguments. For convenience, builtin exception types define str() to print all the arguments without explicitly accessing .args. So when ValueError("msg", details) is raised, the Exception constructor stores both values in a tuple called args, and str() renders that whole tuple, so both values appear.
Now I have enough information to generate my analysis. The review comment has three major concerns that need verification:
- TRY003 compliance: The proposed change of passing details as a second argument is a valid pattern in Python for exception handling
- Vectorization logic: The proposed numpy fancy indexing approach is valid
- Robustness checks: Adding duplicate detection and assertion for unmapped types
Let me generate the final rewritten review comment:
Address critical issues: implement TRY003 fix, vectorize remapping, add defensive checks for type_map integrity.
The review identifies valid issues requiring fixes:
- Lint compliance (TRY003): The formatted f-string in the raise statement violates the ruff rule. Passing the error details as a second argument is a valid remedy: Python will store both in exception.args and display both via str().
- Duplicate type_map entries: Add validation to reject duplicate names in type_map before processing.
- Vectorized remapping: Replace the loop-based index mapping with a mapping array using numpy fancy indexing (mapping[old_types]), which is more efficient and cleaner.
- Defensive assertion: Add a check to ensure all atom_types reference valid indices in type_map via the -1 sentinel values.
All proposed changes are correct and should be applied as shown in the diff. The vectorized approach is compatible with the existing codebase and improves performance without changing behavior.
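The duplicate check, the mapping array, and the -1 sentinel described above could look roughly like the sketch below. `remap_types` is a hypothetical helper written for this illustration; it is not the diff proposed in the review, and it raises ValueError where the review speaks of an assertion.

```python
import numpy as np


def remap_types(atom_names, atom_types, type_map):
    """Hypothetical helper illustrating the suggestions; not dpdata code."""
    # Reject duplicate names so every name maps to exactly one index.
    if len(set(type_map)) != len(type_map):
        raise ValueError("duplicate names in type_map", list(type_map))

    new_index = {name: i for i, name in enumerate(type_map)}
    # mapping[old_index] = new_index, or -1 when the old name is not in type_map.
    mapping = np.array([new_index.get(name, -1) for name in atom_names], dtype=int)

    # Fancy indexing remaps all atoms in one step instead of a Python loop.
    new_types = mapping[np.asarray(atom_types, dtype=int)]
    if (new_types < 0).any():
        raise ValueError("atom_types reference names missing from type_map")
    return new_types


print(remap_types(["H", "O", "Cl"], [0, 0, 1], ["O", "H", "N"]).tolist())
```

The sentinel only triggers an error when an atom actually uses an unmapped name, so a zero-count name such as "Cl" in the call above passes through without complaint.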
CodSpeed Performance Report
Merging #912 will not alter performance
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #912      +/-   ##
==========================================
+ Coverage   86.06%   86.10%   +0.03%
==========================================
  Files          83       83
  Lines        7886     7908      +22
==========================================
+ Hits         6787     6809      +22
  Misses       1099     1099
☔ View full report in Codecov by Sentry.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/test_type_map_utils.py (1)
68-82: LGTM! Validates zero-count element removal.
This test correctly verifies that zero-count elements can be removed from the data when they're excluded from the type_map.
Minor note: The comment on line 76 mentions "A is omitted" which refers to the abstract pattern rather than the actual element "Cl". While technically correct, you might consider clarifying:
# Cl is omitted because it has 0 atoms.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/test_type_map_utils.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*
📄 CodeRabbit inference engine (AGENTS.md)
**/*: Remove trailing whitespace in files (pre-commit trailing-whitespace hook)
Ensure files end with a newline at EOF (end-of-file-fixer)
Files:
tests/test_type_map_utils.py
tests/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
Name test modules as test_*.py for unittest discovery
Files:
tests/test_type_map_utils.py
🔇 Additional comments (6)
tests/test_type_map_utils.py (6)
1-10: LGTM! Well-structured test module setup.
The imports are clean and the test class follows unittest conventions properly.
11-24: LGTM! Core type_map functionality well tested.
This test correctly verifies that sort_atom_names reorders atoms according to the provided type_map and properly remaps atom_types.
25-38: LGTM! Critical test for the PR's main feature.
This test correctly validates that zero-count elements in the type_map are properly handled, adding them to atom_names with a count of 0 while preserving existing atom_types.
39-54: LGTM! Robust validation test.
This test properly verifies that the function raises a clear, informative ValueError when active atom types are missing from the provided type_map.
55-67: LGTM! Ensures backward compatibility.
This test correctly verifies that the alphabetical sorting behavior is preserved when no type_map is provided, ensuring existing functionality remains intact.
84-85: LGTM! Standard unittest main block.
The main block follows the standard unittest pattern for running tests directly.
sort_atom_names to correctly handle type_map with zero-count elements while preserving existing alphabetical sorting behavior.
type_map, allowing new elements in type_map to be added with zero count.
Summary by CodeRabbit