feat: support zero-count elements in type_map for sort_atom_names #912

SchrodingersCattt · 2025-10-28T09:41:13Z

Refactor sort_atom_names to correctly handle type_map with zero-count elements while preserving existing alphabetical sorting behavior.
Only validate atom types that actually appear (count > 0) against the provided type_map, allowing new elements in type_map to be added with zero count.
Improve robustness and clarity of atom type remapping logic, restoring original commenting style for maintainability.
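The validation rule described above can be sketched in a few lines. This is a hypothetical, simplified re-implementation for illustration only, not the code in `dpdata/utils.py`. The name `sort_atom_names_with_type_map` is made up, and unlike the real function, which updates the system data in place, this sketch returns new values. Only atom types with a count above zero are checked against `type_map`. Names that appear only in `type_map` get a count of 0, and zero-count names missing from `type_map` are dropped.

```python
import numpy as np


def sort_atom_names_with_type_map(atom_names, atom_numbs, atom_types, type_map):
    """Hypothetical sketch: reorder per-type data so it follows type_map exactly."""
    # Only types that actually occur (count > 0) must appear in type_map.
    active = [name for name, numb in zip(atom_names, atom_numbs) if numb > 0]
    missing = [name for name in active if name not in type_map]
    if missing:
        raise ValueError(f"atom types {missing} are missing from type_map {type_map}")

    # type_map becomes the authoritative order; names new to the data get 0 atoms,
    # and zero-count names absent from type_map simply disappear.
    counts = dict(zip(atom_names, atom_numbs))
    new_names = list(type_map)
    new_numbs = [counts.get(name, 0) for name in new_names]

    # Simple per-atom remap: old type index -> name -> position in type_map.
    new_types = np.array(
        [type_map.index(atom_names[t]) for t in atom_types], dtype=int
    )
    return new_names, new_numbs, new_types
```

For example, with `atom_names=["Na", "Cl"]`, `atom_numbs=[1, 2]`, and `type_map=["Cl", "Na", "O"]`, the sketch yields names `["Cl", "Na", "O"]` with counts `[2, 1, 0]`. "O" is accepted even though no atom uses it.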

Summary by CodeRabbit

Bug Fixes
- Improved validation and clearer errors when active atom types are missing; ensured atom names, counts, and type indices are consistently reordered to match a provided mapping or an alphabetical fallback.
Tests
- Added unit tests covering mapping-based sorting, zero-count handling, missing-active-type errors, and alphabetical sorting to verify names, counts, and type indices.
Documentation
- Clarified docstrings/comments describing mapping behavior and invariants.

coderabbitai · 2025-10-28T09:45:52Z

📝 Walkthrough

The sort_atom_names function in dpdata/utils.py now enforces that all active atom names exist in a provided type_map, builds a new ordering strictly from type_map (preserving active counts, assigning 0 to new types), remaps atom_types accordingly, and adjusts alphabetical sorting to use the inverse permutation for atom_types. New unit tests were added.

Changes

Cohort / File(s)	Summary
sort_atom_names implementation `dpdata/utils.py`	Validate that all active atom names (positive `atom_numbs`) are present in a provided `type_map` and raise `ValueError` if missing; construct new ordering from `type_map` (preserve active counts, set 0 for new types); compute mapping from old type indices to new and remap `atom_types` in-place; use inverse permutation for `atom_types` in alphabetical path; minor docstring/comment updates.
unit tests for type_map behavior `tests/test_type_map_utils.py`	New test module exercising: sorting with `type_map`; `type_map` entries with zero-count elements; error when active types are missing from `type_map` (asserts message contains missing labels); alphabetical sorting without `type_map`; removal/reindexing when `type_map` excludes zero-count elements. Tests verify `atom_names`, `atom_numbs`, and `atom_types` content and shapes.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant sort_atom_names
    participant Validator
    participant Reorderer
    participant Remapper

    Caller->>sort_atom_names: call(atom_names, atom_numbs, atom_types, type_map?)
    alt type_map provided
        sort_atom_names->>Validator: ensure active names ⊂ type_map
        Validator-->>sort_atom_names: ok / raise ValueError
        alt ok
            sort_atom_names->>Reorderer: build ordering from type_map (preserve counts, new->0)
            Reorderer-->>sort_atom_names: new_order, old_to_new_map
            sort_atom_names->>Remapper: remap atom_types using old_to_new_map
            Remapper-->>sort_atom_names: atom_types updated (in-place)
        else raise
            Validator-->>Caller: ValueError (missing active types)
        end
    else no type_map
        sort_atom_names->>Reorderer: compute alphabetical permutation
        Reorderer-->>sort_atom_names: perm, inverse_perm
        sort_atom_names->>Remapper: remap atom_types using inverse_perm
        Remapper-->>sort_atom_names: atom_types updated
    end
    sort_atom_names-->>Caller: atom_names, atom_numbs, atom_types updated in-place

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Inspect type_map validation logic and exact ValueError message content used by tests.
Verify correctness of old->new index mapping and that atom_types remapping preserves dtype, shape, and semantics.
Check handling of zero-count entries (preserve vs remove per type_map) and edge cases in tests.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The pull request title "feat: support zero-count elements in type_map for sort_atom_names" directly describes the primary objective of the changeset. The modifications to `dpdata/utils.py` refactor the `sort_atom_names` function specifically to handle type_map entries with zero-count elements, and the new test module comprehensively validates this behavior. The title is clear, specific, and uses proper conventional commit conventions, allowing developers scanning the history to immediately understand the feature being introduced.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cfde1f4 and 3ec4047.

📒 Files selected for processing (1)

tests/test_type_map_utils.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/test_type_map_utils.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

dpdata/utils.py (1)
65-69: Clarify docstring about zero-count handling.

Spell out that type_map becomes the authoritative ordering, new names get 0 counts, and names not in type_map are dropped.
@@
-    alphabetical order. If type_map is given, atom_names will be set to type_map,
-    and zero-count elements are kept.
+    alphabetical order. If type_map is given, atom_names is set exactly to
+    type_map; names present in type_map but absent in data are added with
+    atom_numbs = 0, and names absent from type_map are dropped.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7af8d74 and 2c5b95f.

📒 Files selected for processing (1)

dpdata/utils.py (2 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

dpdata/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

dpdata/**/*.py: Python code in the main package must pass ruff linting (ruff check dpdata/)
Format Python code using ruff formatter (ruff format dpdata/) before committing

Files:

dpdata/utils.py

**/*

📄 CodeRabbit inference engine (AGENTS.md)

**/*: Remove trailing whitespace in files (pre-commit trailing-whitespace hook)
Ensure files end with a newline at EOF (end-of-file-fixer)

Files:

dpdata/utils.py

🧬 Code graph analysis (1)

dpdata/utils.py (1)

dpdata/system.py (2)

append (467-521)

append (1499-1514)

🪛 Ruff (0.14.1)

dpdata/utils.py

86-86: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)

dpdata/utils.py (2)

121-132: Alphabetical path remap is correct and stable.

The inverse-permutation update of atom_types is precise; stable argsort avoids perturbations.
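The alphabetical path works like this. A stable argsort gives `perm`, where `perm[new] = old`. Because `atom_types` stores old indices, each stored value must be translated through the inverse permutation, `inv[old] = new`. The sketch below is a hypothetical illustration of that idea. The function name is made up, and it returns new values instead of updating the data in place.

```python
import numpy as np


def sort_atom_names_alphabetically(atom_names, atom_numbs, atom_types):
    """Hypothetical sketch of the no-type_map path using an inverse permutation."""
    names = np.asarray(atom_names)
    # Stable argsort: perm[new_idx] = old_idx.
    perm = np.argsort(names, kind="stable")
    # Inverse permutation: inv[old_idx] = new_idx.
    inv = np.empty_like(perm)
    inv[perm] = np.arange(len(perm))
    new_names = names[perm].tolist()
    new_numbs = np.asarray(atom_numbs)[perm].tolist()
    # Each atom's old type index is translated to its new position.
    new_types = inv[np.asarray(atom_types, dtype=int)]
    return new_names, new_numbs, new_types
```

Indexing `atom_types` with `perm` instead of `inv` would be a subtle bug. The two arrays coincide only when the permutation is its own inverse, as in the 3-element reversal `["O", "H", "C"]`, so tests should use an input where they differ.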

78-118: Address critical issues: implement the TRY003 fix, vectorize the remapping, and add defensive checks for type_map integrity.

The review identifies valid issues requiring fixes:

Lint compliance (TRY003): The formatted f-string in the raise statement triggers this ruff rule. Passing the error details as a second argument is a valid remedy: Python stores all constructor arguments in exception.args, and str() renders the whole args tuple, so the details still appear in the message.
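Here is a minimal sketch of the "details as a second argument" pattern. `check_active_types` is a hypothetical helper, not part of dpdata, and this sketch does not verify whether ruff accepts this exact form. The alternative that ruff's TRY003 documentation describes is a custom exception class that owns its message. The point to check is that the missing labels still appear in `str(err)`, which the new tests rely on.

```python
def check_active_types(active_names, type_map):
    """Hypothetical helper: constant message, details passed as a second argument."""
    missing = [name for name in active_names if name not in type_map]
    if missing:
        # Short literal message; the missing labels travel in err.args[1].
        raise ValueError("active atom types missing from type_map", missing)


try:
    check_active_types(["H", "O", "Fe"], ["H", "O"])
except ValueError as err:
    print(err.args)  # ('active atom types missing from type_map', ['Fe'])
    print(str(err))  # ('active atom types missing from type_map', ['Fe'])
```

With more than one argument, `str(err)` is the string form of the whole `args` tuple, not just the first element. Any test that compares the full message exactly, rather than checking containment, would need updating.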

Duplicate type_map entries: Add validation to reject duplicate names in type_map before processing.

Vectorized remapping: Replace the loop-based index mapping with a mapping array using numpy fancy indexing (mapping[old_types]), which is more efficient and cleaner.

Defensive assertion: Initialize the mapping array with -1 as a sentinel for names absent from type_map, then check that no remapped atom_types entry is -1, so every atom type in use has a valid index in type_map.
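The three suggestions above (duplicate rejection, a mapping array with fancy indexing, and a -1 sentinel check) fit together as in the sketch below. `remap_atom_types` is a hypothetical name, and the diff the review refers to may differ. The sketch raises `ValueError` instead of using `assert`, so the check still runs under `python -O`.

```python
import numpy as np


def remap_atom_types(atom_names, atom_types, type_map):
    """Hypothetical sketch: vectorized old->new type remapping with integrity checks."""
    # Reject duplicate names in type_map before building any index.
    if len(set(type_map)) != len(type_map):
        raise ValueError("type_map contains duplicate names")
    new_index = {name: i for i, name in enumerate(type_map)}
    # mapping[old_idx] = new_idx; -1 marks names that type_map does not contain.
    mapping = np.array([new_index.get(name, -1) for name in atom_names], dtype=int)
    # A single fancy-indexing operation remaps every atom at once.
    new_types = mapping[np.asarray(atom_types, dtype=int)]
    # Defensive check: no atom may land on the -1 sentinel.
    if np.any(new_types < 0):
        raise ValueError("atom_types reference names missing from type_map")
    return new_types
```

The Python-level loop now runs once per atom type, which is usually a handful of elements, instead of once per atom. The per-atom work is done by NumPy.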

All proposed changes are correct and should be applied as shown in the diff. The vectorized approach is compatible with the existing codebase and improves performance without changing behavior.

codspeed-hq · 2025-10-30T11:24:43Z

CodSpeed Performance Report

Merging #912 will not alter performance

_{Comparing SchrodingersCattt:enhance/type-map-handling (3ec4047) with devel (7af8d74)}

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 2 untouched
⏩ 2 skipped¹

¹ 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

codecov · 2025-10-30T11:25:14Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.10%. Comparing base (7af8d74) to head (3ec4047).

Additional details and impacted files

@@            Coverage Diff             @@
##            devel     #912      +/-   ##
==========================================
+ Coverage   86.06%   86.10%   +0.03%     
==========================================
  Files          83       83              
  Lines        7886     7908      +22     
==========================================
+ Hits         6787     6809      +22     
  Misses       1099     1099

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tests/test_type_map_utils.py (1)

68-82: LGTM! Validates zero-count element removal.

This test correctly verifies that zero-count elements can be removed from the data when they're excluded from the type_map.

Minor note: The comment on line 76 mentions "A is omitted" which refers to the abstract pattern rather than the actual element "Cl". While technically correct, you might consider clarifying: # Cl is omitted because it has 0 atoms.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c5b95f and cfde1f4.

📒 Files selected for processing (1)

tests/test_type_map_utils.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*

📄 CodeRabbit inference engine (AGENTS.md)

**/*: Remove trailing whitespace in files (pre-commit trailing-whitespace hook)
Ensure files end with a newline at EOF (end-of-file-fixer)

Files:

tests/test_type_map_utils.py

tests/test_*.py

📄 CodeRabbit inference engine (AGENTS.md)

Name test modules as test_*.py for unittest discovery

Files:

tests/test_type_map_utils.py

🔇 Additional comments (6)

tests/test_type_map_utils.py (6)

1-10: LGTM! Well-structured test module setup.

The imports are clean and the test class follows unittest conventions properly.

11-24: LGTM! Core type_map functionality well tested.

This test correctly verifies that sort_atom_names reorders atoms according to the provided type_map and properly remaps atom_types.

25-38: LGTM! Critical test for the PR's main feature.

This test correctly validates that zero-count elements in the type_map are properly handled, adding them to atom_names with a count of 0 while preserving existing atom_types.

39-54: LGTM! Robust validation test.

This test properly verifies that the function raises a clear, informative ValueError when active atom types are missing from the provided type_map.

55-67: LGTM! Ensures backward compatibility.

This test correctly verifies that the alphabetical sorting behavior is preserved when no type_map is provided, ensuring existing functionality remains intact.

84-85: LGTM! Standard unittest main block.

The main block follows the standard unittest pattern for running tests directly.

feat: support zero-count elements in type_map for sort_atom_names

2c5b95f

coderabbitai bot reviewed Oct 28, 2025

wanghan-iapcm requested a review from njzjz October 30, 2025 10:38

SchrodingersCattt and others added 2 commits October 31, 2025 00:18

test: add unit tests for atom type remapping

f162c55

[pre-commit.ci] auto fixes from pre-commit.com hooks

cfde1f4

for more information, see https://pre-commit.ci

coderabbitai bot reviewed Oct 31, 2025

style: enhance comments

3ec4047
