@SchrodingersCattt SchrodingersCattt commented Oct 28, 2025

  • Refactor sort_atom_names to correctly handle type_map with zero-count elements while preserving existing alphabetical sorting behavior.
  • Only validate atom types that actually appear (count > 0) against the provided type_map, allowing new elements in type_map to be added with zero count.
  • Improve robustness and clarity of atom type remapping logic, restoring original commenting style for maintainability.

Summary by CodeRabbit

  • Bug Fixes
    • Improved validation and clearer errors when active atom types are missing; ensured atom names, counts, and type indices are consistently reordered to match a provided mapping or an alphabetical fallback.
  • Tests
    • Added unit tests covering mapping-based sorting, zero-count handling, missing-active-type errors, and alphabetical sorting to verify names, counts, and type indices.
  • Documentation
    • Clarified docstrings/comments describing mapping behavior and invariants.

coderabbitai bot commented Oct 28, 2025

📝 Walkthrough

The sort_atom_names function in dpdata/utils.py now enforces that all active atom names exist in a provided type_map, builds a new ordering strictly from type_map (preserving active counts, assigning 0 to new types), remaps atom_types accordingly, and adjusts alphabetical sorting to use the inverse permutation for atom_types. New unit tests were added.
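The type_map branch described above can be sketched roughly as follows. This is a minimal standalone illustration, not dpdata's actual implementation: the function name `reorder_by_type_map` and its argument layout are assumptions made for the example, and it returns new values instead of updating a data dict in place.

```python
import numpy as np


def reorder_by_type_map(atom_names, atom_numbs, atom_types, type_map):
    """Hypothetical stand-in for the type_map branch; not dpdata's actual code."""
    # Only names that actually occur (count > 0) must be present in type_map.
    missing = [n for n, c in zip(atom_names, atom_numbs) if c > 0 and n not in type_map]
    if missing:
        raise ValueError(f"active atom types missing from type_map: {missing}")

    # type_map becomes the ordering; names new to the data get a count of 0.
    counts = dict(zip(atom_names, atom_numbs))
    new_numbs = [counts.get(name, 0) for name in type_map]

    # Old type index -> new type index, for names that survive the reordering.
    new_index = {name: i for i, name in enumerate(type_map)}
    old_to_new = {i: new_index[n] for i, n in enumerate(atom_names) if n in new_index}
    new_types = np.array([old_to_new[t] for t in atom_types], dtype=int)
    return list(type_map), new_numbs, new_types


# "Cl" has zero atoms and is absent from type_map, so it is dropped;
# "C" is new to the data and is added with a count of 0.
names, numbs, types = reorder_by_type_map(
    ["H", "O", "Cl"], [2, 1, 0], [0, 0, 1], ["O", "H", "C"]
)
# names -> ["O", "H", "C"], numbs -> [1, 2, 0], types -> [1, 1, 0]
```

The validation only looks at names with a positive count, which is what allows a zero-count name such as "Cl" to be left out of type_map without an error.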

Changes

| Cohort / File(s) | Summary |
|---|---|
| sort_atom_names implementation: dpdata/utils.py | Validate that all active atom names (positive atom_numbs) are present in a provided type_map and raise ValueError if missing; construct new ordering from type_map (preserve active counts, set 0 for new types); compute mapping from old type indices to new and remap atom_types in-place; use inverse permutation for atom_types in alphabetical path; minor docstring/comment updates. |
| unit tests for type_map behavior: tests/test_type_map_utils.py | New test module exercising: sorting with type_map; type_map entries with zero-count elements; error when active types are missing from type_map (asserts message contains missing labels); alphabetical sorting without type_map; removal/reindexing when type_map excludes zero-count elements. Tests verify atom_names, atom_numbs, and atom_types content and shapes. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant sort_atom_names
    participant Validator
    participant Reorderer
    participant Remapper

    Caller->>sort_atom_names: call(atom_names, atom_numbs, atom_types, type_map?)
    alt type_map provided
        sort_atom_names->>Validator: ensure active names ⊂ type_map
        Validator-->>sort_atom_names: ok / raise ValueError
        alt ok
            sort_atom_names->>Reorderer: build ordering from type_map (preserve counts, new->0)
            Reorderer-->>sort_atom_names: new_order, old_to_new_map
            sort_atom_names->>Remapper: remap atom_types using old_to_new_map
            Remapper-->>sort_atom_names: atom_types updated (in-place)
        else raise
            Validator-->>Caller: ValueError (missing active types)
        end
    else no type_map
        sort_atom_names->>Reorderer: compute alphabetical permutation
        Reorderer-->>sort_atom_names: perm, inverse_perm
        sort_atom_names->>Remapper: remap atom_types using inverse_perm
        Remapper-->>sort_atom_names: atom_types updated
    end
    sort_atom_names-->>Caller: atom_names, atom_numbs, atom_types updated in-place

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Inspect type_map validation logic and exact ValueError message content used by tests.
  • Verify correctness of old->new index mapping and that atom_types remapping preserves dtype, shape, and semantics.
  • Check handling of zero-count entries (preserve vs remove per type_map) and edge cases in tests.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title "feat: support zero-count elements in type_map for sort_atom_names" directly describes the primary objective of the changeset. The modifications to dpdata/utils.py refactor the sort_atom_names function specifically to handle type_map entries with zero-count elements, and the new test module comprehensively validates this behavior. The title is clear, specific, and uses proper conventional commit conventions, allowing developers scanning the history to immediately understand the feature being introduced. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cfde1f4 and 3ec4047.

📒 Files selected for processing (1)
  • tests/test_type_map_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_type_map_utils.py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
dpdata/utils.py (1)

65-69: Clarify docstring about zero-count handling.

Spell out that type_map becomes the authoritative ordering, new names get 0 counts, and names not in type_map are dropped.

@@
-    alphabetical order. If type_map is given, atom_names will be set to type_map,
-    and zero-count elements are kept.
+    alphabetical order. If type_map is given, atom_names is set exactly to
+    type_map; names present in type_map but absent in data are added with
+    atom_numbs = 0, and names absent from type_map are dropped.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7af8d74 and 2c5b95f.

📒 Files selected for processing (1)
  • dpdata/utils.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
dpdata/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

dpdata/**/*.py: Python code in the main package must pass ruff linting (ruff check dpdata/)
Format Python code using ruff formatter (ruff format dpdata/) before committing

Files:

  • dpdata/utils.py
**/*

📄 CodeRabbit inference engine (AGENTS.md)

**/*: Remove trailing whitespace in files (pre-commit trailing-whitespace hook)
Ensure files end with a newline at EOF (end-of-file-fixer)

Files:

  • dpdata/utils.py
🧬 Code graph analysis (1)
dpdata/utils.py (1)
dpdata/system.py (2)
  • append (467-521)
  • append (1499-1514)
🪛 Ruff (0.14.1)
dpdata/utils.py

86-86: Avoid specifying long messages outside the exception class

(TRY003)
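
One common way to satisfy TRY003 is to move the message into a dedicated exception class. The sketch below assumes a hypothetical class name, `MissingAtomTypeError`, which does not come from the PR; it subclasses ValueError so that callers catching ValueError keep working.

```python
class MissingAtomTypeError(ValueError):
    """Hypothetical exception; the class name is an assumption, not part of dpdata."""

    def __init__(self, missing, type_map):
        # Keep the structured details available to callers.
        self.missing = list(missing)
        self.type_map = list(type_map)
        # The long message is built inside the class, which is what TRY003 asks for.
        super().__init__(
            f"Active atom types {self.missing} are not in type_map {self.type_map}"
        )


try:
    raise MissingAtomTypeError(["O"], ["H", "C"])
except ValueError as err:
    message = str(err)
# message -> "Active atom types ['O'] are not in type_map ['H', 'C']"
```

The raise site then carries only the data, and the message text still contains the missing labels for tests that inspect it.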

🔇 Additional comments (2)
dpdata/utils.py (2)

121-132: Alphabetical path remap is correct and stable.

The inverse-permutation update of atom_types is precise; stable argsort avoids perturbations.
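
To illustrate why the inverse permutation is needed for atom_types, here is a small standalone example on made-up data (it does not call dpdata). The sorting permutation maps new positions to old ones, while atom_types stores old indices, so it has to be translated with the inverse.

```python
import numpy as np

atom_names = ["O", "C", "H"]
atom_numbs = [1, 1, 2]
atom_types = np.array([0, 2, 2, 1])  # O, H, H, C

# perm[new] = old: the order in which to pick the old entries.
perm = np.argsort(atom_names, kind="stable")
# inverse[old] = new: where each old index ends up after sorting.
inverse = np.argsort(perm, kind="stable")

sorted_names = [atom_names[i] for i in perm]
sorted_numbs = [atom_numbs[i] for i in perm]
sorted_types = inverse[atom_types]
# sorted_names -> ["C", "H", "O"], sorted_numbs -> [1, 2, 1]
# sorted_types -> [2, 1, 1, 0], which still reads O, H, H, C
```

Indexing atom_types with `perm` instead of `inverse` would give [1, 0, 0, 2] here, which relabels the atoms as H, C, C, O. The two arrays only coincide when the permutation is its own inverse, which is why the bug can go unnoticed on simple swaps.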


78-118: Address critical issues: implement TRY003 fix, vectorize remapping, add defensive checks for type_map integrity.

The review identifies valid issues requiring fixes:

  1. Lint compliance (TRY003): The formatted f-string in the raise statement violates Ruff's TRY003 rule. Passing the error details as a second argument is a valid remedy: Python stores both values in exception.args, and str() renders them together as a tuple.

  2. Duplicate type_map entries: Add validation to reject duplicate names in type_map before processing.

  3. Vectorized remapping: Replace the loop-based index mapping with a mapping array using numpy fancy indexing (mapping[old_types]), which is more efficient and cleaner.

  4. Defensive assertion: Add a check, using -1 sentinel values in the mapping array, that every entry in atom_types maps to a valid index in type_map.

All proposed changes are correct and should be applied as shown in the diff. The vectorized approach is compatible with the existing codebase and improves performance without changing behavior.
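
Since the diff itself is not reproduced in this comment, the following is only a sketch of what the duplicate check, the mapping array with -1 sentinels, and the fancy-indexing remap could look like. The function name `remap_types_vectorized` and the exact messages are assumptions for illustration.

```python
import numpy as np


def remap_types_vectorized(atom_names, atom_types, type_map):
    """Hypothetical helper; illustrates the suggestions, not dpdata's code."""
    # Reject duplicate names; details go in as a second argument (exception.args).
    if len(set(type_map)) != len(type_map):
        raise ValueError("type_map contains duplicate names", list(type_map))

    # mapping[old_index] = new_index, with -1 marking names absent from type_map.
    new_index = {name: i for i, name in enumerate(type_map)}
    mapping = np.array([new_index.get(n, -1) for n in atom_names], dtype=int)

    # Fancy indexing remaps every atom at once, without a Python loop over atoms.
    new_types = mapping[np.asarray(atom_types, dtype=int)]

    # A surviving -1 means some atom refers to a name that type_map dropped.
    if (new_types < 0).any():
        raise ValueError("atom_types reference names absent from type_map")
    return new_types


remapped = remap_types_vectorized(["H", "O", "Cl"], [0, 0, 1], ["O", "H"])
# remapped -> [1, 1, 0]

try:
    remap_types_vectorized(["H"], [0], ["H", "H"])
except ValueError as err:
    dup_args = err.args
    dup_text = str(err)
# dup_args -> ("type_map contains duplicate names", ["H", "H"])
# dup_text -> "('type_map contains duplicate names', ['H', 'H'])"
```

Note that with two arguments, str() shows the tuple form of args, so a test that searches the message for a label still finds it, but the text is no longer a single formatted sentence.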

@wanghan-iapcm wanghan-iapcm requested a review from njzjz October 30, 2025 10:38

codspeed-hq bot commented Oct 30, 2025

CodSpeed Performance Report

Merging #912 will not alter performance

Comparing SchrodingersCattt:enhance/type-map-handling (3ec4047) with devel (7af8d74)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 2 untouched
⏩ 2 skipped [1]

Footnotes

  1. 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

codecov bot commented Oct 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.10%. Comparing base (7af8d74) to head (3ec4047).

Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #912      +/-   ##
==========================================
+ Coverage   86.06%   86.10%   +0.03%     
==========================================
  Files          83       83              
  Lines        7886     7908      +22     
==========================================
+ Hits         6787     6809      +22     
  Misses       1099     1099              

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/test_type_map_utils.py (1)

68-82: LGTM! Validates zero-count element removal.

This test correctly verifies that zero-count elements can be removed from the data when they're excluded from the type_map.

Minor note: The comment on line 76 mentions "A is omitted" which refers to the abstract pattern rather than the actual element "Cl". While technically correct, you might consider clarifying: # Cl is omitted because it has 0 atoms.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c5b95f and cfde1f4.

📒 Files selected for processing (1)
  • tests/test_type_map_utils.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*

📄 CodeRabbit inference engine (AGENTS.md)

**/*: Remove trailing whitespace in files (pre-commit trailing-whitespace hook)
Ensure files end with a newline at EOF (end-of-file-fixer)

Files:

  • tests/test_type_map_utils.py
tests/test_*.py

📄 CodeRabbit inference engine (AGENTS.md)

Name test modules as test_*.py for unittest discovery

Files:

  • tests/test_type_map_utils.py
🔇 Additional comments (6)
tests/test_type_map_utils.py (6)

1-10: LGTM! Well-structured test module setup.

The imports are clean and the test class follows unittest conventions properly.


11-24: LGTM! Core type_map functionality well tested.

This test correctly verifies that sort_atom_names reorders atoms according to the provided type_map and properly remaps atom_types.


25-38: LGTM! Critical test for the PR's main feature.

This test correctly validates that zero-count elements in the type_map are properly handled, adding them to atom_names with a count of 0 while preserving existing atom_types.


39-54: LGTM! Robust validation test.

This test properly verifies that the function raises a clear, informative ValueError when active atom types are missing from the provided type_map.
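
A test of this kind might look like the sketch below. It is written against a small hypothetical validator, `check_active_types`, defined inline so the example runs on its own; the actual test module calls sort_atom_names and is not reproduced here.

```python
import unittest


def check_active_types(atom_names, atom_numbs, type_map):
    """Hypothetical validator standing in for the check inside sort_atom_names."""
    missing = [n for n, c in zip(atom_names, atom_numbs) if c > 0 and n not in type_map]
    if missing:
        raise ValueError(f"active atom types missing from type_map: {missing}")


class TestMissingActiveType(unittest.TestCase):
    def test_missing_active_type_is_reported(self):
        # "O" has atoms but is not in type_map, so the label must appear in the error.
        with self.assertRaises(ValueError) as cm:
            check_active_types(["H", "O"], [2, 1], ["H"])
        self.assertIn("O", str(cm.exception))

    def test_zero_count_type_may_be_absent(self):
        # "Cl" has no atoms, so leaving it out of type_map is not an error.
        check_active_types(["H", "Cl"], [2, 0], ["H"])
```

Asserting on the presence of the label, not on the full message text, keeps the test stable if the wording of the error changes.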


55-67: LGTM! Ensures backward compatibility.

This test correctly verifies that the alphabetical sorting behavior is preserved when no type_map is provided, ensuring existing functionality remains intact.


84-85: LGTM! Standard unittest main block.

The main block follows the standard unittest pattern for running tests directly.
