@meiravgri meiravgri commented Sep 18, 2025

Purpose

This PR backports the resize logic improvements from PR #753 to the 0.6 branch, adapting the implementation to account for 0.6's specific capacity management approach.
The original change decouples shrinking and growing in the vector index algorithms to prevent oscillating allocation/deallocation cycles when the index is updated at block boundaries. These cycles were particularly problematic for large containers such as hash tables and metadata vectors.

Behavior Changes in 0.6 Branch

Before this PR (0.6 branch):

  • Shrinking occurred whenever count % blockSize == 0 during vector removal
  • Metadata containers were immediately resized down by exactly one block size
  • No buffer zone existed between growing and shrinking operations

After this PR (0.6 branch):

  • Growing: Triggered only when space is needed for the next element (id >= capacity in BruteForce, cur_element_count >= max_elements_ in HNSW)
  • Shrinking: Triggered only when at least two blocks are free (indexCapacity() >= (indexSize() + 2 * blockSize))
  • Buffer Zone: At least one free block remains after a shrink, so an add that follows a delete at a block boundary does not trigger a new allocation
  • Special handling: Each shrink releases exactly one block; the last block has its own condition, which applies when count == 0
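The difference between the two rules can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the library's code: `CapacityModel` and `resizesAtBoundary` are hypothetical names, capacity starts at zero and stays block-aligned, the old growing condition is assumed to be the same as the new one, and the special handling of the last block at count == 0 is left out.

```cpp
#include <cstddef>

// Hypothetical model of an index's capacity bookkeeping (not the library's class).
struct CapacityModel {
    std::size_t blockSize;
    bool decoupled;            // true: rule after this PR, false: rule before it
    std::size_t capacity = 0;  // kept block-aligned in this sketch
    std::size_t count = 0;
    std::size_t resizes = 0;   // number of container reallocations

    CapacityModel(std::size_t bs, bool d) : blockSize(bs), decoupled(d) {}

    void add() {
        // Grow only when there is no room for the next element.
        if (count >= capacity) {
            capacity += blockSize;
            ++resizes;
        }
        ++count;
    }

    void remove() {
        if (count == 0) return;
        --count;
        const bool shrink =
            decoupled
                // New rule: shrink only when two or more blocks are free.
                ? capacity >= count + 2 * blockSize
                // Old rule: shrink whenever the count lands on a block boundary.
                : (count % blockSize == 0 && capacity >= blockSize);
        if (shrink) {
            capacity -= blockSize;
            ++resizes;
        }
    }
};

// Fill one block plus one element, then repeatedly delete and re-add
// a single element so the count toggles across the block boundary.
inline std::size_t resizesAtBoundary(bool decoupled, std::size_t blockSize,
                                     std::size_t cycles) {
    CapacityModel m(blockSize, decoupled);
    for (std::size_t i = 0; i < blockSize + 1; ++i) m.add();
    for (std::size_t i = 0; i < cycles; ++i) {
        m.remove();
        m.add();
    }
    return m.resizes;
}
```

In this model, with a block size of 4 and 10 delete/add cycles, the old rule reallocates twice per cycle while the decoupled rule performs only the two resizes needed for the initial fill.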

Key Differences from 0.8 Branch Implementation

The 0.6 branch implementation differs from the 0.8 backport (PR #777) due to different initial capacity handling and HNSW architecture:

  1. Initial Capacity Rounding:

    • 0.8 branch: Initial capacity is rounded up to block size at index creation, so resize logic is simpler
    • 0.6 branch: Initial capacity is NOT rounded up to block size at initialization - rounding only occurs at the first resize operation
  2. HNSW Architecture Differences:

    • 0.6 branch: Reallocates ALL containers including vector data during resize operations
    • 0.7+ branches: Use separate vector and graph data blocks, allowing incremental block-by-block operations
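The deferred rounding in item 1 can be sketched as a growth step that also aligns the capacity. The helper below is hypothetical and only illustrates one plausible formula; the actual 0.6 code may compute the new capacity differently.

```cpp
#include <cstddef>

// Hypothetical helper: grow an index capacity that may not be block-aligned.
// The result is always a multiple of blockSize and strictly larger than the
// input, so the first resize both adds room and aligns the capacity.
// Assumes blockSize > 0.
inline std::size_t growToAlignedCapacity(std::size_t capacity,
                                         std::size_t blockSize) {
    return capacity + (blockSize - capacity % blockSize);
}
```

For example, with a block size of 4, an unaligned initial capacity of 10 grows to 12, while an already aligned capacity of 8 grows by a full block to 12.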

Implementation Details

The key changes ensure that:

  • Growing: Only occurs when we need space for the next element to be added
  • Shrinking: Only occurs when there's sufficient buffer (2+ blocks free) or when the index is completely empty
  • Block Alignment: Handled during resize operations rather than at initialization, maintaining 0.6's deferred rounding approach

- Refactor `resize_and_align_index` tests in both `test_bruteforce.cpp` and `test_bruteforce_multi.cpp` to improve clarity and maintainability.
- Introduce helper functions to verify index size and capacity, reducing code duplication.
- Add comprehensive checks for index size, capacity, and label counts during vector addition and deletion.
- Implement tests to ensure no oscillation in index size and capacity during repeated add/delete cycles.
- Address edge cases for initial capacity and resizing behavior, ensuring proper alignment with block sizes.
@meiravgri meiravgri changed the title from "Enhance BruteForce index tests for resizing and alignment" to "[0.6] [MOD-10559] Decouple the shrinking and growing logic of large containers in Flat and HNSW" Sep 18, 2025

codecov bot commented Sep 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.07%. Comparing base (71bd103) to head (4686be6).
⚠️ Report is 1 commits behind head on 0.6.

Additional details and impacted files
@@            Coverage Diff             @@
##              0.6     #783      +/-   ##
==========================================
+ Coverage   94.99%   95.07%   +0.08%     
==========================================
  Files          60       60              
  Lines        3434     3451      +17     
==========================================
+ Hits         3262     3281      +19     
+ Misses        172      170       -2     

@meiravgri meiravgri requested a review from GuyAv46 September 18, 2025 15:18
@meiravgri meiravgri enabled auto-merge September 18, 2025 15:18
@meiravgri meiravgri added this pull request to the merge queue Sep 21, 2025
Merged via the queue into 0.6 with commit c10d1dc Sep 21, 2025
33 checks passed
@meiravgri meiravgri deleted the meiravg_resize_callback branch September 21, 2025 13:11