[0.8] [MOD-10559] Decouple the shrinking and growing logic of large containers in Flat and HNSW #777

meiravgri · 2025-09-15T11:00:02Z

Purpose

This PR backports the resize logic improvements from PR #753 to the 0.8 branch.
The original change decouples shrinking and growing operations in vector index algorithms to prevent oscillating allocation/deallocation cycles during index updates at block boundaries, which was particularly problematic for large containers like hash tables and metadata vectors.

Behavior Changes in 0.8 Branch

Before this PR (0.8 branch):

- Shrinking occurred whenever count % blockSize == 0 during vector removal
- Metadata containers were immediately resized down by exactly one block size
- No buffer zone existed between growing and shrinking operations

After this PR (0.8 branch):

- Growing: Triggered when indexSize() == indexCapacity() (capacity is full)
- Shrinking: Only when there are 2+ free blocks (indexCapacity() >= indexSize() + 2 * blockSize)
- Buffer zone: At least one free block is always kept, which prevents oscillation at block boundaries
- Special handling: Always shrinks by exactly one block; the last block is special-cased, so when the capacity equals one block size it shrinks to zero
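The rules above can be modelled in a few lines. This is a minimal C++ sketch, not the VecSim implementation. `ResizePolicy`, `insert()` and `remove()` are hypothetical names, and `size`/`capacity` stand in for indexSize()/indexCapacity(). The sketch also assumes the last-block rule means "the index is empty and exactly one block is allocated".

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the 0.8 resize policy; not the VecSim API.
struct ResizePolicy {
    size_t blockSize;
    size_t size = 0;      // stands in for indexSize()
    size_t capacity = 0;  // stands in for indexCapacity()

    explicit ResizePolicy(size_t bs, size_t initialCapacity = 0)
        : blockSize(bs), capacity(initialCapacity) {}

    // Grow only when every allocated slot is in use.
    bool shouldGrow() const { return size == capacity; }

    // Shrink only when at least two whole blocks are free,
    // so one free block always remains as a buffer.
    bool shouldShrink() const { return capacity >= size + 2 * blockSize; }

    void insert() {
        if (shouldGrow()) capacity += blockSize;  // grow by one block
        ++size;
    }

    void remove() {
        assert(size > 0);
        --size;
        if (shouldShrink()) {
            capacity -= blockSize;  // always shrink by exactly one block
        } else if (size == 0 && capacity == blockSize) {
            capacity = 0;  // assumed last-block rule: one block left, index empty
        }
    }
};
```

Growth requires a full index, and shrinking requires two free blocks. As a result, moving one vector back and forth across a block boundary leaves the capacity unchanged. The old `count % blockSize == 0` check could not avoid this oscillation.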

Key Differences from Main Branch

The main branch implementation differs from this 0.8 backport in several important ways due to initial capacity support in the 0.8 branch:

Initial Capacity Support:
- 0.8 branch: Supports initialCapacity parameter, allowing pre-allocation of index capacity
- Main branch: No initial capacity support (deprecated)
Shrinking to Zero Logic:
- 0.8 branch: Always shrinks by block size with special condition: "when capacity equals one block size, shrink to zero"
- Main branch: Immediately shrinks to 0 when index size becomes 0
Resize Policy:
- 0.8 branch: Maintains "always remove one block" guarantee to align with initial capacity behavior
- Main branch: Uses simpler "shrink to 0 when size=0" logic since no initial capacity exists
Block Management:
- 0.8 branch: Can have more than two free blocks due to initial capacity pre-allocation
- Main branch: Simpler block management without initial capacity considerations
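The shrink-to-zero difference is easier to see side by side. The sketch below computes the next capacity each branch would pick after a removal. The function names are hypothetical, and both variants assume the same two-free-blocks threshold. Only the handling of an empty index differs, as described above.

```cpp
#include <cassert>
#include <cstddef>

// 0.8 backport (sketch): shrink by exactly one block; special-case the last block.
size_t nextCapacity08(size_t size, size_t capacity, size_t bs) {
    if (capacity >= size + 2 * bs) return capacity - bs;  // 2+ free blocks
    if (size == 0 && capacity == bs) return 0;            // last block only
    return capacity;
}

// main (sketch): no initial capacity, so an empty index releases everything.
size_t nextCapacityMain(size_t size, size_t capacity, size_t bs) {
    if (size == 0) return 0;                              // shrink straight to zero
    if (capacity >= size + 2 * bs) return capacity - bs;  // 2+ free blocks
    return capacity;
}
```

Take an empty index with an initial capacity of three blocks. In this sketch, the 0.8 variant sheds a single block per call, while the main variant drops straight to zero.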

⚠️ Disclaimer: Potential Heavy Resize Sequence with Large Initial Capacity

When using large initial capacity values, this implementation may still trigger frequent metadata container resizes during update-heavy workloads.

Example scenario:

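- Initial capacity: 10M elements (10,000 blocks, i.e. a block size of 1,000)
- Current size: 7M elements
- Update operations (remove + insert): Each removal can trigger a shrink, because capacity >= (size + 2*blockSize) stays true for thousands of consecutive operations. Every such update resizes the metadata containers by one block until only a one-block buffer remains.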
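The scenario can be replayed with a small simulation. It assumes each update is one removal followed by one insertion, and that the shrink check runs after every removal. `simulateUpdates` and `UpdateStats` are hypothetical helpers, not part of VecSim.

```cpp
#include <cassert>
#include <cstddef>

struct UpdateStats {
    size_t shrinks;        // number of one-block shrink operations
    size_t finalCapacity;  // capacity after all updates
};

// Replays `updates` remove+insert pairs against the 0.8 rule (sketch).
UpdateStats simulateUpdates(size_t blockSize, size_t capacity, size_t size, size_t updates) {
    size_t shrinks = 0;
    for (size_t i = 0; i < updates; ++i) {
        assert(size > 0);
        --size;                                    // remove one vector
        if (capacity >= size + 2 * blockSize) {    // 2+ free blocks: shrink by one
            capacity -= blockSize;
            ++shrinks;
        }
        if (size == capacity) capacity += blockSize;  // grow only when full
        ++size;                                    // re-insert the updated vector
    }
    return {shrinks, capacity};
}
```

With these numbers, the first 2,999 updates each shrink the metadata containers by one block. After that, the capacity settles one block above the size. Each of those shrinks reallocates containers that hold millions of entries, which is the cost this disclaimer warns about.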

codecov · 2025-09-15T11:26:04Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.90%. Comparing base (3a7ec14) to head (b70e4d2).
⚠️ Report is 1 commit behind head on 0.8.

Additional details and impacted files

@@            Coverage Diff             @@
##              0.8     #777      +/-   ##
==========================================
+ Coverage   96.87%   96.90%   +0.02%     
==========================================
  Files          91       91              
  Lines        5082     5131      +49     
==========================================
+ Hits         4923     4972      +49     
  Misses        159      159

GuyAv46 · 2025-09-15T12:06:00Z

src/VecSim/vec_sim_tiered_index.h

                                jobs.size());
    }

+#ifdef BUILD_TESTS


Is wrapping the public: specifier in #ifdef intentional? It seems like a bug. Consider wrapping only the new function.

Accidentally backported from master. I'll align with the current version's approach, as you suggest.
I'm not sure it's a bug, though, because we don't have an SA object of VecSimTieredIndex.

GuyAv46 · 2025-09-15T12:16:04Z

src/VecSim/algorithms/hnsw/hnsw.h

-    if (curElementCount % this->blockSize == 0) {
-        shrinkByBlock();
-    }
+    shrinkByBlock();


Why move the condition into the function? Consider renaming so it's clear it doesn't necessarily shrink

I did it in main. I think it was required for the initial implementation, and then I forgot to revert it.
Should I keep it aligned with main or revert it here?

GuyAv46 · 2025-09-15T12:19:46Z

src/VecSim/algorithms/hnsw/hnsw.h

 }

 template <typename DataType, typename DistType>
 void HNSWIndex<DataType, DistType>::resizeIndexCommon(size_t new_max_elements) {


Seems like now, new_max_elements is equal to maxElements already. Can we avoid passing it?

maxElements is not always aligned with the size of the metadata containers.
For example:
// insert 3 * bs vecs. maxElements: 3 * bs. metadata containers size: 3 * bs.
// remove 1 * bs. maxElements: 2 * bs. metadata containers size: 3 * bs (no resize)
// remove another bs. maxElements: 1 * bs. metadata containers size: 2 * bs (resizes)
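The trace above can be replayed to show why resizeIndexCommon() still needs an explicit target size. This sketch assumes maxElements stays block-aligned with the element count (rounded up to a whole block). That assumption matches the trace but is a simplification. `afterRemovals` and `Snapshot` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

struct Snapshot {
    size_t maxElements;        // block-aligned element bookkeeping (assumed)
    size_t containerCapacity;  // size of the metadata containers
};

// Removes `removals` vectors one at a time from a full index of `initialSize`
// (a multiple of bs), applying the decoupled shrink rule to the containers.
Snapshot afterRemovals(size_t bs, size_t initialSize, size_t removals) {
    size_t size = initialSize;
    size_t capacity = initialSize;  // containers start exactly full
    for (size_t i = 0; i < removals; ++i) {
        assert(size > 0);
        --size;
        if (capacity >= size + 2 * bs) capacity -= bs;  // 2+ free blocks
    }
    size_t maxElements = (size + bs - 1) / bs * bs;  // round up to a block multiple
    return {maxElements, capacity};
}
```

Take bs = 4. Removing one block's worth of vectors leaves maxElements at 2 * bs, while the containers still hold 3 * bs. Removing a second block leaves 1 * bs versus 2 * bs. So the two values cannot be used interchangeably.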

GuyAv46 · 2025-09-15T12:30:43Z

src/VecSim/algorithms/brute_force/brute_force.h

 /******************** Implementation **************/

 template <typename DataType, typename DistType>
 void BruteForceIndex<DataType, DistType>::appendVector(const void *vector_data, labelType label) {


Were any of the changes in this function necessary?

It simplifies the resize logic (avoiding -1 offset calculations) and is now also aligned with main.

github-actions · 2025-09-15T16:31:51Z

Backport failed for 0.6, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 0.6
git worktree add -d .worktree/backport-777-to-0.6 origin/0.6
cd .worktree/backport-777-to-0.6
git switch --create backport-777-to-0.6
git cherry-pick -x 2f87813b0a0b4e08602d29816d3a92964f34e776 0ebc8b371952394af30156cc6f8caead6d58c5bd b70e4d2268280f4c9d7ea9c4eb4e4954867750b5

github-actions · 2025-09-15T16:31:52Z

Backport failed for 0.7, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 0.7
git worktree add -d .worktree/backport-777-to-0.7 origin/0.7
cd .worktree/backport-777-to-0.7
git switch --create backport-777-to-0.7
git cherry-pick -x 2f87813b0a0b4e08602d29816d3a92964f34e776 0ebc8b371952394af30156cc6f8caead6d58c5bd b70e4d2268280f4c9d7ea9c4eb4e4954867750b5

…ontainers in Flat and HNSW (#777) * backport #753 shrink by blocksize shrink to zero only if capcity is 1 blocksize. * move public outside * revert size (cherry picked from commit bcc4d67)

…ontainers in Flat and HNSW (#780) [0.8] [MOD-10559] Decouple the shrinking and growing logic of large containers in Flat and HNSW (#777) * backport #753 shrink by blocksize shrink to zero only if capcity is 1 blocksize. * move public outside * revert size (cherry picked from commit bcc4d67)

backport #753

2f87813

shrink by blocksize shrink to zero only if capcity is 1 blocksize.

meiravgri changed the title ~~backport #753~~ [0.8] [MOD-10559] Decouple the shrinking and growing logic of large containers in Flat and HNSW Sep 15, 2025

meiravgri added bm-basics-fp32-single bm-basics-fp32-multi labels Sep 15, 2025

meiravgri requested a review from GuyAv46 September 15, 2025 11:37

GuyAv46 reviewed Sep 15, 2025

move public outside

0ebc8b3

meiravgri requested a review from GuyAv46 September 15, 2025 15:02

meiravgri enabled auto-merge September 15, 2025 15:03

GuyAv46 previously approved these changes Sep 15, 2025

GuyAv46 disabled auto-merge September 15, 2025 15:08

revert size

b70e4d2

meiravgri dismissed GuyAv46’s stale review via b70e4d2 September 15, 2025 15:08

GuyAv46 approved these changes Sep 15, 2025

meiravgri enabled auto-merge September 15, 2025 15:30

meiravgri added this pull request to the merge queue Sep 15, 2025

meiravgri added backport 0.6 backport 0.7 labels Sep 15, 2025

Merged via the queue into 0.8 with commit bcc4d67 Sep 15, 2025
28 checks passed

meiravgri deleted the backport-meiravg_relax_resize-0.8 branch September 15, 2025 16:31

meiravgri mentioned this pull request Sep 16, 2025

[0.7] [MOD-10559] Decouple the shrinking and growing logic of large containers in Flat and HNSW #780

Merged

meiravgri mentioned this pull request Sep 18, 2025

[0.6] [MOD-10559] Decouple the shrinking and growing logic of large containers in Flat and HNSW #783

Merged
