-
Notifications
You must be signed in to change notification settings - Fork 22
[0.8] [MOD-10559] Decouple the shrinking and growing logic of large containers in Flat and HNSW #777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## 0.8 #777 +/- ##
==========================================
+ Coverage 96.87% 96.90% +0.02%
==========================================
Files 91 91
Lines 5082 5131 +49
==========================================
+ Hits 4923 4972 +49
Misses 159 159 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
src/VecSim/vec_sim_tiered_index.h
Outdated
| jobs.size()); | ||
| } | ||
|
|
||
| #ifdef BUILD_TESTS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intentionally wrapping the public: with #ifdef? Seems like a bug. Consider wrapping the new function only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidentally backported from master. Ill align with the current version approach as you suggest.
Not sure its a bug because we don't have a SA object of VecSimTieredIndex
| if (curElementCount % this->blockSize == 0) { | ||
| shrinkByBlock(); | ||
| } | ||
| shrinkByBlock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why move the condition into the function? Consider renaming so it's clear it doesn't necessarily shrink
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it in main, i think that it was required for the initial implementation and then i forgot to revert
Should i keep it aligned with main or revert here?
| } | ||
|
|
||
| template <typename DataType, typename DistType> | ||
| void HNSWIndex<DataType, DistType>::resizeIndexCommon(size_t new_max_elements) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like now, new_max_elements is equal to maxElements already. Can we avoid passing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maxElements is not always aligned with size of meta data containers.
for example:
// insert 3 * bs vecs. maxElements: 3 * bs. metadata containers size: 3 * bs.
// remove 1 * bs. maxElements: 2 * bs. metadata containers size: 3 * bs (no resize)
// remove another bs. maxElements: 1 * bs. metadata containers size: 2 * bs (resizes)
| /******************** Implementation **************/ | ||
|
|
||
| template <typename DataType, typename DistType> | ||
| void BruteForceIndex<DataType, DistType>::appendVector(const void *vector_data, labelType label) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were any of the changes in this function necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplifies the resize logic (avoiding -1 offset calculations) and now also aligned with main
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 0.6
git worktree add -d .worktree/backport-777-to-0.6 origin/0.6
cd .worktree/backport-777-to-0.6
git switch --create backport-777-to-0.6
git cherry-pick -x 2f87813b0a0b4e08602d29816d3a92964f34e776 0ebc8b371952394af30156cc6f8caead6d58c5bd b70e4d2268280f4c9d7ea9c4eb4e4954867750b5 |
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 0.7
git worktree add -d .worktree/backport-777-to-0.7 origin/0.7
cd .worktree/backport-777-to-0.7
git switch --create backport-777-to-0.7
git cherry-pick -x 2f87813b0a0b4e08602d29816d3a92964f34e776 0ebc8b371952394af30156cc6f8caead6d58c5bd b70e4d2268280f4c9d7ea9c4eb4e4954867750b5 |
Purpose
This PR backports the resize logic improvements from PR #753 to the 0.8 branch.
The original change decouples shrinking and growing operations in vector index algorithms to prevent oscillating allocation/deallocation cycles during index updates at block boundaries, which was particularly problematic for large containers like hash tables and metadata vectors.
Behavior Changes in 0.8 Branch
Before this PR (0.8 branch):
count % blockSize == 0during vector removalAfter this PR (0.8 branch):
indexSize() == indexCapacity()(capacity is full)indexCapacity() >= (indexSize() + 2 * blockSize))Key Differences from Main Branch
The main branch implementation differs from this 0.8 backport in several important ways due to initial capacity support in the 0.8 branch:
Initial Capacity Support:
initialCapacityparameter, allowing pre-allocation of index capacityShrinking to Zero Logic:
Resize Policy:
Block Management:
When using large initial capacity values, this implementation may still trigger frequent metadata container resizes during update-heavy workloads.
Example scenario:
Initial capacity: 10M elements (10,000 blocks)
Current size: 7M elements
Update operations (remove + insert): Each removal can trigger shrinking since capacity >= (size + 2*blockSize) remains true for thousands of operations