Conversation

@msokolov (Contributor) commented Dec 3, 2025

This is what I was thinking -- when "repairing", only add links from the node being repaired, and not back again. This is needed because ordinarily we add nodes in order and each node is added only once, so there is no opportunity for duplication; in this case, though, we are effectively "re-adding" the node. So we would either have to check for duplicates when making links, or just forgo adding the reverse links. Is this good enough? I don't know -- I haven't had time to do any extensive testing -- but it at least gets rid of the duplicates.
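The idea above can be sketched as follows. This is a minimal, hypothetical illustration using plain adjacency lists, not Lucene's actual `HnswGraphBuilder`; the class and method names are invented for the example. Normal insertion links both directions (safe, because the node is new); repair re-links only from the repaired node, so a neighbor's list can never pick up a second copy of it.

```java
import java.util.*;

public class RepairSketch {
    // Per-node adjacency lists, as in one HNSW level (plain lists: duplicates are possible).
    static Map<Integer, List<Integer>> graph = new HashMap<>();

    static List<Integer> neighbors(int node) {
        return graph.computeIfAbsent(node, k -> new ArrayList<>());
    }

    // Normal insertion: the node is brand new, so adding the reverse link
    // cannot create a duplicate entry in the neighbor's list.
    static void addNode(int node, List<Integer> candidates) {
        for (int c : candidates) {
            neighbors(node).add(c);
            neighbors(c).add(node);
        }
    }

    // Repair: the node is being "re-added", so its neighbors may already link
    // back to it; forgo the reverse links instead of checking for duplicates.
    static void repairNode(int node, List<Integer> candidates) {
        neighbors(node).clear();
        for (int c : candidates) {
            neighbors(node).add(c);
            // no neighbors(c).add(node) here
        }
    }

    public static void main(String[] args) {
        addNode(0, List.of());
        addNode(1, List.of(0));
        addNode(2, List.of(0, 1));
        repairNode(1, List.of(0, 2)); // re-link node 1, e.g. after deletions
        System.out.println(neighbors(0)); // node 1 still appears exactly once
        System.out.println(neighbors(1));
    }
}
```

The trade-off, as noted, is that the repaired node regains outgoing links but its neighbors gain no new incoming ones, which may cost some recall.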

@msokolov msokolov added the skip-changelog Apply to PRs that don't need a changelog entry, stopping the automated changelog check. label Dec 3, 2025
popToScratch(candidates, scratchArray);

// Add diverse neighbors and establish bidirectional connections
addDiverseNeighbors(targetLevel, node, scratchArray, scorer, true);
Contributor
I think we can keep addDiverseNeighbors here? It is also used when adding a new node to a level (the rebalancing step).

Contributor Author
oops, I'll add this back! I wonder if we should do at least a quick test to make sure this change isn't harming recall too much in the heavy-deletion case, but I'm unsure how you ran the tests. I guess it was with luceneutil? Did you commit any changes to that in support of the testing?

Contributor

@Pulkitg64 Dec 5, 2025

Yes, that's a good idea. Let me check the recall with these changes.

Did you commit any changes to that in support of the testing?

No, not yet. I will try to raise a PR for adding deletes support to knnPerfTest soon.

@Pulkitg64 (Contributor)

Thanks @msokolov for catching and fixing this issue. I should have been more careful in my PR.

@Pulkitg64 (Contributor)

We are seeing a recall drop with this code fix:

I indexed 100k docs into a single segment (using force-merge), then deleted 40% of the docs at random, performed another force-merge, and computed recall. Here are the recall results for baseline vs. candidate; we see a consistent recall drop across all maxConn values:

Experiment Recall Result

| MaxConn | Baseline | Candidate | Recall Drop |
|--------:|---------:|----------:|------------:|
| 8       | 81.20%   | 78.20%    | 3.00%       |
| 16      | 90.60%   | 86.90%    | 3.70%       |
| 32      | 93.40%   | 90.40%    | 3.00%       |
| 64      | 94.00%   | 91.40%    | 2.60%       |
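For reference, recall here is presumably measured the standard way: the overlap between the approximate kNN results and the exact (brute-force) top-k, averaged over queries. A minimal sketch of that computation (illustrative only; not the luceneutil/knnPerfTest code):

```java
import java.util.*;

public class RecallSketch {
    // recall@k = |approx ∩ exact| / k for each query, averaged over all queries
    static double recall(List<Set<Integer>> exact, List<Set<Integer>> approx, int k) {
        double total = 0;
        for (int i = 0; i < exact.size(); i++) {
            Set<Integer> hits = new HashSet<>(approx.get(i));
            hits.retainAll(exact.get(i)); // ids found by both exact and approximate search
            total += (double) hits.size() / k;
        }
        return total / exact.size();
    }

    public static void main(String[] args) {
        // two queries, k = 4: the first misses one true neighbor, the second misses none
        List<Set<Integer>> exact = List.of(Set.of(1, 2, 3, 4), Set.of(5, 6, 7, 8));
        List<Set<Integer>> approx = List.of(Set.of(1, 2, 3, 9), Set.of(5, 6, 7, 8));
        System.out.println(recall(exact, approx, 4)); // 0.875
    }
}
```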

Next: Running more tests with different delete %.

@Pulkitg64 (Contributor)

I have raised another PR with a brute-force approach to fix the test: #15478. With that approach I am not seeing any recall regression, but I am seeing around a 2% regression in indexing rate. Please let me know whether you think we should pursue that approach.


Labels

module:core/hnsw skip-changelog Apply to PRs that don't need a changelog entry, stopping the automated changelog check.
