fix: prevent overflow at kmeans #287

Merged: 2 commits, Jan 22, 2024


cutecutecat
Member

Thanks to @xieydd, we have dug out this bug.

Panickied. Info: PanicInfo { payload: Any { .. }, 
message: Some(attempt to subtract with overflow), 
location: Location { file: "crates/service/src/algorithms/clustering/elkan_k_means.rs", line: 179, col: 47 }, 
can_unwind: true, force_no_backtrace: false }

If there are too few vectors, that is, the vector count n is small and nlist (c) is larger, a subtraction overflows there.
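To illustrate this class of bug, here is a minimal sketch. It is not the code at line 179, which this thread does not show. It assumes some step subtracts `c` from `n` on `usize`; the helper name `spare_samples` is hypothetical. A plain `n - c` panics in debug builds with "attempt to subtract with overflow" when `n < c`, while `checked_sub` turns that case into a value the caller can handle.

```rust
/// Hypothetical stand-in for an index computation that assumes n >= c.
/// Returns None instead of panicking when the subtraction would underflow.
fn spare_samples(n: usize, c: usize) -> Option<usize> {
    // With overflow checks enabled (the default in debug builds),
    // writing `n - c` here would panic whenever n < c.
    n.checked_sub(c)
}

/// Hypothetical guard: decide up front whether the full k-means path is safe.
fn needs_fallback(n: usize, c: usize) -> bool {
    // n <= c is the condition discussed in this thread.
    n <= c
}
```

For example, `spare_samples(1000, 100)` yields `Some(900)`, whereas `spare_samples(3, 100)` yields `None`, and `needs_fallback(3, 100)` is `true`.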

@usamoi
Collaborator

usamoi commented Jan 19, 2024

Should we just return centroids using a simpler algorithm if n <= c?

@cutecutecat
Member Author

cutecutecat commented Jan 19, 2024

> Should we just return centroids using a simpler algorithm if n <= c?

Fine, I will try QuickCenters, following https://github.com/pgvector/pgvector/blob/cc9e6a67783092ae2e8b1b1ade552e547bef8934/src/ivfkmeans.c#L117

Unlike the pgvector implementation, we don't exclude duplicates.
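A rough sketch of what such a fallback could look like, under stated assumptions: the function names, the `seed` parameter, and the random fill for the remaining `c - n` slots are all hypothetical choices for this example, not the code in this repository. Samples are copied as centroids without removing duplicates, as described above.

```rust
/// Sketch of a QuickCenters-style fallback for n <= c (hypothetical names).
/// Copies every sample as a centroid, keeping duplicates, then fills the
/// remaining c - n centroids with pseudo-random vectors in [0, 1).
fn quick_centers(samples: &[Vec<f32>], c: usize, dims: usize, seed: u64) -> Vec<Vec<f32>> {
    assert!(samples.len() <= c, "fallback is only meant for n <= c");
    assert!(samples.iter().all(|s| s.len() == dims), "dimension mismatch");

    // Every sample becomes a centroid as-is; no deduplication.
    let mut centroids: Vec<Vec<f32>> = samples.to_vec();

    // Assumption of this sketch: pad with random vectors until there are c centroids.
    let mut state = seed;
    while centroids.len() < c {
        let v: Vec<f32> = (0..dims).map(|_| next_unit_f32(&mut state)).collect();
        centroids.push(v);
    }
    centroids
}

/// Tiny linear congruential generator, used only to keep the sketch dependency-free.
fn next_unit_f32(state: &mut u64) -> f32 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    // Keep the top 24 bits so the value is exactly representable as f32.
    ((*state >> 40) as f32) / ((1u64 << 24) as f32)
}
```

Because the loop only compares `centroids.len()` against `c`, no `c - n` subtraction is ever computed, so the n <= c case cannot underflow here.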

@usamoi
Collaborator

usamoi commented Jan 19, 2024

QuickCenters is already in the code.

Signed-off-by: cutecutecat <junyuchen@tensorchord.ai>
Signed-off-by: cutecutecat <junyuchen@tensorchord.ai>
@cutecutecat cutecutecat added this pull request to the merge queue Jan 22, 2024
Merged via the queue into tensorchord:main with commit c95f99f Jan 22, 2024
7 checks passed
@cutecutecat cutecutecat deleted the kmeans-fix branch February 25, 2024 06:18