fix: fix inversion method & pytorch Kmeans OOM #179

Merged
merged 4 commits into from
Mar 21, 2024


bclavie
Collaborator

@bclavie bclavie commented Mar 21, 2024

  • Fix an issue where the doc_id -> pid map was 1:1 instead of 1:many
  • Lower the threshold for monkey-patching the CollectionEncoder to use PyTorch k-means to 75,000 documents, to avoid OOMs caused by the high memory usage of the PyTorch k-means implementation.
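
For readers unfamiliar with the first bug, here is a minimal sketch of the difference between a 1:1 and a 1:many inversion. It assumes the forward map is a plain dict from passage id (pid) to doc_id, and that one document can be split into several passages. The function name `invert_pid_docid_map` and the sample data are hypothetical illustrations, not the code changed in this PR.

```python
from collections import defaultdict


def invert_pid_docid_map(pid_docid_map):
    """Hypothetical helper: invert pid -> doc_id into doc_id -> [pid, ...]."""
    docid_pid_map = defaultdict(list)
    for pid, docid in pid_docid_map.items():
        # Append rather than assign, so every passage of a document is kept.
        docid_pid_map[docid].append(pid)
    return dict(docid_pid_map)


# Assumed sample data: "doc_a" was split into two passages.
pid_docid_map = {0: "doc_a", 1: "doc_a", 2: "doc_b"}

# A naive 1:1 inversion keeps only the last pid seen for each doc_id.
naive = {docid: pid for pid, docid in pid_docid_map.items()}

# The 1:many inversion keeps all of them.
inverted = invert_pid_docid_map(pid_docid_map)

print(naive)
print(inverted)
```

With a 1:1 map, any operation that looks up a document's passages by doc_id would only see one of them, which is why the inverse needs to hold a list per document.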

@bclavie bclavie changed the title from "fix: fix inversion method" to "fix: fix inversion method & pytorch Kmeans OOM" Mar 21, 2024
@bclavie bclavie merged commit 796b493 into main Mar 21, 2024
2 checks passed
@bclavie bclavie deleted the fix/properly_generate_inverted_pid_docid_map branch March 21, 2024 23:21
@bclavie bclavie linked an issue Mar 21, 2024 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Bug in _invert_pid_docid_map