feat: support transforming selected fragments in vector transform stage for ivf_pq index #2657

Merged: raunaks13 merged 9 commits into lancedb:main on Aug 1, 2024

raunaks13 (Contributor) commented Jul 29, 2024

The `transform_vectors()` API transforms the dataset's vectors for an IVF-PQ index and produces an output table with `row_id`, `pq_code`, and `partition_id` columns. This PR changes the API so that the transformation runs only on the fragments the user passes as a parameter. If `fragments=None` (the default), all fragments of the dataset are used.
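The following minimal pure-Python sketch shows an IVF-PQ transform restricted to chosen fragments. It is not Lance's implementation. `Fragment`, `transform_vectors_sketch`, and the in-memory `dataset` dict are hypothetical, and the centroids and codebooks are assumed to be already trained. The sketch also PQ-encodes the residual (the vector minus its partition centroid). That is a common IVF-PQ choice assumed here, not something the PR states. Passing `fragments=None` falls back to every fragment, matching the default described in the PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

Vector = List[float]


@dataclass
class Fragment:
    """Hypothetical stand-in for a dataset fragment: row ids plus their vectors."""
    fragment_id: int
    row_ids: List[int]
    vectors: List[Vector]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared L2 distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(point: Sequence[float], candidates: Sequence[Sequence[float]]) -> int:
    # Index of the closest candidate (first one wins on ties).
    return min(range(len(candidates)), key=lambda i: _sq_dist(point, candidates[i]))


def transform_vectors_sketch(
    dataset: Dict[int, Fragment],
    centroids: List[Vector],
    codebooks: List[List[Vector]],
    fragments: Optional[List[int]] = None,
) -> List[dict]:
    """Hypothetical sketch: emit (row_id, pq_code, partition_id) rows for the selected fragments.

    fragments=None means "use every fragment in the dataset".
    """
    selected = sorted(dataset) if fragments is None else fragments
    num_sub = len(codebooks)  # one codebook per PQ sub-vector
    rows = []
    for frag_id in selected:
        frag = dataset[frag_id]
        for row_id, vec in zip(frag.row_ids, frag.vectors):
            # IVF step: assign the vector to its nearest centroid.
            partition_id = _nearest(vec, centroids)
            # Assumption: PQ encodes the residual relative to that centroid.
            residual = [x - c for x, c in zip(vec, centroids[partition_id])]
            sub_dim = len(residual) // num_sub
            # PQ step: nearest codeword for each sub-vector.
            pq_code = [
                _nearest(residual[j * sub_dim:(j + 1) * sub_dim], codebooks[j])
                for j in range(num_sub)
            ]
            rows.append({"row_id": row_id, "pq_code": pq_code, "partition_id": partition_id})
    return rows
```

Suppose a dataset has two fragments. Calling the sketch with `fragments=[0]` returns rows only for fragment 0's row ids, and omitting `fragments` covers both fragments.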

github-actions bot added the enhancement and python labels Jul 29, 2024

westonpace (Contributor) left a comment

Let's add a default so the parameter is optional.

Review threads on python/python/lance/indices.py, python/python/tests/test_indices.py, and python/src/fragment.rs (all outdated and resolved).

westonpace (Contributor)

I updated from main, let's merge if CI passes

raunaks13 merged commit 7f956b5 into lancedb:main on Aug 1, 2024 (11 of 14 checks passed).
eddyxu pushed a commit that referenced this pull request Aug 1, 2024
…ge for ivf_pq index (#2657)

wjones127 added a commit that referenced this pull request Aug 1, 2024
We are close to having green CI on Lance `main`. `test_vector_transform`
has been failing ever since we merged #2657.

I think this is because the file is still open while we reuse the same
file URI. We need to close it before we can overwrite it.
raunaks13 added a commit that referenced this pull request Aug 8, 2024
…#2681)

Follow-up to PRs #2566, #2657, and #2670.
The full pipeline lets users transform, shuffle, and load IVF-PQ
indices in separate, standalone steps, which can be useful when building
indices for large datasets:
1. First, run `transform_vectors()` separately on selected fragments.
2. Then, run `shuffle_transformed_vectors()` on each transformed result.
3. Finally, load all the shuffled vectors and commit the index into the
   dataset using `load_shuffled_vectors()`.
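A small pure-Python sketch can illustrate why these stages can run independently. It does not call Lance. `shuffle_stage` and `load_stage` are hypothetical stand-ins and do not reproduce the signatures of `shuffle_transformed_vectors()` or `load_shuffled_vectors()`. Plain dicts also replace the files that the real steps read and write. Each transform batch is shuffled on its own into per-partition buckets, and the load step only needs to merge those buckets.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One transformed row: (row_id, pq_code, partition_id), the columns named in the PR.
Row = Tuple[int, Tuple[int, ...], int]
# Hypothetical shuffled layout: partition_id -> list of (row_id, pq_code).
Partitions = Dict[int, List[Tuple[int, Tuple[int, ...]]]]


def shuffle_stage(rows: Iterable[Row]) -> Partitions:
    """Hypothetical stand-in for the shuffle step: bucket one batch's rows by partition_id."""
    buckets: Partitions = defaultdict(list)
    for row_id, pq_code, partition_id in rows:
        buckets[partition_id].append((row_id, pq_code))
    return dict(buckets)


def load_stage(shuffled_parts: Iterable[Partitions]) -> Partitions:
    """Hypothetical stand-in for the load step: merge per-batch buckets into one layout."""
    merged: Partitions = defaultdict(list)
    for part in shuffled_parts:
        for partition_id, entries in part.items():
            merged[partition_id].extend(entries)
    # Sort partitions and their entries so the result does not depend on batch order.
    return {pid: sorted(entries) for pid, entries in sorted(merged.items())}
```

For example, two transform batches produced from different fragment selections can be shuffled separately (possibly on different machines) and then merged with `load_stage([shuffle_stage(batch_a), shuffle_stage(batch_b)])`.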