Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: allow fragment scan for nearest query if there is a prefilter #2631

Merged
merged 3 commits into from
Jul 23, 2024

Conversation

jiachengdb
Copy link
Contributor

We want to create a query plan that first runs prefiltering on selected fragments.

Then for rows passing the prefilter, we probe the ANN index with filter_row_ids.

In this plan, the returned result rows will only come from the selected fragments, so there will be no violation to the query semantics.

@codecov-commenter
Copy link

codecov-commenter commented Jul 22, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.

Project coverage is 79.15%. Comparing base (782350e) to head (44eb5e4).

Files Patch % Lines
rust/lance/src/dataset/scanner.rs 66.66% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2631      +/-   ##
==========================================
+ Coverage   79.13%   79.15%   +0.01%     
==========================================
  Files         218      218              
  Lines       63615    63617       +2     
  Branches    63615    63617       +2     
==========================================
+ Hits        50344    50354      +10     
+ Misses      10324    10322       -2     
+ Partials     2947     2941       -6     
Flag Coverage Δ
unittests 79.15% <66.66%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work today for filters on indexed columns. For example, if the filter is my_uuid == 17 or my_uuid != 17 and there is a scalar index on my_uuid. In this case the prefilter will be an allow list or block list with one row id.

However, we should be able to enhance the indexed prefilter logic to apply an allow list or block list based on excluded fragments.

@jiachengdb
Copy link
Contributor Author

Got it.

Is there an easy way to extend the validation logic in this PR to only allow fragment scan if the leaf is a FileScan?

@jiachengdb
Copy link
Contributor Author

@westonpace I added another check on the index scan to disallow the case you mentioned. PTAL.

Comment on lines 838 to 841
// Index scan doesn't honor the fragment allowlist today.
// TODO: we could filter the index scan results to only include the allowed fragments.
self.ensure_not_fragment_scan()?;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this to this spot (it's line 1539 on main right now I think):

            (Some(index_query), None, true) => {
                // The filter is completely satisfied by the index.  We
                // only need to search the index to determine the valid row
                // ids.
                let index_query = Arc::new(ScalarIndexExec::new(
                    self.dataset.clone(),
                    index_query.clone(),
                ));
                PreFilterSource::ScalarIndexQuery(index_query)
            }

The only node that doesn't handle a fragment scan is the ScalarIndexQuery node. So right here is a bit too restrictive. If we do that then we're good to go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to line 1539. PTAL.

Copy link
Contributor

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bearing with me.

@westonpace
Copy link
Contributor

@jiachengdb

I'm going to merge this but if you find time can you add a PR with a python unit test covering this case?

@westonpace westonpace merged commit 501a642 into lancedb:main Jul 23, 2024
22 checks passed
@jiachengdb
Copy link
Contributor Author

@jiachengdb

I'm going to merge this but if you find time can you add a PR with a python unit test covering this case?

Let me find time to add the test.

@jiachengdb
Copy link
Contributor Author

@jiachengdb
I'm going to merge this but if you find time can you add a PR with a python unit test covering this case?

Let me find time to add the test.

@westonpace #2638

westonpace pushed a commit that referenced this pull request Jul 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants