feat: start recording index details in the manifest, cache index type lookup #3131

westonpace · 2024-11-15T23:23:55Z

This addresses a specific problem. When a dataset had a scalar index on a string column, we would perform I/O during the planning phase on every query that contained a filter. This added considerable latency (especially against S3) to query times.

We now cache that lookup.
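To illustrate the caching idea, here is a minimal sketch and not the code in this PR. `IndexKind`, `IndexKindCache`, and `get_or_load` are hypothetical names, and the `load` closure stands in for whatever I/O the planner would otherwise perform on each query.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical index kinds, for illustration only (not Lance's actual type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IndexKind {
    BTree,
    Bitmap,
}

/// Remembers the kind of each index, keyed by index UUID, so the expensive
/// lookup runs at most once per index instead of once per query.
pub struct IndexKindCache {
    entries: Mutex<HashMap<String, IndexKind>>,
}

impl IndexKindCache {
    pub fn new() -> Self {
        Self {
            entries: Mutex::new(HashMap::new()),
        }
    }

    /// Returns the cached kind if present. Otherwise calls `load` (the stand-in
    /// for the I/O), stores the result, and returns it.
    /// The lock is held while `load` runs, which keeps the sketch simple but
    /// serializes concurrent lookups.
    pub fn get_or_load<F>(&self, uuid: &str, load: F) -> IndexKind
    where
        F: FnOnce() -> IndexKind,
    {
        let mut entries = self.entries.lock().unwrap();
        if let Some(kind) = entries.get(uuid) {
            return *kind;
        }
        let kind = load();
        entries.insert(uuid.to_string(), kind);
        kind
    }
}
```

With a cache like this, only the first filtered query against a given index pays the lookup cost; later queries are served from memory.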

It also starts to tackle a more central problem. Right now our manifest stores very little information about indices (pretty much just the UUID). Any further information must be obtained by loading the index. This PR introduces the concept of "index details", a place where an index can put index-specific information (e.g. specific to btree or specific to bitmap) that can be accessed during planning (by just looking at the manifest). At the moment this concept is still fairly bare bones but I think, as scalar indices become more sophisticated, this information can be useful.

If we decide we don't want it then I can pull it out as well and dial this PR back to just the caching component.
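A rough sketch of how index details could be consulted at planning time is below. It is an illustration under assumptions, not the schema added by this PR: `IndexDetails`, `IndexMetadata`, `PlanningStep`, and `planning_step` are all hypothetical names.

```rust
/// Hypothetical index-specific details recorded in the manifest.
#[derive(Clone, Debug, PartialEq)]
pub enum IndexDetails {
    BTree,
    Bitmap,
}

/// Hypothetical manifest entry for one index.
pub struct IndexMetadata {
    pub uuid: String,
    /// `None` models an older manifest that stores little beyond the UUID.
    pub details: Option<IndexDetails>,
}

/// What the planner has to do to learn about an index.
#[derive(Debug, PartialEq)]
pub enum PlanningStep {
    /// The manifest already answers the question, so no I/O is needed.
    UseManifest(IndexDetails),
    /// No details were recorded, so the index itself must be loaded.
    LoadIndex,
}

/// Decides whether planning can proceed from the manifest alone.
pub fn planning_step(meta: &IndexMetadata) -> PlanningStep {
    match &meta.details {
        Some(details) => PlanningStep::UseManifest(details.clone()),
        None => PlanningStep::LoadIndex,
    }
}
```

In this sketch, falling back to `LoadIndex` when details are absent keeps manifests written before the change readable.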

wjones127 · 2024-11-15T23:36:45Z

We had discussed earlier some similar index changes proposed here:

lancedb/lancedb#1666

It looks like this is a good step in that direction by adding the index_config / index_details field 👍

codecov-commenter · 2024-11-15T23:59:02Z

Codecov Report

Attention: Patch coverage is 66.05505% with 37 lines in your changes missing coverage. Please review.

Project coverage is 77.90%. Comparing base (f257489) to head (e481fd4).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/index/scalar.rs | 67.34% | 12 Missing and 4 partials ⚠️ |
| rust/lance/src/index/cache.rs | 9.09% | 10 Missing ⚠️ |
| rust/lance/src/index.rs | 59.09% | 4 Missing and 5 partials ⚠️ |
| rust/lance/src/dataset/scanner.rs | 95.00% | 1 Missing ⚠️ |
| rust/lance/src/io/commit.rs | 66.66% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3131      +/-   ##
==========================================
- Coverage   77.91%   77.90%   -0.01%     
==========================================
  Files         240      240              
  Lines       81564    81459     -105     
  Branches    81564    81459     -105     
==========================================
- Hits        63550    63464      -86     
- Misses      14806    14815       +9     
+ Partials     3208     3180      -28

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `77.90% <66.05%> (-0.01%)` | ⬇️ |

Start recording "index details" in the manifest. This will be index-s…

04057b7

…pecific metadata that can be used at planning time to determine if an index should be applied (without paying the I/O cost to load the index).

github-actions bot added the enhancement (New feature or request) and python labels Nov 15, 2024

Fix unit test

e481fd4

wjones127 approved these changes Nov 15, 2024

chebbyChefNEQ approved these changes Nov 16, 2024

westonpace merged commit a212395 into lancedb:main Nov 16, 2024
26 checks passed

chebbyChefNEQ mentioned this pull request Nov 18, 2024

perf: slow planning when filter is present #3127

Closed
