feat: coalesce scheduling of reads to speed up random access #2636
Conversation
This is cool! Got a few questions though on how this will work.
protos/encodings.proto (outdated)
- ArrayEncoding indices = 1;
+ Buffer indices = 1;
Is this a breaking change to the format? What does that mean for users?
I don't think it's a breaking change, as such. It won't affect users, since the encoding and decoding workflows are still the same for binary types and there is no change from the user's point of view.
However, we may have to revise this again once we use bitpacking for the indices encoding.
Aren't these protobuf messages serialized in the data files? I'm worried newer readers won't be able to read Lance V2 files written with older versions. This seems to go against the protobuf advice for backwards compatibility:
Almost never change the type of a field; it’ll mess up deserialization, same as re-using a tag number. The protobuf docs outline a small number of cases that are okay (for example, going between int32, uint32, int64 and bool). However, changing a field’s message type will break unless the new message is a superset of the old one.
https://protobuf.dev/programming-guides/dos-donts/#change-type
I see... yes, in that case it's a breaking change. But as per the current structure, we do need to be able to access the buffer index in the page if we want to coalesce requests, which makes changing the protobuf necessary (otherwise the string scheduling ends up taking too long during random access).
Is there any recommended alternative in such a situation?
I would create a new field and deprecate the old field. Or maybe create a BinaryV2 message and deprecate Binary, or something like that. You can either raise an error when a file has the old one, or handle both fields/messages in different code paths. cc @westonpace
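As an illustrative sketch of that migration path (the field and message names below are hypothetical, not the actual lance schema), a new field under a fresh tag number can sit beside the deprecated old one:

```protobuf
syntax = "proto3";

// Hypothetical sketch of the "new field, deprecate the old" approach.
// Field names and tag numbers are illustrative only.
message Binary {
  // Old field: marked deprecated but never removed or re-typed, so files
  // written by older versions still deserialize correctly.
  ArrayEncoding indices = 1 [deprecated = true];

  // New field under a fresh tag number. Readers check which field is set
  // and dispatch to the matching code path (or error on the old one).
  Buffer indices_buffer = 2;
}
```

Because the old tag number is never reused with a different type, both old and new readers can parse both generations of files.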
I agree with @wjones127 that, at this point, we want to start avoiding these breaking changes.
Also, this approach won't work. We want indices to be bitpacked, which means we need this to be ArrayEncoding.
I have updated the workflow to use ArrayEncoding but still schedule the requests together.
for req in request.iter().skip(1) {
    if is_close_together(&curr_interval, req, self.block_size) {
        curr_interval.end = curr_interval.end.max(req.end);
    } else {
Doesn't this assume the requests are already kind of monotonic? Is that handled anywhere? Should we sort the requests by request.start before doing this?
Yes, I believe requests (ranges) are assumed to be sorted by a few encodings at this point. I don't think this is enforced anywhere, though. We should probably make a note of wherever this sorting is assumed.
It may be better to sort the requests in the encoding itself (inside schedule_ranges()), before they are passed to submit_request(), since then we won't have to call sort multiple times. In that case it may be better to leave this for another PR? We could add a debug_assert!() though.
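A minimal sketch of such a check (the function names here are hypothetical, not the lance codebase's; debug_assert! compiles away in release builds, so the sorted-input assumption is verified at zero cost in production):

```rust
use std::ops::Range;

/// Returns true if the ranges are sorted by their start offset.
fn is_sorted_by_start(ranges: &[Range<u64>]) -> bool {
    ranges.windows(2).all(|w| w[0].start <= w[1].start)
}

/// Debug-only guard for the implicit "requests arrive sorted" assumption.
fn debug_assert_sorted(ranges: &[Range<u64>]) {
    debug_assert!(
        is_sorted_by_start(ranges),
        "scheduler ranges must be sorted by start offset"
    );
}

fn main() {
    // Sorted input passes; in a debug build, unsorted input would panic here.
    debug_assert_sorted(&[0..4, 4..8, 10..12]);
    println!("sorted check passed");
}
```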
I see. Looks like you are coalescing on the request side, so they are all from the same array. If we put this on a different side, where requests might be sourced from multiple decoders, it might be different. Seems fine for now, but I wanted to note that this seems to be an implicit assumption.
Yes, I broke coalescing up into two GitHub issues: "in-batch coalescing", which is cheap (we can assume all requests are for a single page of data and they arrive in sorted order), and "out-of-batch coalescing". I'm not sure we ever want to bother with "out-of-batch coalescing". You'd need some kind of sorted heap of requests, and then you'd have to unsort them after the request is made. You'd also get weird things like what happens if you can coalesce part of a low-priority request that's way back in the queue with a high-priority request that's almost ready to run. You'd also need to make sure you aren't blocking the I/O threads at any point while you sort your queue. 💀
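The in-batch coalescing being discussed can be sketched as a self-contained function (assuming input sorted by start offset; the names below are illustrative, not the actual lance implementation):

```rust
use std::ops::Range;

/// Merge sorted byte ranges whose gap is at most `block_size` bytes.
/// Assumes `requests` is sorted by `start` (the implicit assumption
/// noted in the review thread).
fn coalesce_requests(requests: &[Range<u64>], block_size: u64) -> Vec<Range<u64>> {
    let mut coalesced = Vec::new();
    let mut iter = requests.iter();
    let mut curr = match iter.next() {
        Some(first) => first.clone(),
        None => return coalesced,
    };
    for req in iter {
        // "Close together" = next request starts within `block_size`
        // bytes of the end of the current interval.
        if req.start <= curr.end + block_size {
            curr.end = curr.end.max(req.end);
        } else {
            coalesced.push(curr);
            curr = req.clone();
        }
    }
    coalesced.push(curr);
    coalesced
}

fn main() {
    // 0..10 and 12..20 merge (gap of 2 <= 16); 100..110 stays separate.
    let merged = coalesce_requests(&[0..10, 12..20, 100..110], 16);
    assert_eq!(merged, vec![0..20, 100..110]);
    println!("{:?}", merged);
}
```

Because the input is a single sorted batch, one linear pass suffices; this is what makes in-batch coalescing cheap compared to the out-of-batch variant.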
Thanks for doing this, sorry it took a while to review. We will need to change the way we handle the offsets. However, the coalescing logic looks correct.
// We schedule all the indices for decoding together.
// This is more efficient than scheduling them one by one
// (per-index scheduling significantly slows random access).
let indices_bytes = scheduler.submit_request(indices_byte_ranges, top_level_row);
I think you can combine these into one call and keep indices_scheduler. Just make a single call to indices_scheduler.schedule_ranges (passing in many ranges) instead of many calls to schedule_ranges (each passing in one range).
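A toy illustration of the batched-call shape being suggested (the trait and in-memory scheduler below are hypothetical stand-ins; the real lance schedule_ranges API differs in signature and is async):

```rust
use std::ops::Range;

/// Hypothetical scheduler interface modeled on the review suggestion.
trait RangeScheduler {
    /// Schedule many byte ranges in one request, letting the I/O layer
    /// see (and potentially coalesce) all of them at once.
    fn schedule_ranges(&self, ranges: &[Range<u64>]) -> Vec<Vec<u8>>;
}

/// Toy in-memory backend, standing in for real file/object-store I/O.
struct MemScheduler {
    data: Vec<u8>,
}

impl RangeScheduler for MemScheduler {
    fn schedule_ranges(&self, ranges: &[Range<u64>]) -> Vec<Vec<u8>> {
        ranges
            .iter()
            .map(|r| self.data[r.start as usize..r.end as usize].to_vec())
            .collect()
    }
}

fn main() {
    let scheduler = MemScheduler { data: (0u8..=255).collect() };
    // One call with many ranges, not many calls with one range each.
    let bufs = scheduler.schedule_ranges(&[0..4, 250..252]);
    assert_eq!(bufs, vec![vec![0, 1, 2, 3], vec![250, 251]]);
    println!("{} buffers", bufs.len());
}
```

The design point is that handing all ranges to the scheduler in a single call preserves the ordering relationship between them and gives the I/O layer the full picture it needs to coalesce nearby reads.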
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #2636      +/-   ##
==========================================
+ Coverage   79.33%   79.36%   +0.03%
==========================================
  Files         222      222
  Lines       64584    64635     +51
  Branches    64584    64635     +51
==========================================
+ Hits        51236    51296     +60
+ Misses      10360    10355      -5
+ Partials     2988     2984      -4

Flags with carried forward coverage won't be shown.
One minor nit, otherwise good to go
Should fix #2629, addresses #1959.

Reads that are within block_size distance from each other are coalesced; the block size is determined based on the system. Benchmarked with test_random_access.py. Specifically, on the lineitem dataset (same file from the issue above), the measured times were 0.12s, 2.8s, 0.54s, and 0.02s.