feat: add backpressure to v2 I/O scheduler #2683
Conversation
Force-pushed from 1867eef to 1e6bef9
This seems well thought out. My only concern is whether some of the warnings will be noisy, but I think in at least one of the cases you've handled that well with the 60s timer.
rust/lance-encoding/src/decoder.rs
Outdated
EMITTED_BATCH_SIZE_WARNING.call_once(|| {
    let size_mb = size_bytes / 1024 / 1024;
    warn!("Lance read in a single batch that contained more than {}MiB of data. You may want to consider reducing the batch size.", size_mb);
});
Do you worry this could be noisy? Would someone with image / video datasets always hit this?
Also, does this ever reset? To try again with a smaller batch size and see if they get this warning, will they need to restart the process?
> Do you worry this could be noisy? Would someone with image / video datasets always hit this?
10MiB is pretty extreme, but yes, someone with a video dataset would encounter this even if there were only one video. Perhaps I will only emit the warning if the batch size is greater than 32 (just an arbitrary threshold).
> Also, does this ever reset? To try again with a smaller batch size and see if they get this warning, will they need to restart the process?
Hmm, I was thinking "once per scan" but, on reflection, it seems I ended up with "once per process". I will fix that.
Changed to once-per-scan.
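A minimal sketch of the once-per-scan behavior discussed above (the struct and method names here are illustrative, not the actual Lance code): instead of a process-wide `Once`, keep an `AtomicBool` on per-scan state, and also apply the proposed batch-size > 32 threshold so single-row video/image batches don't trigger it.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Illustrative per-scan decoder state (hypothetical, not the Lance struct).
struct ScanState {
    emitted_batch_size_warning: AtomicBool,
}

impl ScanState {
    fn new() -> Self {
        Self {
            emitted_batch_size_warning: AtomicBool::new(false),
        }
    }

    /// Returns true only the first time an oversized batch is seen in this
    /// scan. A fresh scan gets a fresh flag, so no process restart is needed.
    fn should_warn(&self, size_bytes: u64, batch_size: u64) -> bool {
        const THRESHOLD_BYTES: u64 = 10 * 1024 * 1024; // 10 MiB
        const MIN_BATCH_SIZE: u64 = 32; // skip e.g. a batch holding one video
        size_bytes > THRESHOLD_BYTES
            && batch_size > MIN_BATCH_SIZE
            // swap returns the previous value, so only the first caller warns
            && !self.emitted_batch_size_warning.swap(true, Ordering::Relaxed)
    }
}
```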
Force-pushed from fbc8d0c to 56136cf
…eduler loop. This ensures that backpressure is acquired in priority order which should help avoid deadlock in the very large range case
Force-pushed from 56136cf to 6651413
Adds backpressure to the I/O scheduler. The general problem is rather tricky: if all of your I/O threads pause because the I/O buffer is full, and your decode threads are waiting on queued I/O tasks, then you have a deadlock.
I/O priority should allow us to avoid this situation, but it is something that will need to be monitored, especially if users set very small I/O buffer sizes. For this reason I haven't made any of the new settings configurable (except for deadlock detection, which we may turn on for debugging if people appear to have hit a deadlock).
One way to hit this deadlock is to create a file with 10 pages where each page has 10GB of data. Even a single read will fill the I/O buffer. We submit the reads in priority order, but we have 8 I/O threads, so they race to grab the permits.
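A toy model of why granting permits in the scheduler loop helps (this is an illustration with hypothetical names, not the actual scheduler): when a single loop admits requests strictly in priority order, a lower-priority request can never grab buffer capacity that a blocked higher-priority request needs, which is the race described above.

```rust
/// Admit I/O requests strictly in priority order under a capacity limit.
/// Each request is (priority, size_bytes); lower number = higher priority.
/// Hypothetical sketch: a real scheduler would block rather than stop,
/// releasing capacity as reads complete.
fn admit_in_priority_order(mut requests: Vec<(u32, usize)>, mut capacity: usize) -> Vec<u32> {
    requests.sort_by_key(|&(prio, _)| prio);
    let mut admitted = Vec::new();
    for (prio, size) in requests {
        if size <= capacity {
            capacity -= size;
            admitted.push(prio);
        } else {
            // Never skip ahead of a blocked higher-priority request;
            // later requests must wait even if they would fit.
            break;
        }
    }
    admitted
}
```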
As a result, I've added a much-needed option to split primitive arrays into multiple pages when we are given huge chunks of data. The splitting algorithm is not perfect and could use some work (perhaps a sort of "binary splitting" where we continuously split the largest chunk in half until all chunks are below a given size).
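The "binary splitting" idea mentioned above can be sketched as follows (a standalone illustration, not the actual Lance implementation; chunk byte sizes stand in for real array slices):

```rust
/// Repeatedly split the largest chunk in half until every chunk is at or
/// below `max_size`. Each split strictly shrinks the offending chunk, so
/// the loop terminates for any max_size >= 1.
fn binary_split(chunks: &[usize], max_size: usize) -> Vec<usize> {
    assert!(max_size > 0);
    let mut out: Vec<usize> = chunks.to_vec();
    loop {
        // Find the largest chunk; stop once even it fits under the limit.
        let (idx, size) = match out.iter().copied().enumerate().max_by_key(|&(_, s)| s) {
            Some((i, s)) if s > max_size => (i, s),
            _ => return out,
        };
        // Split it in half; the first half keeps the extra element on odd sizes.
        out[idx] = size - size / 2;
        out.insert(idx + 1, size / 2);
    }
}
```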
At the moment I think things are "safe enough" that this PR prevents more problems (avoids OOM) than it introduces (deadlock in esoteric cases).