-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Collect iterator into chunks early #433
Conversation
Initially we used iterator because we wanted to lazily pull values from the database when Wasm requests the next item, so that e.g. early stop works as expected and doesn't read too much data. Unfortunately, we discovered that this leads to crashes when someone tries to iterate over the database and update/delete items inside the iteration since iteration itself held a lock. We added a workaround by collecting the iterator early into a Vec, and then returning a newly created iterator to the user. This fixed the issue, but at this point lazy iterator mechanism turned from an optimisation into pure overhead - if we want to collect Vec, we might as well do this early in the pipeline and avoid an extra dependency. This slightly improves iteration of empty tables, but otherwise the difference gets drowned by the speed of the iterator itself, so this is mainly code simplification. I still hope we can return to this and implement proper end-to-end lazy iteration in the future. This should be possible, since we already have branching so any updates inside iteration shouldn't affect iteration itself and both could happen without holding a global lock, but in the past this has proven difficult.
Seems like an interesting bit of use case for the mem-arch stuff to keep in mind. |
Sorry, not sure I understand - which part are you referring to that would be related to mem-arch? |
The goal of that effort is to redesign our memory architecture and think about when and where we copy. This is a somewhat performance sensitive place where we make tradeoffs around ease of implementation and memory used. Supporting the "proper end-to-end lazy iteration" (or deciding not to!) seems like a good thing for the memory architecture to have in mind. |
Ah yeah that makes sense. That said, the issue there was not so much with memory architecture, as with... idk, locking / forking architecture? My idea back when this issue first arose was that we should leverage our existing ability to branch out on edits, so that iteration can keep the existing branch alive and prevent it from getting flattened, while any updates that happen inside iteration are added to a new branch. However, there were some concerns and questions about whether updates inside iteration should be visible to iteration (personally I think the answer is no, and that's what we implement now with buffering anyway) and how to avoid taking the lock, so we never ended up implementing that. |
Not sure why openssl-src disappeared from Cargo.lock before, but here we go.
Description of Changes
Initially we used iterator because we wanted to lazily pull values from the database when Wasm requests the next item, so that e.g. early stop works as expected and doesn't read too much data.
Unfortunately, we discovered that this leads to crashes when someone tries to iterate over the database and update/delete items inside the iteration since iteration itself held a lock. We added a workaround by collecting the iterator early into a Vec, and then returning a newly created iterator to the user. This fixed the issue, but at this point lazy iterator mechanism turned from an optimisation into pure overhead - if we want to collect Vec, we might as well do this early in the pipeline and avoid an extra dependency.
This slightly improves iteration of empty tables as it only reduces allocation overhead per iterator, but otherwise the difference gets drowned by the speed of the iterator itself, so this is mainly code simplification.
I still hope we can return to this and implement proper end-to-end lazy iteration in the future. This should be possible, since we already have branching so any updates inside iteration shouldn't affect iteration itself and both could happen without holding a global lock, but in the past this has proven difficult.
Note: this PR will conflict with #420. I thought I'd wait until that one is merged, but looks like there are some reservations about the ABI-breaking change for now, so submitting them in parallel. Either way shouldn't be a difficult conflict resolution.
API and ABI
If the API is breaking, please state below what will break
Expected complexity level and risk
How complicated do you think these changes are? Grade on a scale from 1 to 5,
where 1 is a trivial change, and 5 is a deep-reaching and complex change.
This complexity rating applies not only to the complexity apparent in the diff,
but also to its interactions with existing and future code.
If you answered more than a 2, explain what is complex about the PR,
and what other components it interacts with in potentially concerning ways.