
Collect iterator into chunks early #433

Merged
merged 3 commits into from
Oct 16, 2023
Conversation

RReverser
Member

Description of Changes

Initially we used an iterator because we wanted to lazily pull values from the database when Wasm requests the next item, so that, e.g., early stopping works as expected and doesn't read too much data.

Unfortunately, we discovered that this leads to crashes when someone tries to iterate over the database and update/delete items inside the iteration, since the iteration itself held a lock. We added a workaround by collecting the iterator early into a Vec and then returning a newly created iterator to the user. This fixed the issue, but at that point the lazy iterator mechanism turned from an optimisation into pure overhead: if we are going to collect a Vec anyway, we might as well do so early in the pipeline and avoid an extra dependency.

This slightly improves iteration over empty tables, as it reduces per-iterator allocation overhead, but otherwise the difference is drowned out by the speed of the iterator itself, so this is mainly a code simplification.

I still hope we can return to this and implement proper end-to-end lazy iteration in the future. This should be possible: since we already have branching, any updates inside an iteration needn't affect the iteration itself, and both could happen without holding a global lock. In the past, however, this has proven difficult.

Note: this PR will conflict with #420. I thought I'd wait until that one is merged, but it looks like there are some reservations about its ABI-breaking change for now, so I'm submitting them in parallel. Either way, the conflict resolution shouldn't be difficult.

API and ABI

  • This is a breaking change to the module ABI
  • This is a breaking change to the module API
  • This is a breaking change to the Client API
  • This is a breaking change to the SDK API

If the API is breaking, please state below what will break

Expected complexity level and risk

How complicated do you think these changes are? Grade on a scale from 1 to 5,
where 1 is a trivial change, and 5 is a deep-reaching and complex change.

This complexity rating applies not only to the complexity apparent in the diff,
but also to its interactions with existing and future code.

If you answered more than a 2, explain what is complex about the PR,
and what other components it interacts with in potentially concerning ways.

@kulakowski
Contributor

Seems like an interesting bit of use case for the mem-arch stuff to keep in mind.

@RReverser
Member Author

RReverser commented Oct 16, 2023

> Seems like an interesting bit of use case for the mem-arch stuff to keep in mind.

Sorry, not sure I understand - which part are you referring to that would be related to mem-arch?

@kulakowski
Contributor

> Seems like an interesting bit of use case for the mem-arch stuff to keep in mind.

> Sorry, not sure I understand - which part are you referring to that would be related to mem-arch?

The goal of that effort is to redesign our memory architecture and think about when and where we copy. This is a somewhat performance-sensitive place where we make tradeoffs between ease of implementation and memory use. Supporting "proper end-to-end lazy iteration" (or deciding not to!) seems like a good thing for the memory architecture work to keep in mind.

@RReverser
Member Author

Ah yeah that makes sense.

That said, the issue there was not so much with memory architecture, as with... idk, locking / forking architecture?

My idea back when this issue first arose was that we should leverage our existing ability to branch out on edits, so that iteration can keep the existing branch alive and prevent it from getting flattened, while any updates that happen inside iteration are added to a new branch.

However, there were some concerns and questions about whether updates inside iteration should be visible to iteration (personally I think the answer is no, and that's what we implement now with buffering anyway) and how to avoid taking the lock, so we never ended up implementing that.

@RReverser RReverser requested a review from kulakowski October 16, 2023 17:36
@RReverser RReverser enabled auto-merge (squash) October 16, 2023 18:04
Not sure why openssl-src disappeared from Cargo.lock before, but here we go.
@RReverser RReverser merged commit 6715b4c into master Oct 16, 2023
5 checks passed
@RReverser RReverser deleted the iter-chunked-writer branch October 16, 2023 18:24