Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: maintain L0 SST index in compute nodes #4072

Open
hzxa21 opened this issue Jul 21, 2022 · 7 comments
Open

storage: maintain L0 SST index in compute nodes #4072

hzxa21 opened this issue Jul 21, 2022 · 7 comments

Comments

@hzxa21
Copy link
Collaborator

hzxa21 commented Jul 21, 2022

Background

In our system, the reader of the internal states is always the writer of the state. Currently, the storage LSM read path of internal states is unaware of this assumption. Due to the fact that L0 SSTs can overlap with each other and we use key range to filter SSTs, CN may end up fetching unused L0 SSTs on reads:

  1. State of the same table_id (i.e. key prefix) can be in different L0 SSTs since we can place different shards of an operator onto different CNs. In this case, a shard will hit many overlapping L0 SSTs on reads but in fact it only needs to read its own portion of L0 SSTs.
  2. State written by the same CN is batched into the same L0 SSTs, causing L0 SSTs to have wide key range. Therefore, only using key range to filter L0 SSTs can generate many false positives.

To solve this problem, we need to maintain some kind of L0 SST index to leverage the "reader of the internal states is always the writer of the state" assumption to improve L0 SST filtering.

Solution

After discussion with @skyzh, @st1page , @wenym1 and @Li0k, we have come up with some ideas:

  • CN stores the sst_id of the L0 SSTs it has uploaded to S3. We call it LocalL0SSTs.
  • On reads, we use the "intersection of LocalL0SSTs and L0 SSTs in HummockVersion" + L1-N SSTs in HummockVersion to construct a local version for reads. Existing SST filtering logic still applies.
  • On HummockVersion update, we update LocalL0SSTs by removing the deleted SSTs (due to compaction). Thanks to incremental version fetch, we can easily identify the deleted SSTs.

Under this solution, we will only use the full list of L0 SSTs in HummockVersion when LocalL0SSTs is absent: 1) failover, 2) batch query (if batch reader is not co-scheduled with streaming executors)

Further optimizations

  • L0->L0 compaction: we may prefer to constrain the input SSTs of L0->L0 compaction to be LocalL0SSTs to reduce read amplification on L0 SSTs. That means CN needs to trigger L0->L0 compaction providing LocalL0SSTs information to compaction manager. Also, I am not sure how this can be integrated with the sub-level design. Need further investigation.
  • LocalL0SSTs data structure: flat or map?
@skyzh
Copy link
Contributor

skyzh commented Jul 21, 2022

Note that wait_epoch will now need to download LocalL0SSTs. Need to take special care of the batch read process.

@github-actions
Copy link
Contributor

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

@hzxa21 hzxa21 added this to the release-0.1.15 milestone Dec 1, 2022
@hzxa21
Copy link
Collaborator Author

hzxa21 commented Dec 1, 2022

@Li0k Can you help port this to the new state store implementation?

@fuyufjh
Copy link
Member

fuyufjh commented Aug 8, 2023

Hi, folks, any updates on this?

@Li0k
Copy link
Contributor

Li0k commented Aug 9, 2023

Hi, folks, any updates on this?

cc @Little-Wallace

@wenym1 wenym1 modified the milestones: release-1.3, release-1.4 Oct 10, 2023
@hzxa21
Copy link
Collaborator Author

hzxa21 commented Nov 8, 2023

We have a new RFC proposed. I will remove this issue from milestone and re-add it when the RFC is finialized.

@hzxa21 hzxa21 removed this from the release-1.4 milestone Nov 8, 2023
@soundOfDestiny
Copy link
Contributor

We have a new RFC proposed. I will remove this issue from milestone and re-add it when the RFC is finialized.

ok while I still think flat is better than map for LocalL0SSTs data structure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants