Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: block property filter on tenant,table,index #93427

Open
jbowens opened this issue Dec 12, 2022 · 4 comments
Open

storage: block property filter on tenant,table,index #93427

jbowens opened this issue Dec 12, 2022 · 4 comments
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-storage Storage Team

Comments

@jbowens
Copy link
Collaborator

jbowens commented Dec 12, 2022

Within Pebble, keys from different Cockroach tenants, tables and indexes may all share sstables and sstable blocks. Different tenants, tables and indexes tend to have different access patterns. Imagine an infrequently-written, frequently-read tenant/table/index B, sandwiched within the keyspace by two frequently-written tenant/table/indexes A and C. Over time, it’s likely that the entirety of B’s data will reside in L6. However reads of B will frequently find sstables overlapping B’s keyspace, containing A and C’s writes. Iterators reading from B have no way of knowing that these sstables in higher levels contain no relevant data, so they’re forced to suffer the read amplification of the broader LSM.

One approach to this problem is sstable partitioning/guards (cockroachdb/pebble#517), but this can produce sstables that are smaller than desirable and increase key comparisons when seeking for the appropriate sstable.

An alternative is to define a block-property collector that collects the set of unique (tenant,table,index) tuples contained within ssblocks and sstables. An iterator reading from /<TenantID>/<TableID>/<IndexID> can set a block-property filter that ignores any blocks with a tenant-table-index block property that indicates it does not contain the sought index. If the iterator finds a sstable that overlaps the sought index's keyspace but does not contain the sought index, it can exclude the sstable without reading it. Cockroach iterators already set an upper bound of the start of the next index's keyspace, which would allow the iterator to exhaust the level without ever suffering a block load.

To reduce the overhead of this new block-property filtering, only sstables/blocks containing keys from more than one index must encode a block property.

Jira issue: CRDB-22329

@jbowens jbowens added A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels Dec 12, 2022
@blathers-crl
Copy link

blathers-crl bot commented Dec 12, 2022

Hi @jbowens, please add a C-ategory label to your issue. Check out the label system docs.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@jbowens jbowens added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. labels Dec 12, 2022
@jbowens
Copy link
Collaborator Author

jbowens commented Feb 17, 2023

@sumeerbhola — do you have any thoughts on this idea? I hacked around during the most recent breather week, but I didn't get very far.

@sumeerbhola
Copy link
Collaborator

sumeerbhola commented Feb 28, 2023

This is an interesting idea -- it's a cheap "range" filter, that encodes the prefixes that are actually contained in a block.

  • How will we encode the set of /<TenantID>/<TableID>/<IndexID> tuples in a block cheaply (for the blocks that contain multiple of these)? Is the idea that the number of blocks that need this encoded are so few that we can afford a longer index block entry?
  • I suppose we will need to plumb from SQL the prefix into the RequestHeader for the mvcc scan, or better still the parsed TenantID, TableID, IndexID. Have you looked into how messy that is?

@jbowens
Copy link
Collaborator Author

jbowens commented Mar 1, 2023

How will we encode the set of /// tuples in a block cheaply (for the blocks that contain multiple of these)? Is the idea that the number of blocks that need this encoded are so few that we can afford a longer index block entry?

Yeah, I was hoping there were few enough of these that encoding it shouldn't be too bad. I'm not sure if it's worth the added read-time cpu cost, but I could imagine a prefix encoding could reduce the size since it's very likely that the tuples are adjacent. Also, Cockroach uses a varint encoding for each ID, so I think they're frequently relatively concise.

I suppose we will need to plumb from SQL the prefix into the RequestHeader for the mvcc scan, or better still the parsed TenantID, TableID, IndexID. Have you looked into how messy that is?

Haven't looked. In the prototype I was hacking on, iterator construction performed a key comparison on the iterator bounds to see if SQLPrefix(LowerBound).Next() >= UpperBound

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-storage Storage Team
Projects
Status: Backlog
Development

No branches or pull requests

2 participants