During a sequential import, CockroachDB ingests sstables with successively increasing key ranges. As compactions move the ingested keys down the LSM, nothing prevents them from writing the new keys into the same sstables as keys that sort after the import's keyspace. These sstables, which span the import keyspace, force future ingested sstables into L0. One solution is an explicit guard (#517).
I've also been wanting to experiment with a compaction output-splitting heuristic based on the distribution of sequence numbers in the compaction inputs. Ingests create large, dense swaths of keys with the same sequence number. We could adjust the splitting heuristic to watch for a streak of N keys with the same sequence number. Compactions could split outputs early at the point where the sequence-number streak ends, as long as the current output is not too small. This would encourage splits at the right-hand side of ingested sstables, which would then leave a gap in sstable boundaries for future sstables ingested as part of the same sequential import.
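A rough sketch of what the streak heuristic might look like, in isolation from the real compaction code. Everything here is hypothetical: `streakSplitter`, `shouldSplitBefore`, `splitPoints`, and the `minStreak`/`minOutput` parameters are names made up for illustration and are not part of Pebble's API. Output size is counted in keys rather than bytes to keep the example small.

```go
package main

// streakSplitter is a hypothetical helper (not Pebble's API) that watches the
// sequence numbers of keys in compaction output order.
type streakSplitter struct {
	minStreak  int    // N: streak length taken to indicate an ingested run
	minOutput  int    // minimum keys in the current output before a split is allowed
	lastSeq    uint64 // sequence number of the previous key
	streak     int    // length of the current same-sequence-number streak
	outputKeys int    // keys written to the current output so far
}

// shouldSplitBefore reports whether the current output should be finished
// before writing a key with the given sequence number. It returns true only
// when a streak of at least minStreak keys has just ended and the current
// output already holds at least minOutput keys.
func (s *streakSplitter) shouldSplitBefore(seqNum uint64) bool {
	split := false
	if s.streak > 0 && seqNum != s.lastSeq {
		// The streak ended at the previous key.
		if s.streak >= s.minStreak && s.outputKeys >= s.minOutput {
			split = true
			s.outputKeys = 0 // the key being added starts a new output
		}
		s.streak = 0
	}
	s.lastSeq = seqNum
	s.streak++
	s.outputKeys++
	return split
}

// splitPoints returns the indexes at which a new output would begin for the
// given sequence numbers, listed in key order.
func splitPoints(seqNums []uint64, minStreak, minOutput int) []int {
	s := &streakSplitter{minStreak: minStreak, minOutput: minOutput}
	var points []int
	for i, seqNum := range seqNums {
		if s.shouldSplitBefore(seqNum) {
			points = append(points, i)
		}
	}
	return points
}
```

For example, with sequence numbers `5, 9, 42, 42, 42, 42, 7, 3`, `minStreak = 4`, and `minOutput = 3`, `splitPoints` returns `[6]`: the output is cut immediately after the run of four keys at sequence number 42, which is where the right-hand boundary of the ingested sstable would have been.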
Once keys reach L6 their sequence numbers are zeroed, so this would have little effect in L6 other than encouraging splitting keys with snapshot-preserved sequence numbers from non-snapshot-preserved sequence numbers.
Yeah, this ingest problem would also be solved. The virtual sstable thing has some cost, e.g., the potential for space amplification from thin virtual tables backed by wide physical sstables. Not sure how much of a problem that will be in practice, but sstable-splitting heuristics seem like they could help circumvent it if they're effective.
Going to close this out since we're shipping virtual sstables in 23.2. If in practice we see space amp from virtual sstables as a problem, it might be worth re-examining then.