39724: bulk: Compute MVCCStats of the SST being ingested on-the-fly r=adityamaru27 a=adityamaru27

This change is an optimization to the MVCCStats collection in the bulk ingestion pipeline. Currently, when ingesting an SST via the SSTBatcher, we make one iteration to construct the SST and a second one to compute the MVCCStats for the span being ingested.

In scenarios such as IMPORT, where we have an enforced guarantee (via the disallowShadowing flag) that the KVs being ingested do not shadow existing data, MVCCStats collection becomes very simple. This change adds logic to collect these stats on-the-fly while the SST is being constructed, thereby saving the additional iteration, which has been profiled as a bottleneck in IMPORT.

TODO: There is a significant performance win to be achieved by ensuring that the computed stats are not estimates, as this prevents recomputation on splits. Running AddSSTable with disallowShadowing=true gets us close to this, as we do not allow colliding keys to be ingested. However, when two SSTs have KV(s) which "perfectly" shadow an existing key (equal ts and value), we do not consider this a collision. While the KV would just overwrite the existing data, the stats would be re-added, causing a double count for such KVs. One solution is to compute the stats for these "skipped" KVs on-the-fly while checking for the collision condition, and to return their stats. The final stats would then be base_stats + sst_stats - skipped_stats, and this would be accurate.

Benchmark update: Over three runs of TPCC 1k on a 4-node, default roachprod cluster, the time dropped from ~32m to ~22m.

Release note: None

Co-authored-by: Aditya Maru <adityamaru@cockroachlabs.com>
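The single-pass idea and the correction proposed in the TODO can be illustrated with a minimal sketch. It is not the code from this commit: `kv`, `simpleStats`, `sstBuilder`, `skippedStats`, and `finalStats` are hypothetical names, the stats struct carries only four counters instead of the full MVCCStats, and the timestamp is assumed to be folded into the key string. It also assumes, as disallowShadowing would guarantee, that the only possible overlap with existing data is a "perfect" shadow (equal key, timestamp, and value).

```go
package main

import (
	"bytes"
	"fmt"
)

// kv is a hypothetical, simplified key/value pair. The key string is assumed
// to already include the timestamp (e.g. "a@1").
type kv struct {
	key   string
	value []byte
}

// simpleStats is a hypothetical stand-in for MVCCStats with a few counters.
type simpleStats struct {
	KeyCount, KeyBytes, ValCount, ValBytes int64
}

func (s *simpleStats) add(o simpleStats) {
	s.KeyCount += o.KeyCount
	s.KeyBytes += o.KeyBytes
	s.ValCount += o.ValCount
	s.ValBytes += o.ValBytes
}

func (s *simpleStats) subtract(o simpleStats) {
	s.KeyCount -= o.KeyCount
	s.KeyBytes -= o.KeyBytes
	s.ValCount -= o.ValCount
	s.ValBytes -= o.ValBytes
}

// statsFor returns the contribution of a single KV to the stats.
func statsFor(e kv) simpleStats {
	return simpleStats{
		KeyCount: 1, KeyBytes: int64(len(e.key)),
		ValCount: 1, ValBytes: int64(len(e.value)),
	}
}

// sstBuilder collects entries and updates the stats in the same pass, so no
// second iteration over the finished SST is needed.
type sstBuilder struct {
	entries []kv
	stats   simpleStats
}

func (b *sstBuilder) put(e kv) {
	b.entries = append(b.entries, e)
	b.stats.add(statsFor(e))
}

// skippedStats sums the stats of SST entries that perfectly shadow existing
// data (same key and same value). These would otherwise be counted twice.
func skippedStats(existing map[string][]byte, sst []kv) simpleStats {
	var skipped simpleStats
	for _, e := range sst {
		if v, ok := existing[e.key]; ok && bytes.Equal(v, e.value) {
			skipped.add(statsFor(e))
		}
	}
	return skipped
}

// finalStats computes base_stats + sst_stats - skipped_stats.
func finalStats(base, sst, skipped simpleStats) simpleStats {
	out := base
	out.add(sst)
	out.subtract(skipped)
	return out
}

func main() {
	existing := map[string][]byte{"a@1": []byte("x")}
	base := statsFor(kv{key: "a@1", value: []byte("x")})

	var b sstBuilder
	b.put(kv{key: "a@1", value: []byte("x")})  // perfectly shadows existing data
	b.put(kv{key: "b@1", value: []byte("yy")}) // new key

	skipped := skippedStats(existing, b.entries)
	fmt.Printf("%+v\n", finalStats(base, b.stats, skipped))
}
```

In this sketch the collision check is a separate loop for readability; the TODO suggests folding it into the existing collision check so that it adds no extra iteration either.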
Showing 9 changed files with 155 additions and 60 deletions.