-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
workload/tpcc: improve indexing to permit better partitioning #36854
workload/tpcc: improve indexing to permit better partitioning #36854
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @danhhz, @jordanlewis, and @nvanbenschoten)
pkg/workload/tpcc/ddls.go, line 88 at r6 (raw file):
// HISTORY table. tpccHistorySchemaBase = `( rowid uuid not null default gen_random_uuid(),
While you're going to generate fixtures, I'd love to reconfirm that we think this field should be a UUID. I dug up the PR (#23827) that went from implicit PK to UUID and the reasoning for UUID over unique_rowid was that 1) unique_rowid apparently generated a duplicate (!) and 2) our production recommendation is UUID. I think we could fix (1), (2) is more compelling to me.
I ask because UUIDs make this table the only one in our tpcc Generator that doesn't generate rows in PKs order, which makes our lives harder for the upcoming direct-ingest IMPORT. It's something we'll have to solve for secondary indexes, but "solve" here means something like backpressure and trying to be careful about compactions and it's definitely better if the generated data doesn't overlap in the first place. It may also be possible to switch the UUIDs in the initial data to be deterministically generated from batchIdx and the offset within the batch, but I haven't looked into this yet.
Does anyone have strong opinions here?
pkg/workload/tpcc/tpcc.go, line 390 at r6 (raw file):
Splits: splits(workload.BatchedTuples{ NumBatches: numBatches(w.warehouses, numWarehousesPerRange), NumTotal: w.warehouses,
all of these NumTotals on the splits look wrong to me. They're supposed to represent how many total rows are produced by all the batches, but each batch seems to be of size 1. I'm actually planning on getting rid of them entirely, they haven't been useful for anything, so I'd omit them here
This commit improves the primary key of the history table, allowing it to be partitioned based on warehouse ID. Release note: None
It's unclear why this index was ever added. It wasn't being used for anything. Release note: None
There was no reason to do this and it made --wait=false incompatible with partitioning. Release note: None
21e1cdd
to
df6a341
Compare
This change migrates workload/tpcc from adding foreign key relations in a PostLoad step to adding them in the schema itself. While doing so, it also ensures that indexes that are only used for foreign keys are made clear and are not created when fks are not in use. This will require fixture regeneration. Release note: None
Closes cockroachdb#35891. See page 15 of the spec: http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-c_v5.11.0.pdf Release note: None
Release note: None
df6a341
to
92c8992
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Easiest way to address Dan's comment: wait until he does himself in other PRs 😃 TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @danhhz and @jordanlewis)
pkg/workload/tpcc/ddls.go, line 88 at r6 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
While you're going to generate fixtures, I'd love to reconfirm that we think this field should be a UUID. I dug up the PR (#23827) that went from implicit PK to UUID and the reasoning for UUID over unique_rowid was that 1) unique_rowid apparently generated a duplicate (!) and 2) our production recommendation is UUID. I think we could fix (1), (2) is more compelling to me.
I ask because UUIDs make this table the only one in our tpcc Generator that doesn't generate rows in PKs order, which makes our lives harder for the upcoming direct-ingest IMPORT. It's something we'll have to solve for secondary indexes, but "solve" here means something like backpressure and trying to be careful about compactions and it's definitely better if the generated data doesn't overlap in the first place. It may also be possible to switch the UUIDs in the initial data to be deterministically generated from batchIdx and the offset within the batch, but I haven't looked into this yet.
Does anyone have strong opinions here?
You're way ahead of me: #37515.
pkg/workload/tpcc/tpcc.go, line 390 at r6 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
all of these NumTotals on the splits look wrong to me. They're supposed to represent how many total rows are produced by all the batches, but each batch seems to be of size 1. I'm actually planning on getting rid of them entirely, they haven't been useful for anything, so I'd omit them here
Looks like you got rid of them all.
36854: workload/tpcc: improve indexing to permit better partitioning r=nvanbenschoten a=nvanbenschoten This PR contains a number of incremental improvements to TPC-C that allow it to be more effectively partitioned. The changes focus on migrating indexes to all be partitionable by warehouse and removing indexes that are only needed for foreign keys when fks are not enabled. This is all a precursor to a follow-up PR: #36855. I'm assigning @jordanlewis and @danhhz as reviewers because between the two of you most of the changes here have already been agreed upon. This PR will require me to regenerate all TPC-C fixtures. I intend to do so before it is merged. Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Build succeeded |
This PR contains a number of incremental improvements to TPC-C that allow it to be more effectively partitioned. The changes focus on migrating indexes to all be partitionable by warehouse and removing indexes that are only needed for foreign keys when fks are not enabled. This is all a precursor to a follow-up PR: #36855.
I'm assigning @jordanlewis and @danhhz as reviewers because between the two of you most of the changes here have already been agreed upon.
This PR will require me to regenerate all TPC-C fixtures. I intend to do so before it is merged.