workload/tpcc: improve indexing to permit better partitioning #36854

nvanbenschoten · 2019-04-15T23:07:00Z

This PR contains a number of incremental improvements to TPC-C that allow it to be partitioned more effectively. The changes focus on making every index partitionable by warehouse and on removing indexes that are only needed for foreign keys when FKs are not enabled. This is all a precursor to a follow-up PR: #36855.

I'm assigning @jordanlewis and @danhhz as reviewers because between the two of you most of the changes here have already been agreed upon.

This PR will require me to regenerate all TPC-C fixtures. I intend to do so before it is merged.

danhhz

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @danhhz, @jordanlewis, and @nvanbenschoten)

pkg/workload/tpcc/ddls.go, line 88 at r6 (raw file):

	// HISTORY table.
	tpccHistorySchemaBase = `(
		rowid    uuid    not null default gen_random_uuid(),

While you're regenerating fixtures, I'd love to reconfirm that we think this field should be a UUID. I dug up the PR (#23827) that went from an implicit PK to a UUID, and the reasoning for UUID over unique_rowid was that 1) unique_rowid apparently generated a duplicate (!) and 2) our production recommendation is UUID. I think we could fix (1); (2) is more compelling to me.

I ask because UUIDs make this table the only one in our tpcc Generator that doesn't generate rows in PK order, which makes our lives harder for the upcoming direct-ingest IMPORT. It's something we'll have to solve for secondary indexes anyway, but "solve" there means something like backpressure and being careful about compactions, and it's definitely better if the generated data doesn't overlap in the first place. It may also be possible to switch the UUIDs in the initial data to be deterministically generated from batchIdx and the offset within the batch, but I haven't looked into this yet.

Does anyone have strong opinions here?
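For illustration only, here is a minimal sketch of what "deterministically generated from batchIdx and the offset within the batch" could look like. The helper name `deterministicRowID` and the byte layout are assumptions, not code from this PR or from workload/tpcc. The value is UUID-shaped but does not set RFC 4122 version or variant bits.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// deterministicRowID is a hypothetical helper (not part of workload/tpcc).
// It packs batchIdx into the high 8 bytes and the offset within the batch
// into the low 8 bytes, so the IDs sort in the order the rows are generated.
func deterministicRowID(batchIdx, offset uint64) string {
	var b [16]byte
	binary.BigEndian.PutUint64(b[:8], batchIdx)
	binary.BigEndian.PutUint64(b[8:], offset)
	h := hex.EncodeToString(b[:])
	// Format the 32 hex digits in the usual 8-4-4-4-12 UUID grouping.
	return fmt.Sprintf("%s-%s-%s-%s-%s", h[0:8], h[8:12], h[12:16], h[16:20], h[20:32])
}

func main() {
	fmt.Println(deterministicRowID(0, 1)) // 00000000-0000-0000-0000-000000000001
	fmt.Println(deterministicRowID(1, 0)) // 00000000-0000-0001-0000-000000000000
}
```

Because the batch index occupies the most significant bytes, every row of batch N sorts before every row of batch N+1, which is the non-overlapping property the comment asks for. The trade-off is that such IDs are no longer random, so this would only make sense for the initial data.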

pkg/workload/tpcc/tpcc.go, line 390 at r6 (raw file):

		Splits: splits(workload.BatchedTuples{
			NumBatches: numBatches(w.warehouses, numWarehousesPerRange),
			NumTotal:   w.warehouses,

All of these NumTotals on the splits look wrong to me. They're supposed to represent how many total rows are produced by all the batches, but each batch seems to be of size 1. I'm actually planning on getting rid of them entirely, since they haven't been useful for anything, so I'd omit them here.
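The mismatch described above can be sketched as follows. This is a simplified model: `batchedTuples` is a hypothetical stand-in for `workload.BatchedTuples` with only the two fields from the snippet, and `numBatches` is assumed to be a ceiling division, which may not match the real helper.

```go
package main

import "fmt"

// batchedTuples is a simplified, hypothetical stand-in for
// workload.BatchedTuples; only the two fields from the snippet are modeled.
type batchedTuples struct {
	NumBatches int
	NumTotal   int
}

// numBatches is an assumed ceiling division; the real helper may differ.
func numBatches(total, perBatch int) int {
	return (total + perBatch - 1) / perBatch
}

// rowsProduced counts the rows emitted when every batch yields exactly one
// row, which is what the review comment says the split batches do.
func rowsProduced(b batchedTuples) int {
	return b.NumBatches * 1
}

func main() {
	warehouses, warehousesPerRange := 100, 10
	b := batchedTuples{
		NumBatches: numBatches(warehouses, warehousesPerRange),
		NumTotal:   warehouses,
	}
	// NumTotal claims 100 rows, but only 10 one-row batches are produced.
	fmt.Println(rowsProduced(b), b.NumTotal)
}
```

Under these assumptions the declared total (the warehouse count) overstates the rows actually produced, which is why omitting the field is the simpler fix.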

This change migrates workload/tpcc from adding foreign key relations in a PostLoad step to adding them in the schema itself. While doing so, it also ensures that indexes that are only used for foreign keys are made clear and are not created when fks are not in use. This will require fixture regeneration. Release note: None
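As a rough sketch of the idea (not the code from this PR), a schema builder can append the foreign key clauses, and the indexes that exist only to back them, when foreign keys are enabled. The function name `buildSchema` and the example definitions in `main` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// buildSchema is a hypothetical helper. It joins the base column and index
// definitions and, only when fks is true, appends the definitions that exist
// solely for foreign keys (the constraints and their backing indexes).
func buildSchema(base, fkOnly []string, fks bool) string {
	defs := append([]string{}, base...)
	if fks {
		defs = append(defs, fkOnly...)
	}
	return "(\n\t" + strings.Join(defs, ",\n\t") + "\n)"
}

func main() {
	// Illustrative definitions; the names below are assumptions.
	base := []string{
		"h_w_id integer",
		"h_d_id integer",
	}
	fkOnly := []string{
		"index history_district_fk_idx (h_w_id, h_d_id)",
		"foreign key (h_w_id, h_d_id) references district (d_w_id, d_id)",
	}
	fmt.Println(buildSchema(base, fkOnly, false))
	fmt.Println(buildSchema(base, fkOnly, true))
}
```

Keeping the FK-only definitions in a separate list makes it explicit which indexes exist purely for foreign keys, so they are never created when FKs are off.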

nvanbenschoten

Easiest way to address Dan's comment: wait until he does it himself in other PRs 😃 TFTR!

bors r+

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @danhhz and @jordanlewis)

pkg/workload/tpcc/ddls.go, line 88 at r6 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

You're way ahead of me: #37515.

pkg/workload/tpcc/tpcc.go, line 390 at r6 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

Looks like you got rid of them all.

@jordanlewis

36854: workload/tpcc: improve indexing to permit better partitioning r=nvanbenschoten a=nvanbenschoten

This PR contains a number of incremental improvements to TPC-C that allow it to be more effectively partitioned. The changes focus on migrating indexes to all be partitionable by warehouse and removing indexes that are only needed for foreign keys when fks are not enabled. This is all a precursor to a follow-up PR: #36855.

I'm assigning @jordanlewis and @danhhz as reviewers because between the two of you most of the changes here have already been agreed upon.

This PR will require me to regenerate all TPC-C fixtures. I intend to do so before it is merged.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>

craig · 2019-05-20T20:59:00Z

Build succeeded

GitHub CI (Cockroach)

nvanbenschoten requested review from jordanlewis and danhhz April 15, 2019 23:07

nvanbenschoten requested a review from a team as a code owner April 15, 2019 23:07

nvanbenschoten mentioned this pull request Apr 15, 2019

workload/tpcc: support replicated indexes and lease partitioning #36855

Merged

danhhz reviewed Apr 16, 2019

This was referenced May 2, 2019

workload/tpcc: add missing new_order -> order foreign key #35891

Closed

workload: add an option to remove foreign keys from tpcc #37345

Closed

nvanbenschoten mentioned this pull request May 20, 2019

workload/tpcc: duplicate FK detection during initialization broken #37590

Closed

nvanbenschoten added 3 commits May 20, 2019 16:04

workload/tpcc: give history a better primary key

2dfe5c4

This commit improves the primary key of the history table, allowing it to be partitioned based on warehouse ID. Release note: None
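To show why the primary key matters here, the sketch below builds a range-partitioning statement over warehouse ID. It assumes the new history primary key starts with `h_w_id` (for example `(h_w_id, rowid)`), which this page does not confirm; `historyPartitionStmt` is a hypothetical helper, and it assumes `partitions >= 1`.

```go
package main

import (
	"fmt"
	"strings"
)

// historyPartitionStmt is a hypothetical helper. It assumes the history
// primary key is prefixed by h_w_id, which is what allows the table to be
// range-partitioned on warehouse ID. partitions must be at least 1.
func historyPartitionStmt(warehouses, partitions int) string {
	per := warehouses / partitions
	parts := make([]string, 0, partitions)
	for p := 0; p < partitions; p++ {
		lo, hi := p*per, (p+1)*per
		if p == partitions-1 {
			hi = warehouses // the last partition absorbs any remainder
		}
		parts = append(parts,
			fmt.Sprintf("PARTITION p%d VALUES FROM (%d) TO (%d)", p, lo, hi))
	}
	return "ALTER TABLE history PARTITION BY RANGE (h_w_id) (" +
		strings.Join(parts, ", ") + ")"
}

func main() {
	fmt.Println(historyPartitionStmt(10, 2))
}
```

With a primary key that begins with a random UUID, no such contiguous warehouse ranges exist in the key space, so a statement like this could not be applied to the primary index.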

workload/tpcc: remove useless index on "order" table

7434690

It's unclear why this index was ever added. It wasn't being used for anything. Release note: None

workload/tpcc: don't pick random warehouse when --wait=false

291e437

There was no reason to do this and it made --wait=false incompatible with partitioning. Release note: None

nvanbenschoten force-pushed the nvanbenschoten/geoTpcc branch from 21e1cdd to df6a341 Compare May 20, 2019 20:08

nvanbenschoten added 3 commits May 20, 2019 16:38

workload/tpcc: add missing new_order -> order foreign key

230dae9

Closes cockroachdb#35891. See page 15 of the spec: http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-c_v5.11.0.pdf Release note: None

workload/tpcc: tolerate missing indexes when partitioning

92c8992

Release note: None

nvanbenschoten force-pushed the nvanbenschoten/geoTpcc branch from df6a341 to 92c8992 Compare May 20, 2019 20:38

nvanbenschoten commented May 20, 2019

craig bot merged commit 92c8992 into cockroachdb:master May 20, 2019

nvanbenschoten deleted the nvanbenschoten/geoTpcc branch May 20, 2019 21:02

nvanbenschoten mentioned this pull request May 20, 2019

sql: permit foreign key relationships without indexes on the source side #36859

Closed

yuzefovich mentioned this pull request May 21, 2019

workload: add flag to remove FKs and Indexes from tpc-c #37265

Closed

nvanbenschoten mentioned this pull request May 21, 2019

roachtest: scrub/index-only/tpcc/w=100 failed #37551

Closed

knz mentioned this pull request Nov 10, 2019

User-facing changes in 19.2 that were not picked up in release notes cockroachdb/docs#5819

Closed
