Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: repartition regional by row tables on region add #60596

Merged
merged 1 commit into from
Feb 22, 2021

Conversation

arulajmani
Copy link
Collaborator

@arulajmani arulajmani commented Feb 15, 2021

his patch adds functionality to repartition REGIONAL BY ROW tables when
a new region is added to the database. An index can only be partitioned
using an enum value if it is PUBLIC. Thus we only modify partition
descriptors once all transitioning enum members have done so
successfully. This happens in the same transaction that promoted the
enum members, ensuring sane rollbacks in the face of non-retryable
partitioning failure. Once partitioning is complete, zone
configurations are applied to partitions as well.

This patch changes how uncommitted type descriptors are stored to enable
partitioning in the same transaction that previously promoted enum
members. A column's types.T is hydrated using enum metadata fields that
are cached on the type descriptor. Previously, this cache was
constructed based on the cluster version of the type descriptor and not
updated when the type descriptor was modified. After this patch,
whenever a type descriptor is added back as an uncomitted descriptor to
the desc collection, we reconstruct the cached fields stored on it.
For us, this means that tables being partitioned are hydrated with the
correct visibility for enum members, which allows us to partition on
values promoted in the same transaction.

Release note (sql change): ALTER DATABASE ... ADD REGION now
repartitions REGIONAL BY ROW tables and updates the zone configs on
the newly created partitions as well.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@arulajmani
Copy link
Collaborator Author

The one thing that this PR doesn't handle is what happens if the transaction performing the repartitioning encounters a non-retryable error. By that point, because the enum member promotion/demotion has been finalized, there isn't a way to roll back changes. I'm not sure if there is any way around this right now, though I wonder if it makes sense to repartition individual tables in individual transactions. Would love some thoughts on this stuff and any other tricky scenarios that I may be missing here.

Copy link
Contributor

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 4 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @ajwerner, @arulajmani, and @lucy-zhang)


pkg/ccl/logictestccl/testdata/logic_test/alter_table_locality, line 1815 at r1 (raw file):


statement ok
CREATE DATABASE add_regions WITH PRIMARY REGION "ca-central-1";

let's put these in regional_by_row


pkg/ccl/logictestccl/testdata/logic_test/alter_table_locality, line 1946 at r1 (raw file):

                 crdb_region public.crdb_internal_region NOT VISIBLE NOT NULL DEFAULT default_to_database_primary_region(gateway_region())::public.crdb_internal_region,
                 CONSTRAINT "primary" PRIMARY KEY (crdb_region ASC, pk ASC),
                 INDEX regional_by_row_i_idx (crdb_region ASC, i ASC),

this now has crdb_region in front, so it is no longer implicitly partitioned, which is incorrect. see comment about CreatePartitioning below


pkg/sql/type_change.go, line 308 at r1 (raw file):

		// as enum members must be public before we can partition based on those
		// values, which happens when the transaction above commits.
		if typeDesc.Kind == descpb.TypeDescriptor_MULTIREGION_ENUM && len(t.transitioningMembers) > 0 {

does this happen on a separate transaction to the enum finalisation? if so, how does rollback / failing txn work?


pkg/sql/type_change.go, line 354 at r1 (raw file):

) error {
	p, cleanup := NewInternalPlanner(
		"repartition-tables",

nit: repartition-regional-by-row-table


pkg/sql/type_change.go, line 393 at r1 (raw file):

			// dropped.
			for _, index := range tableDesc.NonDropIndexes() {
				newIdx, err := CreatePartitioning(

CreatePartitioning on an already implicitly partitioned index will un-implicitly partition the index.
You'll need to add some logic to truncate the implicit column partitioning before redoing it, i.e.

indexDesc.ColumnIDs = indexDesc.ColumnIDs[indexDesc.PartitioningNumImplicitColumns:]
indexDesc.ColumnNames = indexDesc.ColumnNames[indexDesc.PartitioningNumImplicitColumns:]
indexDesc.ColumnDirections = indexDesc.ColumnDirections[indexDesc.PartitioningNumImplicitColumns:]
indexDesc.Partitioning = descpb.PartitioningDesc()

you'll probably want to make this a method.


pkg/sql/catalog/tabledesc/structured.go, line 4029 at r1 (raw file):

func (desc *wrapper) GetRegionalByRowTableRegionColumnName() (tree.Name, error) {
	if !desc.IsLocalityRegionalByRow() {
		return "", errors.New("is not REGIONAL BY ROW")

nit: put the column name in here

@otan
Copy link
Contributor

otan commented Feb 16, 2021


pkg/sql/catalog/tabledesc/structured.go, line 4029 at r1 (raw file):

Previously, otan (Oliver Tan) wrote…

nit: put the column name in here

i do this in #60600

@ajstorm ajstorm requested a review from otan February 16, 2021 15:06
Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, @lucy-zhang, and @otan)


pkg/ccl/logictestccl/testdata/logic_test/alter_table_locality, line 1815 at r1 (raw file):

Previously, otan (Oliver Tan) wrote…

let's put these in regional_by_row

I'm confused that these even work here. Don't we require CCL to allow REGIONAL BY ROW tables to be created?


pkg/sql/type_change.go, line 308 at r1 (raw file):

Previously, otan (Oliver Tan) wrote…

does this happen on a separate transaction to the enum finalisation? if so, how does rollback / failing txn work?

Is there no way to make this work in the same transaction? Do we support other operations like this in the same transaction (add column and partition by that column?)


pkg/sql/type_change.go, line 346 at r1 (raw file):

}

func repartitionRegionalByRowTables(

This function has a fair bit going on in it. I'd think it would warrant a comment above.


pkg/sql/type_change.go, line 379 at r1 (raw file):

			return err
		}
		if tableDesc.IsLocalityRegionalByRow() {

Nit: We can save one level of indentation if we continue here in cases where we're !RBR. In fact, you could combine this check with the check below to further simplify the code:

if !tableDesc.IsLocalityRegionalByRow()  || tableDesc.Dropped() {
   continue
}

pkg/sql/type_change.go, line 393 at r1 (raw file):

Previously, otan (Oliver Tan) wrote…

CreatePartitioning on an already implicitly partitioned index will un-implicitly partition the index.
You'll need to add some logic to truncate the implicit column partitioning before redoing it, i.e.

indexDesc.ColumnIDs = indexDesc.ColumnIDs[indexDesc.PartitioningNumImplicitColumns:]
indexDesc.ColumnNames = indexDesc.ColumnNames[indexDesc.PartitioningNumImplicitColumns:]
indexDesc.ColumnDirections = indexDesc.ColumnDirections[indexDesc.PartitioningNumImplicitColumns:]
indexDesc.Partitioning = descpb.PartitioningDesc()

you'll probably want to make this a method.

Does this speak to the need to push this code directly into CreatePartitioning (or as an optional flag)? Seems error prone the way it currently exists.

@thoszhang thoszhang removed their request for review February 16, 2021 17:47
Copy link
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, @lucy-zhang, and @otan)


pkg/ccl/logictestccl/testdata/logic_test/alter_table_locality, line 1815 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

I'm confused that these even work here. Don't we require CCL to allow REGIONAL BY ROW tables to be created?

I think regional_by_row is in ccl -- I forgot that existed, I'll move these.


pkg/sql/type_change.go, line 308 at r1 (raw file):

add column and partition by that column

I don't think this is possible unless the column is then used to create an index. The index, once added, isn't visible to the transaction that added it because it isn't public yet. That being said, Im not sure if there's an analogy between what prevents us from using the same txn here (non public enum member) to the scenario I described above.

As for can we allow partitioning if the enum member is in read only mode? I don't think we can in the general case because members in read only mode may be removed instead of being promoted to public (either because of a rollback or because they're being removed).

I could use some help around this issue from the Schema team -- cc @ajwerner @lucy-zhang


pkg/sql/type_change.go, line 393 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Does this speak to the need to push this code directly into CreatePartitioning (or as an optional flag)? Seems error prone the way it currently exists.

I agree w you Adam, I'll push out something to that effect soon.


pkg/sql/catalog/tabledesc/structured.go, line 4029 at r1 (raw file):

Previously, otan (Oliver Tan) wrote…

i do this in #60600

Assuming you meant table name here? Done.

@arulajmani arulajmani requested a review from a team as a code owner February 18, 2021 14:56
Copy link
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still missing a test to ensure transactional rollbacks, but before I can add that, I need to fix the fact that currently the type schema changer doesn't handle region add failing too well (it basically leaves the database descriptor and type descriptor out of sync, causing validation failure). This should be fixed once we stop duplicating regions, but until then I'm inclined to merge a bandaid to enable testing. I'll do so in a separate patch, and then rebase this thing.

The code change, however, should be ready for another look!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @ajwerner, @arulajmani, @lucy-zhang, and @otan)


pkg/ccl/logictestccl/testdata/logic_test/alter_table_locality, line 1815 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

I think regional_by_row is in ccl -- I forgot that existed, I'll move these.

Done.


pkg/sql/type_change.go, line 346 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

This function has a fair bit going on in it. I'd think it would warrant a comment above.

Agreed, done.


pkg/sql/type_change.go, line 393 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

I agree w you Adam, I'll push out something to that effect soon.

No longer applicable, as @otan already fixed this!

Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 12 of 16 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @ajwerner, @arulajmani, @lucy-zhang, and @otan)


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row, line 1272 at r2 (raw file):


query TT
SHOW CREATE TABLE regional_by_row_as

Nit: Not required for this PR, but I'm wondering if we could make these tests more robust by creating a new table with the same columns after the regions have been added, and then diffing their SHOW CREATE TABLE statements (and zone configs, if we move them out of SHOW CREATE TABLE) with the tables that were altered through the ADD REGION process?


pkg/sql/type_change.go, line 331 at r2 (raw file):

				if errors.Is(err, catalog.ErrDescriptorNotFound) {
					// Swallow.
					log.Infof(ctx, "assuming repartitioned table %d was dropped and moving on", tbID)

Nit: Can we be a bit more descriptive in this log record? Something like "in the process of changing types, table being repartitioned can not be found - assuming repartitioned table..."?


pkg/sql/type_change.go, line 337 at r2 (raw file):

		}

		// Finally, make sure all of the leases are updated.

Nit: To contrast with the previous comment, maybe "...all of the type leases are update"?


pkg/sql/type_change.go, line 359 at r2 (raw file):

}

// repartitionRegionalByRowTables takes multi-region enum and re-partitions all

Nit: "takes a multi-region enum..."


pkg/sql/type_change.go, line 362 at r2 (raw file):

// REGIONAL BY ROW tables in the enclosing database such that there is a
// partition and corresponding zone configuration for all PUBLIC enum members
// (regions). Th

Nit: "Th"? Don't leave me hanging dude.

Can we also describe the cases where we may find non-PUBLIC enum members here?


pkg/sql/type_change.go, line 392 at r2 (raw file):

		ctx, txn, typeDesc.ParentID, tree.DatabaseLookupFlags{Required: true})
	if err != nil {
		return repartitionedTableIDs, err

Nit: this code might read better if we return a nil slice in the error cases. When I glanced at this first I thought: "wait, why are we returning the slice here?" and then I realized that it's still nil.


pkg/sql/catalog/tabledesc/structured.go, line 4029 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Assuming you meant table name here? Done.

Why the change from an assertion? Are there legitimate cases where we can get in here?

@ajstorm ajstorm self-requested a review February 19, 2021 15:51
@thoszhang thoszhang removed their request for review February 19, 2021 16:34
Copy link
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the test I promised, RFAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @ajwerner, @arulajmani, @lucy-zhang, and @otan)


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row, line 1272 at r2 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Nit: Not required for this PR, but I'm wondering if we could make these tests more robust by creating a new table with the same columns after the regions have been added, and then diffing their SHOW CREATE TABLE statements (and zone configs, if we move them out of SHOW CREATE TABLE) with the tables that were altered through the ADD REGION process?

Sounds like a good idea. Maybe instead of comparing the SHOW CREATE TABLE statements, we can instead compare the two descriptors? (In a followup, ofcourse)


pkg/sql/type_change.go, line 362 at r2 (raw file):

Nit: "Th"? Don't leave me hanging dude.

Oops removed.

Can we also describe the cases where we may find non-PUBLIC enum members here?

I don't think this works as expected until #60620 is resolved. One way this scenario can happen is if 2 concurrent transactions try to add a region. In that case, atleast one but maybe both of those transactions could fail. This is because we're constructing the partitioning clause based on the regions on the database descriptor, but the first job that tries to repartition will not be able to partition based on the READ ONLY member the second job is responsible for promoting.


pkg/sql/catalog/tabledesc/structured.go, line 4029 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Why the change from an assertion? Are there legitimate cases where we can get in here?

Good catch, rebase mishap

@arulajmani arulajmani force-pushed the partition-when-adding-regions branch 2 times, most recently from 3e5d9df to f791fc8 Compare February 19, 2021 21:42
Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: but I didn't look too closely at the schema changes.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajstorm, @ajwerner, @arulajmani, @lucy-zhang, and @otan)


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row, line 1272 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Sounds like a good idea. Maybe instead of comparing the SHOW CREATE TABLE statements, we can instead compare the two descriptors? (In a followup, ofcourse)

Sure, that works.


pkg/sql/type_change.go, line 362 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Nit: "Th"? Don't leave me hanging dude.

Oops removed.

Can we also describe the cases where we may find non-PUBLIC enum members here?

I don't think this works as expected until #60620 is resolved. One way this scenario can happen is if 2 concurrent transactions try to add a region. In that case, atleast one but maybe both of those transactions could fail. This is because we're constructing the partitioning clause based on the regions on the database descriptor, but the first job that tries to repartition will not be able to partition based on the READ ONLY member the second job is responsible for promoting.

New comment, describing the issue looks good.

@ajstorm ajstorm self-requested a review February 19, 2021 21:48
@arulajmani
Copy link
Collaborator Author

@ajwerner I know we talked about this approach offline the other day, but the changes in the desc Collection could use a look from you (or @postamar) as folks most familiar with that code.

Copy link
Contributor

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 16 files at r2, 6 of 8 files at r3, 4 of 4 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @arulajmani, @lucy-zhang, @otan, and @postamar)


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row, line 1210 at r4 (raw file):


query TT
SHOW CREATE TABLE regional_by_row

to futureproof this, can you add SELECT target, raw_config_sql FROM crdb_internal.zones WHERE table_name = 't_regional_by_row' ORDER BY target for each of these SHOW CREATE TABLEs?


pkg/ccl/multiregionccl/regional_by_row_test.go, line 372 at r4 (raw file):

	regionNames := make([]string, numServers)
	for i := 0; i < numServers; i++ {
		// "us-east1", "us-east2"...

do you mind extracting this cluster setup to a method? i have a feels we'll use this often.


pkg/sql/planner.go, line 248 at r4 (raw file):

	sessionData sessiondatapb.SessionData,
	collection *descs.Collection,
) (interface{}, func()) {

can you use the options pattern for this (as per our style guide), e.g. https://www.sohamkamani.com/golang/options-pattern/

@blathers-crl blathers-crl bot requested a review from otan February 22, 2021 00:55
Copy link
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFAL

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajstorm, @ajwerner, @arulajmani, @lucy-zhang, @otan, and @postamar)


pkg/ccl/logictestccl/testdata/logic_test/alter_table_locality, line 1946 at r1 (raw file):

Previously, otan (Oliver Tan) wrote…

this now has crdb_region in front, so it is no longer implicitly partitioned, which is incorrect. see comment about CreatePartitioning below

Talked offline, adding SHOW ZONE CONFIGURATIONS (which will be re-written later) and a query on SHOW PARTITIONS.


pkg/ccl/multiregionccl/regional_by_row_test.go, line 372 at r4 (raw file):

Previously, otan (Oliver Tan) wrote…

do you mind extracting this cluster setup to a method? i have a feels we'll use this often.

I agree, done.


pkg/sql/planner.go, line 248 at r4 (raw file):

Previously, otan (Oliver Tan) wrote…

can you use the options pattern for this (as per our style guide), e.g. https://www.sohamkamani.com/golang/options-pattern/

Done, thanks for the suggestion!

Copy link
Contributor

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 12 of 12 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @arulajmani, @lucy-zhang, @otan, and @postamar)


pkg/sql/planner.go, line 249 at r5 (raw file):

// WithDescCollection configures the planner with the provided collection
// instead of the default (creating a new one from scratch).
func WithDescCollection(collection *descs.Collection) InternalPlannerParamsOption {

i usually like prefixing the option like so: InternalPlannerParamsOption(With)DescCollection so it's searchable easily with just the prefix (but it can be a bit long)


pkg/sql/planner.go, line 257 at r5 (raw file):

// NewInternalPlanner is an exported version of newInternalPlanner. It
// returns an interface{} so it can be used outside of the sql package.
// NewInternalPlanner initializes a new desc.Collection if an ovveride option

i would nix the new comment, or at the very least, fix the ovveride typo :P


pkg/sql/planner.go, line 298 at r5 (raw file):

	execCfg *ExecutorConfig,
	sessionData sessiondatapb.SessionData,
	params *InternalPlannerParams,

can we thread the options into here to avoid this new struct?


pkg/sql/values_test.go, line 59 at r5 (raw file):

		sessiondatapb.SessionData{},
	)
	return p.(*planner)

nit: revert


pkg/sql/schemachanger/scbuild/builder_test.go, line 148 at r5 (raw file):

		// way doesn't seem to do what we want.
		sessiondatapb.SessionData{},
		nil, /* collection */

will this fail any tests? is the comment still right?

This patch adds functionality to repartition REGIONAL BY ROW tables when
a new region is added to the database. An index can only be partitioned
using an enum value if it is PUBLIC. Thus we only modify partition
descriptors once all transitioning enum members have done so
successfully. This happens in the same transaction that promoted the
enum members, ensuring sane rollbacks in the face of non-retryable
partitioning failure. Once partitioning is complete, zone
configurations are applied to partitions as well.

This patch changes how uncommitted type descriptors are stored to enable
partitioning in the same transaction that previously promoted enum
members. A column's types.T is hydrated using enum metadata fields that
are cached on the type descriptor. Previously, this cache was
constructed based on the cluster version of the type descriptor and not
updated when the type descriptor was modified. After this patch,
whenever a type descriptor is added back as an uncomitted descriptor to
the desc collection, we reconstruct the cached fields stored on it.
For us, this means that tables being partitioned are hydrated with the
correct visibility for enum members, which allows us to partition on
values promoted in the same transaction.

Release note (sql change): ALTER DATABASE ... ADD REGION now
repartitions REGIONAL BY ROW tables and updates the zone configs on
the newly created partitions as well.
Copy link
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @arulajmani, @lucy-zhang, @otan, and @postamar)


pkg/sql/planner.go, line 249 at r5 (raw file):

Previously, otan (Oliver Tan) wrote…

i usually like prefixing the option like so: InternalPlannerParamsOption(With)DescCollection so it's searchable easily with just the prefix (but it can be a bit long)

IMO the readability of what I have rn outweighs the searchable aspect, unless you really insist.


pkg/sql/planner.go, line 257 at r5 (raw file):

Previously, otan (Oliver Tan) wrote…

i would nix the new comment, or at the very least, fix the ovveride typo :P

Done.


pkg/sql/planner.go, line 298 at r5 (raw file):

Previously, otan (Oliver Tan) wrote…

can we thread the options into here to avoid this new struct?

Talked offline, threaded the options but made the new struct internal as opposed to getting rid of it (in the hope that people might extend it in the future).


pkg/sql/values_test.go, line 59 at r5 (raw file):

Previously, otan (Oliver Tan) wrote…

nit: revert

I need this because I changed the call from newInternalPlanner(which returns a planner) to NewInternalPlanner (which returns an interface). Don't really need the switch now that Ive passed the options down, but now that I've cleaned up all accesses to newInternalPlanner to go through the public method, Im inclined to just leave it as is.


pkg/sql/schemachanger/scbuild/builder_test.go, line 148 at r5 (raw file):

Previously, otan (Oliver Tan) wrote…

will this fail any tests? is the comment still right?

Detritus, fixed.

@arulajmani
Copy link
Collaborator Author

TFTRs!

bors r=otan,ajstorm

@craig
Copy link
Contributor

craig bot commented Feb 22, 2021

Build failed:

@arulajmani
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 22, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants