-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: repartition regional by row tables on region add #60596
sql: repartition regional by row tables on region add #60596
Conversation
The one thing that this PR doesn't handle is what happens if the transaction performing the repartitioning encounters a non-retryable error. By that point, because the enum member promotion/demotion has been finalized, there isn't a way to roll back changes. I'm not sure if there is any way around this right now, though I wonder if it makes sense to repartition individual tables in individual transactions. Would love some thoughts on this stuff and any other tricky scenarios that I may be missing here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 4 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @ajwerner, @arulajmani, and @lucy-zhang)
pkg/ccl/logictestccl/testdata/logic_test/alter_table_locality, line 1815 at r1 (raw file):
statement ok CREATE DATABASE add_regions WITH PRIMARY REGION "ca-central-1";
let's put these in regional_by_row
pkg/ccl/logictestccl/testdata/logic_test/alter_table_locality, line 1946 at r1 (raw file):
crdb_region public.crdb_internal_region NOT VISIBLE NOT NULL DEFAULT default_to_database_primary_region(gateway_region())::public.crdb_internal_region, CONSTRAINT "primary" PRIMARY KEY (crdb_region ASC, pk ASC), INDEX regional_by_row_i_idx (crdb_region ASC, i ASC),
this now has crdb_region in front, so it is no longer implicitly partitioned, which is incorrect. see comment about CreatePartitioning below
pkg/sql/type_change.go, line 308 at r1 (raw file):
// as enum members must be public before we can partition based on those // values, which happens when the transaction above commits. if typeDesc.Kind == descpb.TypeDescriptor_MULTIREGION_ENUM && len(t.transitioningMembers) > 0 {
does this happen on a separate transaction to the enum finalisation? if so, how does rollback / failing txn work?
pkg/sql/type_change.go, line 354 at r1 (raw file):
) error { p, cleanup := NewInternalPlanner( "repartition-tables",
nit: repartition-regional-by-row-table
pkg/sql/type_change.go, line 393 at r1 (raw file):
// dropped. for _, index := range tableDesc.NonDropIndexes() { newIdx, err := CreatePartitioning(
CreatePartitioning on an already implicitly partitioned index will un-implicitly partition the index.
You'll need to add some logic to truncate the implicit column partitioning before redoing it, i.e.
indexDesc.ColumnIDs = indexDesc.ColumnIDs[indexDesc.PartitioningNumImplicitColumns:]
indexDesc.ColumnNames = indexDesc.ColumnNames[indexDesc.PartitioningNumImplicitColumns:]
indexDesc.ColumnDirections = indexDesc.ColumnDirections[indexDesc.PartitioningNumImplicitColumns:]
indexDesc.Partitioning = descpb.PartitioningDesc()
you'll probably want to make this a method.
pkg/sql/catalog/tabledesc/structured.go, line 4029 at r1 (raw file):
func (desc *wrapper) GetRegionalByRowTableRegionColumnName() (tree.Name, error) { if !desc.IsLocalityRegionalByRow() { return "", errors.New("is not REGIONAL BY ROW")
nit: put the column name in here
pkg/sql/catalog/tabledesc/structured.go, line 4029 at r1 (raw file): Previously, otan (Oliver Tan) wrote…
i do this in #60600 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, @lucy-zhang, and @otan)
pkg/ccl/logictestccl/testdata/logic_test/alter_table_locality, line 1815 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
let's put these in
regional_by_row
I'm confused that these even work here. Don't we require CCL to allow REGIONAL BY ROW tables to be created?
pkg/sql/type_change.go, line 308 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
does this happen on a separate transaction to the enum finalisation? if so, how does rollback / failing txn work?
Is there no way to make this work in the same transaction? Do we support other operations like this in the same transaction (add column and partition by that column?)
pkg/sql/type_change.go, line 346 at r1 (raw file):
} func repartitionRegionalByRowTables(
This function has a fair bit going on in it. I'd think it would warrant a comment above.
pkg/sql/type_change.go, line 379 at r1 (raw file):
return err } if tableDesc.IsLocalityRegionalByRow() {
Nit: We can save one level of indentation if we continue
here in cases where we're !RBR. In fact, you could combine this check with the check below to further simplify the code:
if !tableDesc.IsLocalityRegionalByRow() || tableDesc.Dropped() {
continue
}
pkg/sql/type_change.go, line 393 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
CreatePartitioning on an already implicitly partitioned index will un-implicitly partition the index.
You'll need to add some logic to truncate the implicit column partitioning before redoing it, i.e.indexDesc.ColumnIDs = indexDesc.ColumnIDs[indexDesc.PartitioningNumImplicitColumns:] indexDesc.ColumnNames = indexDesc.ColumnNames[indexDesc.PartitioningNumImplicitColumns:] indexDesc.ColumnDirections = indexDesc.ColumnDirections[indexDesc.PartitioningNumImplicitColumns:] indexDesc.Partitioning = descpb.PartitioningDesc()
you'll probably want to make this a method.
Does this speak to the need to push this code directly into CreatePartitioning
(or as an optional flag)? Seems error prone the way it currently exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, @lucy-zhang, and @otan)
pkg/ccl/logictestccl/testdata/logic_test/alter_table_locality, line 1815 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
I'm confused that these even work here. Don't we require CCL to allow REGIONAL BY ROW tables to be created?
I think regional_by_row
is in ccl -- I forgot that existed, I'll move these.
pkg/sql/type_change.go, line 308 at r1 (raw file):
add column and partition by that column
I don't think this is possible unless the column is then used to create an index. The index, once added, isn't visible to the transaction that added it because it isn't public yet. That being said, Im not sure if there's an analogy between what prevents us from using the same txn here (non public enum member) to the scenario I described above.
As for can we allow partitioning if the enum member is in read only mode? I don't think we can in the general case because members in read only mode may be removed instead of being promoted to public (either because of a rollback or because they're being removed).
I could use some help around this issue from the Schema team -- cc @ajwerner @lucy-zhang
pkg/sql/type_change.go, line 393 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Does this speak to the need to push this code directly into
CreatePartitioning
(or as an optional flag)? Seems error prone the way it currently exists.
I agree w you Adam, I'll push out something to that effect soon.
pkg/sql/catalog/tabledesc/structured.go, line 4029 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
i do this in #60600
Assuming you meant table name here? Done.
18952af
to
d9fb08e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still missing a test to ensure transactional rollbacks, but before I can add that, I need to fix the fact that currently the type schema changer doesn't handle region add failing too well (it basically leaves the database descriptor and type descriptor out of sync, causing validation failure). This should be fixed once we stop duplicating regions, but until then I'm inclined to merge a bandaid to enable testing. I'll do so in a separate patch, and then rebase this thing.
The code change, however, should be ready for another look!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @ajwerner, @arulajmani, @lucy-zhang, and @otan)
pkg/ccl/logictestccl/testdata/logic_test/alter_table_locality, line 1815 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
I think
regional_by_row
is in ccl -- I forgot that existed, I'll move these.
Done.
pkg/sql/type_change.go, line 346 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
This function has a fair bit going on in it. I'd think it would warrant a comment above.
Agreed, done.
pkg/sql/type_change.go, line 393 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
I agree w you Adam, I'll push out something to that effect soon.
No longer applicable, as @otan already fixed this!
d9fb08e
to
1f65599
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 12 of 16 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @ajwerner, @arulajmani, @lucy-zhang, and @otan)
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row, line 1272 at r2 (raw file):
query TT SHOW CREATE TABLE regional_by_row_as
Nit: Not required for this PR, but I'm wondering if we could make these tests more robust by creating a new table with the same columns after the regions have been added, and then diffing their SHOW CREATE TABLE statements (and zone configs, if we move them out of SHOW CREATE TABLE) with the tables that were altered through the ADD REGION process?
pkg/sql/type_change.go, line 331 at r2 (raw file):
if errors.Is(err, catalog.ErrDescriptorNotFound) { // Swallow. log.Infof(ctx, "assuming repartitioned table %d was dropped and moving on", tbID)
Nit: Can we be a bit more descriptive in this log record? Something like "in the process of changing types, table being repartitioned can not be found - assuming repartitioned table..."?
pkg/sql/type_change.go, line 337 at r2 (raw file):
} // Finally, make sure all of the leases are updated.
Nit: To contrast with the previous comment, maybe "...all of the type leases are update"?
pkg/sql/type_change.go, line 359 at r2 (raw file):
} // repartitionRegionalByRowTables takes multi-region enum and re-partitions all
Nit: "takes a multi-region enum..."
pkg/sql/type_change.go, line 362 at r2 (raw file):
// REGIONAL BY ROW tables in the enclosing database such that there is a // partition and corresponding zone configuration for all PUBLIC enum members // (regions). Th
Nit: "Th"? Don't leave me hanging dude.
Can we also describe the cases where we may find non-PUBLIC enum members here?
pkg/sql/type_change.go, line 392 at r2 (raw file):
ctx, txn, typeDesc.ParentID, tree.DatabaseLookupFlags{Required: true}) if err != nil { return repartitionedTableIDs, err
Nit: this code might read better if we return a nil
slice in the error cases. When I glanced at this first I thought: "wait, why are we returning the slice here?" and then I realized that it's still nil
.
pkg/sql/catalog/tabledesc/structured.go, line 4029 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Assuming you meant table name here? Done.
Why the change from an assertion? Are there legitimate cases where we can get in here?
1f65599
to
21e37e4
Compare
21e37e4
to
1706b70
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the test I promised, RFAL.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @ajwerner, @arulajmani, @lucy-zhang, and @otan)
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row, line 1272 at r2 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Nit: Not required for this PR, but I'm wondering if we could make these tests more robust by creating a new table with the same columns after the regions have been added, and then diffing their SHOW CREATE TABLE statements (and zone configs, if we move them out of SHOW CREATE TABLE) with the tables that were altered through the ADD REGION process?
Sounds like a good idea. Maybe instead of comparing the SHOW CREATE TABLE statements, we can instead compare the two descriptors? (In a followup, ofcourse)
pkg/sql/type_change.go, line 362 at r2 (raw file):
Nit: "Th"? Don't leave me hanging dude.
Oops removed.
Can we also describe the cases where we may find non-PUBLIC enum members here?
I don't think this works as expected until #60620 is resolved. One way this scenario can happen is if 2 concurrent transactions try to add a region. In that case, atleast one but maybe both of those transactions could fail. This is because we're constructing the partitioning clause based on the regions on the database descriptor, but the first job that tries to repartition will not be able to partition based on the READ ONLY member the second job is responsible for promoting.
pkg/sql/catalog/tabledesc/structured.go, line 4029 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Why the change from an assertion? Are there legitimate cases where we can get in here?
Good catch, rebase mishap
3e5d9df
to
f791fc8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I didn't look too closely at the schema changes.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajstorm, @ajwerner, @arulajmani, @lucy-zhang, and @otan)
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row, line 1272 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Sounds like a good idea. Maybe instead of comparing the SHOW CREATE TABLE statements, we can instead compare the two descriptors? (In a followup, ofcourse)
Sure, that works.
pkg/sql/type_change.go, line 362 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Nit: "Th"? Don't leave me hanging dude.
Oops removed.
Can we also describe the cases where we may find non-PUBLIC enum members here?
I don't think this works as expected until #60620 is resolved. One way this scenario can happen is if 2 concurrent transactions try to add a region. In that case, atleast one but maybe both of those transactions could fail. This is because we're constructing the partitioning clause based on the regions on the database descriptor, but the first job that tries to repartition will not be able to partition based on the READ ONLY member the second job is responsible for promoting.
New comment, describing the issue looks good.
f791fc8
to
1bc9d91
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 16 files at r2, 6 of 8 files at r3, 4 of 4 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @arulajmani, @lucy-zhang, @otan, and @postamar)
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row, line 1210 at r4 (raw file):
query TT SHOW CREATE TABLE regional_by_row
to futureproof this, can you add SELECT target, raw_config_sql FROM crdb_internal.zones WHERE table_name = 't_regional_by_row' ORDER BY target
for each of these SHOW CREATE TABLE
s?
pkg/ccl/multiregionccl/regional_by_row_test.go, line 372 at r4 (raw file):
regionNames := make([]string, numServers) for i := 0; i < numServers; i++ { // "us-east1", "us-east2"...
do you mind extracting this cluster setup to a method? i have a feels we'll use this often.
pkg/sql/planner.go, line 248 at r4 (raw file):
sessionData sessiondatapb.SessionData, collection *descs.Collection, ) (interface{}, func()) {
can you use the options pattern for this (as per our style guide), e.g. https://www.sohamkamani.com/golang/options-pattern/
1bc9d91
to
0f3767d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFAL
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajstorm, @ajwerner, @arulajmani, @lucy-zhang, @otan, and @postamar)
pkg/ccl/logictestccl/testdata/logic_test/alter_table_locality, line 1946 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
this now has crdb_region in front, so it is no longer implicitly partitioned, which is incorrect. see comment about CreatePartitioning below
Talked offline, adding SHOW ZONE CONFIGURATIONS (which will be re-written later) and a query on SHOW PARTITIONS.
pkg/ccl/multiregionccl/regional_by_row_test.go, line 372 at r4 (raw file):
Previously, otan (Oliver Tan) wrote…
do you mind extracting this cluster setup to a method? i have a feels we'll use this often.
I agree, done.
pkg/sql/planner.go, line 248 at r4 (raw file):
Previously, otan (Oliver Tan) wrote…
can you use the options pattern for this (as per our style guide), e.g. https://www.sohamkamani.com/golang/options-pattern/
Done, thanks for the suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 12 of 12 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @arulajmani, @lucy-zhang, @otan, and @postamar)
pkg/sql/planner.go, line 249 at r5 (raw file):
// WithDescCollection configures the planner with the provided collection // instead of the default (creating a new one from scratch). func WithDescCollection(collection *descs.Collection) InternalPlannerParamsOption {
i usually like prefixing the option like so: InternalPlannerParamsOption(With)DescCollection
so it's searchable easily with just the prefix (but it can be a bit long)
pkg/sql/planner.go, line 257 at r5 (raw file):
// NewInternalPlanner is an exported version of newInternalPlanner. It // returns an interface{} so it can be used outside of the sql package. // NewInternalPlanner initializes a new desc.Collection if an ovveride option
i would nix the new comment, or at the very least, fix the ovveride
typo :P
pkg/sql/planner.go, line 298 at r5 (raw file):
execCfg *ExecutorConfig, sessionData sessiondatapb.SessionData, params *InternalPlannerParams,
can we thread the options into here to avoid this new struct?
pkg/sql/values_test.go, line 59 at r5 (raw file):
sessiondatapb.SessionData{}, ) return p.(*planner)
nit: revert
pkg/sql/schemachanger/scbuild/builder_test.go, line 148 at r5 (raw file):
// way doesn't seem to do what we want. sessiondatapb.SessionData{}, nil, /* collection */
will this fail any tests? is the comment still right?
This patch adds functionality to repartition REGIONAL BY ROW tables when a new region is added to the database. An index can only be partitioned using an enum value if it is PUBLIC. Thus we only modify partition descriptors once all transitioning enum members have done so successfully. This happens in the same transaction that promoted the enum members, ensuring sane rollbacks in the face of non-retryable partitioning failure. Once partitioning is complete, zone configurations are applied to partitions as well. This patch changes how uncommitted type descriptors are stored to enable partitioning in the same transaction that previously promoted enum members. A column's types.T is hydrated using enum metadata fields that are cached on the type descriptor. Previously, this cache was constructed based on the cluster version of the type descriptor and not updated when the type descriptor was modified. After this patch, whenever a type descriptor is added back as an uncomitted descriptor to the desc collection, we reconstruct the cached fields stored on it. For us, this means that tables being partitioned are hydrated with the correct visibility for enum members, which allows us to partition on values promoted in the same transaction. Release note (sql change): ALTER DATABASE ... ADD REGION now repartitions REGIONAL BY ROW tables and updates the zone configs on the newly created partitions as well.
0f3767d
to
99ee0be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @arulajmani, @lucy-zhang, @otan, and @postamar)
pkg/sql/planner.go, line 249 at r5 (raw file):
Previously, otan (Oliver Tan) wrote…
i usually like prefixing the option like so:
InternalPlannerParamsOption(With)DescCollection
so it's searchable easily with just the prefix (but it can be a bit long)
IMO the readability of what I have rn outweighs the searchable aspect, unless you really insist.
pkg/sql/planner.go, line 257 at r5 (raw file):
Previously, otan (Oliver Tan) wrote…
i would nix the new comment, or at the very least, fix the
ovveride
typo :P
Done.
pkg/sql/planner.go, line 298 at r5 (raw file):
Previously, otan (Oliver Tan) wrote…
can we thread the options into here to avoid this new struct?
Talked offline, threaded the options but made the new struct internal as opposed to getting rid of it (in the hope that people might extend it in the future).
pkg/sql/values_test.go, line 59 at r5 (raw file):
Previously, otan (Oliver Tan) wrote…
nit: revert
I need this because I changed the call from newInternalPlanner
(which returns a planner) to NewInternalPlanner
(which returns an interface). Don't really need the switch now that Ive passed the options down, but now that I've cleaned up all accesses to newInternalPlanner
to go through the public method, Im inclined to just leave it as is.
pkg/sql/schemachanger/scbuild/builder_test.go, line 148 at r5 (raw file):
Previously, otan (Oliver Tan) wrote…
will this fail any tests? is the comment still right?
Detritus, fixed.
TFTRs! bors r=otan,ajstorm |
Build failed: |
bors r+ |
Build succeeded: |
his patch adds functionality to repartition REGIONAL BY ROW tables when
a new region is added to the database. An index can only be partitioned
using an enum value if it is PUBLIC. Thus we only modify partition
descriptors once all transitioning enum members have done so
successfully. This happens in the same transaction that promoted the
enum members, ensuring sane rollbacks in the face of non-retryable
partitioning failure. Once partitioning is complete, zone
configurations are applied to partitions as well.
This patch changes how uncommitted type descriptors are stored to enable
partitioning in the same transaction that previously promoted enum
members. A column's types.T is hydrated using enum metadata fields that
are cached on the type descriptor. Previously, this cache was
constructed based on the cluster version of the type descriptor and not
updated when the type descriptor was modified. After this patch,
whenever a type descriptor is added back as an uncomitted descriptor to
the desc collection, we reconstruct the cached fields stored on it.
For us, this means that tables being partitioned are hydrated with the
correct visibility for enum members, which allows us to partition on
values promoted in the same transaction.
Release note (sql change): ALTER DATABASE ... ADD REGION now
repartitions REGIONAL BY ROW tables and updates the zone configs on
the newly created partitions as well.