Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: stop duplicating list of regions on the database descriptor #60620

Closed
arulajmani opened this issue Feb 16, 2021 · 3 comments · Fixed by #61585
Closed

sql: stop duplicating list of regions on the database descriptor #60620

arulajmani opened this issue Feb 16, 2021 · 3 comments · Fixed by #61585
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. GA-blocker

Comments

@arulajmani
Copy link
Collaborator

The multi-region type descriptor is the source of truth for available regions on a database. Currently, we duplicate the LogicalRepresentations from the type descriptor on to the database descriptor's region config. This wasn't problematic when enum promotion from READ_ONLY to PUBLIC couldn't fail, but with the support for dropping enum values in general and region removal specifically, this is no holds true. Currently, we have a rather delicate dance to keep the descriptors sane -- regions are added/removed in two steps, but the database descriptors are oblivious to this subtlety.

Consider the example of dropping regions from a database. Currently, we move the region on the type descriptor to READ ONLY mode and remove the region from the database descriptor. If the region removal validation fails, we must not only revert the enum member to PUBLIC, but also add the region back on to the database descriptor. This gets even more thorny to account for and test when these operations are performed inside transactions. It's also worth noting that even though we have validation to check our view of the world is sane, it doesn't help us in the face of an inconsistency bug -- it'll require descriptor surgery to make the database accessible again.

We should stop duplicating regions on the database descriptor. Instead, the multi-region enum should be the only source of truth on which regions are available to the database.

@arulajmani arulajmani added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. labels Feb 16, 2021
@arulajmani arulajmani self-assigned this Feb 16, 2021
@ajwerner
Copy link
Contributor

@otan can you remind me why we wanted to do this in the first place?

@otan
Copy link
Contributor

otan commented Feb 17, 2021

i don't quite remember; but i no longer have a good reason to say we should besides the fact unblocked a lot of work at the beginning

@ajstorm
Copy link
Collaborator

ajstorm commented Mar 5, 2021

Once this is complete, we need to address #61551 which pre-reqs the work here.

arulajmani added a commit to arulajmani/cockroach that referenced this issue Mar 7, 2021
Previously, we were duplicating regions both on the type descriptor and
the database descriptor. Theres a verbose explanation about why this is
bad on the linked issue, but the crux of the problem with this approach
is the added complexity it requires keeping these two lists in sync as
regions are added/dropped from a database in the face of possible
failure scenarios.

This patch removes the `Regions` field from the database descriptors's
region config. Instead, the regions stored on the multi-region enum
serve as the sole source of truth. To accomplish this, we introduce a
new `RegionConfig` struct which is synthesized from the state (primary
region, survival goal) stored on the database and the state (available
regions) stored on the type descriptor. All the places that constructed
zone configurations from the raw region config proto now use this new
struct. This patch also introduces a `InitializationRegionConfig`
struct which wraps a `RegionConfig` with initilization specific
meta-data.

I'm not too thrilled by the fact that survival goal and primary region
is still stored on the database descriptor. Ideally, the database
descriptor would contain just the multi-region enum ID and all user
configured parameters would be in one place (on the type descriptor's
region config). The level of indirection introduced by this patch sets
us on the path to that future.

Closes cockroachdb#60620

Release justification: low risk updates to new functionality
Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue Mar 9, 2021
Previously, we were duplicating regions both on the type descriptor and
the database descriptor. Theres a verbose explanation about why this is
bad on the linked issue, but the crux of the problem with this approach
is the added complexity it requires keeping these two lists in sync as
regions are added/dropped from a database in the face of possible
failure scenarios.

This patch removes the `Regions` field from the database descriptors's
region config. Instead, the regions stored on the multi-region enum
serve as the sole source of truth. To accomplish this, we introduce a
new `RegionConfig` struct which is synthesized from the state (primary
region, survival goal) stored on the database and the state (available
regions) stored on the type descriptor. All the places that constructed
zone configurations from the raw region config proto now use this new
struct. This patch also introduces a `InitializationRegionConfig`
struct which wraps a `RegionConfig` with initilization specific
meta-data.

I'm not too thrilled by the fact that survival goal and primary region
is still stored on the database descriptor. Ideally, the database
descriptor would contain just the multi-region enum ID and all user
configured parameters would be in one place (on the type descriptor's
region config). The level of indirection introduced by this patch sets
us on the path to that future.

Closes cockroachdb#60620

Release justification: low risk updates to new functionality
Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue Mar 16, 2021
Previously, we were duplicating regions both on the type descriptor and
the database descriptor. Theres a verbose explanation about why this is
bad on the linked issue, but the crux of the problem with this approach
is the added complexity it requires keeping these two lists in sync as
regions are added/dropped from a database in the face of possible
failure scenarios.

This patch removes the `Regions` field from the database descriptors's
region config. Instead, the regions stored on the multi-region enum
serve as the sole source of truth. To accomplish this, we introduce a
new `RegionConfig` struct which is synthesized from the state (primary
region, survival goal) stored on the database and the state (available
regions) stored on the type descriptor. All the places that constructed
zone configurations from the raw region config proto now use this new
struct. This patch also introduces a `InitializationRegionConfig`
struct which wraps a `RegionConfig` with initilization specific
meta-data.

I'm not too thrilled by the fact that survival goal and primary region
is still stored on the database descriptor. Ideally, the database
descriptor would contain just the multi-region enum ID and all user
configured parameters would be in one place (on the type descriptor's
region config). The level of indirection introduced by this patch sets
us on the path to that future.

Closes cockroachdb#60620

Release justification: low risk updates to new functionality
Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue Mar 17, 2021
Previously, we were duplicating regions both on the type descriptor and
the database descriptor. Theres a verbose explanation about why this is
bad on the linked issue, but the crux of the problem with this approach
is the added complexity it requires keeping these two lists in sync as
regions are added/dropped from a database in the face of possible
failure scenarios.

This patch removes the `Regions` field from the database descriptors's
region config. Instead, the regions stored on the multi-region enum
serve as the sole source of truth. To accomplish this, we introduce a
new `RegionConfig` struct which is synthesized from the state (primary
region, survival goal) stored on the database and the state (available
regions) stored on the type descriptor. All the places that constructed
zone configurations from the raw region config proto now use this new
struct. This patch also introduces a `InitializationRegionConfig`
struct which wraps a `RegionConfig` with initilization specific
meta-data.

I'm not too thrilled by the fact that survival goal and primary region
is still stored on the database descriptor. Ideally, the database
descriptor would contain just the multi-region enum ID and all user
configured parameters would be in one place (on the type descriptor's
region config). The level of indirection introduced by this patch sets
us on the path to that future.

Closes cockroachdb#60620

Release justification: low risk updates to new functionality
Release note: None
craig bot pushed a commit that referenced this issue Mar 17, 2021
61585: sql: stop duplicating regions on the database descriptor r=arulajmani a=arulajmani

Previously, we were duplicating regions both on the type descriptor and
the database descriptor. Theres a verbose explanation about why this is
bad on the linked issue, but the crux of the problem with this approach
is the added complexity it requires keeping these two lists in sync as
regions are added/dropped from a database in the face of possible
failure scenarios.

This patch removes the `Regions` field from the database descriptors's
region config. Instead, the regions stored on the multi-region enum
serve as the sole source of truth. To accomplish this, we introduce a
new `RegionConfig` struct which is synthesized from the state (primary
region, survival goal) stored on the database and the state (available
regions) stored on the type descriptor. All the places that constructed
zone configurations from the raw region config proto now use this new
struct. This patch also introduces a `InitializationRegionConfig`
struct which wraps a `RegionConfig` with initilization specific
meta-data.

I'm not too thrilled by the fact that survival goal and primary region
is still stored on the database descriptor. Ideally, the database
descriptor would contain just the multi-region enum ID and all user
configured parameters would be in one place (on the type descriptor's
region config). The level of indirection introduced by this patch sets
us on the path to that future.

Closes #60620

Release justification: low risk updates to new functionality
Release note: None

61763: sql: disallow adding OIDVECTOR and INT2VECTOR columns r=mgartner a=mgartner

Creating columns of type `OIDVECTOR` or `INT2VECTOR` is not allowed in
`CREATE TABLE` statements. This commit disallows these types in
`ALTER TABLE ... ADD COLUMN` statements for consistency.

Fixes #61762

Release justification: This is a low-risk bug fix.

Release note (bug fix): Adding columns of type `OIDVECTOR` or
`INT2VECTOR` to a table in `ALTER TABLE ... ADD COLUMN` statements is no
longer allowed. These types are not allowed in user-created tables via
`CREATE TABLE ` and were erroneously allowed previously in
`ALTER TABLE ... ADD COLUMN`.

62095: tracing: lock around testing knobs r=irfansharif a=irfansharif

Fixes #61393. There was a race condition in our tests by not doing so.

Release note: None

Co-authored-by: arulajmani <arulajmani@gmail.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
@craig craig bot closed this as completed in 453a13a Mar 17, 2021
arulajmani added a commit to arulajmani/cockroach that referenced this issue Mar 18, 2021
Previously, we were duplicating regions both on the type descriptor and
the database descriptor. Theres a verbose explanation about why this is
bad on the linked issue, but the crux of the problem with this approach
is the added complexity it requires keeping these two lists in sync as
regions are added/dropped from a database in the face of possible
failure scenarios.

This patch removes the `Regions` field from the database descriptors's
region config. Instead, the regions stored on the multi-region enum
serve as the sole source of truth. To accomplish this, we introduce a
new `RegionConfig` struct which is synthesized from the state (primary
region, survival goal) stored on the database and the state (available
regions) stored on the type descriptor. All the places that constructed
zone configurations from the raw region config proto now use this new
struct. This patch also introduces a `InitializationRegionConfig`
struct which wraps a `RegionConfig` with initilization specific
meta-data.

I'm not too thrilled by the fact that survival goal and primary region
is still stored on the database descriptor. Ideally, the database
descriptor would contain just the multi-region enum ID and all user
configured parameters would be in one place (on the type descriptor's
region config). The level of indirection introduced by this patch sets
us on the path to that future.

Closes cockroachdb#60620

Release justification: low risk updates to new functionality
Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue Mar 30, 2021
This patch generalizes the setup in what was previously
`TestConcurrentDropRegion` and extends it to all combinations of
add/drop region on a multi-region database. The only change is that
I've added a regional by row table into the test setup mixer, so as to
excercise the repartitioning semantics.

Previously, there was a limitation with concurrent add/drop regions
where both the operations were bound to fail in the repartitioning
phase. This limitation was fixed in cockroachdb#60620, but we never had a
regression test for it. Adding a regional by row table during the
test setup serves as one.

Closes cockroachdb#62813

Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue Apr 1, 2021
This patch generalizes the setup in what was previously
`TestConcurrentDropRegion` and extends it to all combinations of
add/drop region on a multi-region database. The only change is that
I've added a regional by row table into the test setup mixer, so as to
excercise the repartitioning semantics.

Previously, there was a limitation with concurrent add/drop regions
where both the operations were bound to fail in the repartitioning
phase. This limitation was fixed in cockroachdb#60620, but we never had a
regression test for it. Adding a regional by row table during the
test setup serves as one.

Closes cockroachdb#62813

Release note: None
craig bot pushed a commit that referenced this issue Apr 1, 2021
61600: kvserver: make the StoreRebalancer aware of non-voters r=aayushshah15 a=aayushshah15

This commit teaches the `StoreRebalancer` to rebalance non-voting
replicas.

Release justification: needed for non-voting replicas
Release note: None

62361: roachtest: attempt to handle VM overload under tpccbench r=irfansharif a=tbg

See #62039.

`tpccbench`, by design, pushes CRDB into overload territory. The test
harness handles nodes crashing or tpmc tanking well. However, it was
not prepared to handle the cloud VMs going unresponsive for ~minutes,
which is one common failure mode.

This commit tweaks the line search to be resilient to failures to
communicate with the cloud VM in the one place where it matters
(stopping the cluster at the beginning of a new search attempt).

The hope is that this will allow the search to run to completion,
even in the face of overload-imposed temporary VM outages. It is
not expected to do this reliably, but at least anecdotally most
VMs seem to come back a few minutes in.

Release note: None


62826: sql: add tests for concurrent add/drop region operations r=ajstorm a=arulajmani

This patch generalizes the setup in what was previously
`TestConcurrentDropRegion` and extends it to all combinations of
add/drop region on a multi-region database. The only change is that
I've added a regional by row table into the test setup mixer, so as to
excercise the repartitioning semantics.

Previously, there was a limitation with concurrent add/drop regions
where both the operations were bound to fail in the repartitioning
phase. This limitation was fixed in #60620, but we never had a
regression test for it. Adding a regional by row table during the
test setup serves as one.

Closes #62813

Release note: None

Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: arulajmani <arulajmani@gmail.com>
arulajmani added a commit to arulajmani/cockroach that referenced this issue Apr 2, 2021
This patch generalizes the setup in what was previously
`TestConcurrentDropRegion` and extends it to all combinations of
add/drop region on a multi-region database. The only change is that
I've added a regional by row table into the test setup mixer, so as to
excercise the repartitioning semantics.

Previously, there was a limitation with concurrent add/drop regions
where both the operations were bound to fail in the repartitioning
phase. This limitation was fixed in cockroachdb#60620, but we never had a
regression test for it. Adding a regional by row table during the
test setup serves as one.

Closes cockroachdb#62813

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. GA-blocker
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants