sql: stop duplicating list of regions on the database descriptor #60620

arulajmani · 2021-02-16T18:51:53Z

The multi-region type descriptor is the source of truth for available regions on a database. Currently, we duplicate the LogicalRepresentations from the type descriptor on to the database descriptor's region config. This wasn't problematic when enum promotion from READ_ONLY to PUBLIC couldn't fail, but with the support for dropping enum values in general and region removal specifically, this is no holds true. Currently, we have a rather delicate dance to keep the descriptors sane -- regions are added/removed in two steps, but the database descriptors are oblivious to this subtlety.

Consider the example of dropping regions from a database. Currently, we move the region on the type descriptor to READ ONLY mode and remove the region from the database descriptor. If the region removal validation fails, we must not only revert the enum member to PUBLIC, but also add the region back on to the database descriptor. This gets even more thorny to account for and test when these operations are performed inside transactions. It's also worth noting that even though we have validation to check our view of the world is sane, it doesn't help us in the face of an inconsistency bug -- it'll require descriptor surgery to make the database accessible again.

We should stop duplicating regions on the database descriptor. Instead, the multi-region enum should be the only source of truth on which regions are available to the database.

The text was updated successfully, but these errors were encountered:

ajwerner · 2021-02-17T00:02:10Z

@otan can you remind me why we wanted to do this in the first place?

otan · 2021-02-17T00:05:48Z

i don't quite remember; but i no longer have a good reason to say we should _{besides the fact unblocked a lot of work at the beginning}

ajstorm · 2021-03-05T15:30:29Z

Once this is complete, we need to address #61551 which pre-reqs the work here.

Previously, we were duplicating regions both on the type descriptor and the database descriptor. Theres a verbose explanation about why this is bad on the linked issue, but the crux of the problem with this approach is the added complexity it requires keeping these two lists in sync as regions are added/dropped from a database in the face of possible failure scenarios. This patch removes the `Regions` field from the database descriptors's region config. Instead, the regions stored on the multi-region enum serve as the sole source of truth. To accomplish this, we introduce a new `RegionConfig` struct which is synthesized from the state (primary region, survival goal) stored on the database and the state (available regions) stored on the type descriptor. All the places that constructed zone configurations from the raw region config proto now use this new struct. This patch also introduces a `InitializationRegionConfig` struct which wraps a `RegionConfig` with initilization specific meta-data. I'm not too thrilled by the fact that survival goal and primary region is still stored on the database descriptor. Ideally, the database descriptor would contain just the multi-region enum ID and all user configured parameters would be in one place (on the type descriptor's region config). The level of indirection introduced by this patch sets us on the path to that future. Closes cockroachdb#60620 Release justification: low risk updates to new functionality Release note: None

61585: sql: stop duplicating regions on the database descriptor r=arulajmani a=arulajmani Previously, we were duplicating regions both on the type descriptor and the database descriptor. Theres a verbose explanation about why this is bad on the linked issue, but the crux of the problem with this approach is the added complexity it requires keeping these two lists in sync as regions are added/dropped from a database in the face of possible failure scenarios. This patch removes the `Regions` field from the database descriptors's region config. Instead, the regions stored on the multi-region enum serve as the sole source of truth. To accomplish this, we introduce a new `RegionConfig` struct which is synthesized from the state (primary region, survival goal) stored on the database and the state (available regions) stored on the type descriptor. All the places that constructed zone configurations from the raw region config proto now use this new struct. This patch also introduces a `InitializationRegionConfig` struct which wraps a `RegionConfig` with initilization specific meta-data. I'm not too thrilled by the fact that survival goal and primary region is still stored on the database descriptor. Ideally, the database descriptor would contain just the multi-region enum ID and all user configured parameters would be in one place (on the type descriptor's region config). The level of indirection introduced by this patch sets us on the path to that future. Closes #60620 Release justification: low risk updates to new functionality Release note: None 61763: sql: disallow adding OIDVECTOR and INT2VECTOR columns r=mgartner a=mgartner Creating columns of type `OIDVECTOR` or `INT2VECTOR` is not allowed in `CREATE TABLE` statements. This commit disallows these types in `ALTER TABLE ... ADD COLUMN` statements for consistency. Fixes #61762 Release justification: This is a low-risk bug fix. Release note (bug fix): Adding columns of type `OIDVECTOR` or `INT2VECTOR` to a table in `ALTER TABLE ... ADD COLUMN` statements is no longer allowed. These types are not allowed in user-created tables via `CREATE TABLE ` and were erroneously allowed previously in `ALTER TABLE ... ADD COLUMN`. 62095: tracing: lock around testing knobs r=irfansharif a=irfansharif Fixes #61393. There was a race condition in our tests by not doing so. Release note: None Co-authored-by: arulajmani <arulajmani@gmail.com> Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com> Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>

Previously, we were duplicating regions both on the type descriptor and the database descriptor. Theres a verbose explanation about why this is bad on the linked issue, but the crux of the problem with this approach is the added complexity it requires keeping these two lists in sync as regions are added/dropped from a database in the face of possible failure scenarios. This patch removes the `Regions` field from the database descriptors's region config. Instead, the regions stored on the multi-region enum serve as the sole source of truth. To accomplish this, we introduce a new `RegionConfig` struct which is synthesized from the state (primary region, survival goal) stored on the database and the state (available regions) stored on the type descriptor. All the places that constructed zone configurations from the raw region config proto now use this new struct. This patch also introduces a `InitializationRegionConfig` struct which wraps a `RegionConfig` with initilization specific meta-data. I'm not too thrilled by the fact that survival goal and primary region is still stored on the database descriptor. Ideally, the database descriptor would contain just the multi-region enum ID and all user configured parameters would be in one place (on the type descriptor's region config). The level of indirection introduced by this patch sets us on the path to that future. Closes cockroachdb#60620 Release justification: low risk updates to new functionality Release note: None

This patch generalizes the setup in what was previously `TestConcurrentDropRegion` and extends it to all combinations of add/drop region on a multi-region database. The only change is that I've added a regional by row table into the test setup mixer, so as to excercise the repartitioning semantics. Previously, there was a limitation with concurrent add/drop regions where both the operations were bound to fail in the repartitioning phase. This limitation was fixed in cockroachdb#60620, but we never had a regression test for it. Adding a regional by row table during the test setup serves as one. Closes cockroachdb#62813 Release note: None

61600: kvserver: make the StoreRebalancer aware of non-voters r=aayushshah15 a=aayushshah15 This commit teaches the `StoreRebalancer` to rebalance non-voting replicas. Release justification: needed for non-voting replicas Release note: None 62361: roachtest: attempt to handle VM overload under tpccbench r=irfansharif a=tbg See #62039. `tpccbench`, by design, pushes CRDB into overload territory. The test harness handles nodes crashing or tpmc tanking well. However, it was not prepared to handle the cloud VMs going unresponsive for ~minutes, which is one common failure mode. This commit tweaks the line search to be resilient to failures to communicate with the cloud VM in the one place where it matters (stopping the cluster at the beginning of a new search attempt). The hope is that this will allow the search to run to completion, even in the face of overload-imposed temporary VM outages. It is not expected to do this reliably, but at least anecdotally most VMs seem to come back a few minutes in. Release note: None 62826: sql: add tests for concurrent add/drop region operations r=ajstorm a=arulajmani This patch generalizes the setup in what was previously `TestConcurrentDropRegion` and extends it to all combinations of add/drop region on a multi-region database. The only change is that I've added a regional by row table into the test setup mixer, so as to excercise the repartitioning semantics. Previously, there was a limitation with concurrent add/drop regions where both the operations were bound to fail in the repartitioning phase. This limitation was fixed in #60620, but we never had a regression test for it. Adding a regional by row table during the test setup serves as one. Closes #62813 Release note: None Co-authored-by: Aayush Shah <aayush.shah15@gmail.com> Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com> Co-authored-by: arulajmani <arulajmani@gmail.com>

This patch generalizes the setup in what was previously `TestConcurrentDropRegion` and extends it to all combinations of add/drop region on a multi-region database. The only change is that I've added a regional by row table into the test setup mixer, so as to excercise the repartitioning semantics. Previously, there was a limitation with concurrent add/drop regions where both the operations were bound to fail in the repartitioning phase. This limitation was fixed in cockroachdb#60620, but we never had a regression test for it. Adding a regional by row table during the test setup serves as one. Closes cockroachdb#62813 Release note: None

arulajmani added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. labels Feb 16, 2021

arulajmani self-assigned this Feb 16, 2021

ajstorm mentioned this issue Mar 5, 2021

sql: Remove the need for FORCE in transactions with multiple multi-region operations #61551

Closed

arulajmani mentioned this issue Mar 6, 2021

sql: stop duplicating regions on the database descriptor #61585

Merged

ajstorm added branch-release-21.1 GA-blocker labels Mar 15, 2021

craig bot closed this as completed in 453a13a Mar 17, 2021

arulajmani mentioned this issue Mar 18, 2021

release-21.1: sql: stop duplicating regions on the database descriptor #62200

Merged

arulajmani mentioned this issue Mar 30, 2021

sql: add tests for concurrent add/drop region operations #62826

Merged

arulajmani mentioned this issue Apr 2, 2021

release-21.1: sql: add tests for concurrent add/drop region operations #63028

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: stop duplicating list of regions on the database descriptor #60620

sql: stop duplicating list of regions on the database descriptor #60620

arulajmani commented Feb 16, 2021

ajwerner commented Feb 17, 2021

otan commented Feb 17, 2021 •

edited

Loading

ajstorm commented Mar 5, 2021

sql: stop duplicating list of regions on the database descriptor #60620

sql: stop duplicating list of regions on the database descriptor #60620

Comments

arulajmani commented Feb 16, 2021

ajwerner commented Feb 17, 2021

otan commented Feb 17, 2021 • edited Loading

ajstorm commented Mar 5, 2021

otan commented Feb 17, 2021 •

edited

Loading