sql, backupccl: backup and restore fk-upgrade-downgrade awareness #39474

Closed

@jordanlewis jordanlewis commented Aug 8, 2019

This commit handles all four cases that arise when writing and reading
table descriptors in a mixed 19.1/19.2 state.

We now:

  1. upgrade the descriptors from disk while creating a backup
  2. downgrade the descriptors from memory into the backup descriptor if
    we're in a mixed 19.1/19.2 state
  3. upgrade the descriptors from a backup while preparing to restore
  4. downgrade the descriptors from memory onto the disk while restoring
    a backup if we're in a mixed 19.1/19.2 state
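
As a rough illustration of the four cases above, here is a minimal Go sketch. Everything in it is hypothetical: `TableDesc`, `upgrade`, `maybeDowngrade`, `backup`, and `restore` are simplified stand-ins invented for this example, not the real `sqlbase` or `backupccl` code, and the foreign key representation is reduced to a single boolean.

```go
package main

// TableDesc is a hypothetical, heavily simplified table descriptor.
// NewStyle is true when foreign keys are stored on the table (19.2 form)
// and false when they are stored on indexes (19.1 form).
type TableDesc struct {
	Name     string
	NewStyle bool
}

// upgrade returns the descriptor in the new representation. It is safe
// to call on a descriptor that is already in the new form.
func upgrade(d TableDesc) TableDesc {
	d.NewStyle = true
	return d
}

// maybeDowngrade returns the old representation only while the cluster
// is still in the mixed 19.1/19.2 state.
func maybeDowngrade(d TableDesc, mixed bool) TableDesc {
	if mixed {
		d.NewStyle = false
	}
	return d
}

// backup covers cases 1 and 2: upgrade what was read from disk, then
// conditionally downgrade what is written into the backup descriptor.
func backup(onDisk []TableDesc, mixed bool) []TableDesc {
	manifest := make([]TableDesc, 0, len(onDisk))
	for _, d := range onDisk {
		inMemory := upgrade(d)
		manifest = append(manifest, maybeDowngrade(inMemory, mixed))
	}
	return manifest
}

// restore covers cases 3 and 4: upgrade what was read from the backup,
// then conditionally downgrade what is written back to disk.
func restore(manifest []TableDesc, mixed bool) []TableDesc {
	onDisk := make([]TableDesc, 0, len(manifest))
	for _, d := range manifest {
		inMemory := upgrade(d)
		onDisk = append(onDisk, maybeDowngrade(inMemory, mixed))
	}
	return onDisk
}
```

In this sketch the in-memory form is always new-style, and only the two write paths look at the cluster state.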

Release note: None

@cockroach-teamcity

This change is Reviewable

@jordanlewis jordanlewis changed the title wip: backup and restore fk-upgrade-downgrade awareness backup and restore fk-upgrade-downgrade awareness Aug 10, 2019
@jordanlewis jordanlewis changed the title backup and restore fk-upgrade-downgrade awareness sqlccl: backup and restore fk-upgrade-downgrade awareness Aug 10, 2019
thoszhang added a commit to jordanlewis/cockroach that referenced this pull request Aug 12, 2019
This commit does the following salient things:

- change all in-memory foreign key operations to use the new foreign key
  representation, which is defined on table descriptor instead of index
  descriptor.
- change all reads of serialized table descriptor protos to
  unconditionally upgrade themselves from the old descriptor version
  into the new descriptor version.
- change all serialization paths of table descriptors to downgrade
  themselves from the new descriptor version to the old descriptor
  version *when the cluster version is less than 19.2*.

Note that this commit *is not intended to change* any actual foreign key
behavior, such as restrictions on dropping or flexibility in adding.

Note further that this commit imperils backup and restore on
mixed-version clusters for this branch: neither will work. This is a
known issue and will be resolved before 19.2 is released. See issue
cockroachdb#39474.

There are a couple of other relevant technical details.

New-style foreign keys contain 4 special fields that are designed to
help with this upgrade.

Two are the "pinned" indexes that the foreign key is related to. These
will be the same as the indexes that the foreign keys used to live on in
19.1 and before, if the foreign key was upgraded from an old version, or
will be synthesized using the same algorithm as before if being created
in 19.2-mixed or later.

The other two are "validator" fields that are exact copies of the old
19.1 foreign key representation that a particular foreign key was
upgraded from. These will *also* be synthesized by the creation of new
foreign keys in 19.2-mixed. These fields are used to validate that we
didn't do anything wrong when downgrading a new-style foreign key in the
mixed version state.
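
To make the role of the "validator" fields more concrete, here is a small Go sketch of a checked downgrade. It is an illustration under assumptions rather than the actual implementation: `OldFK`, `NewFK`, the `Validator` field, and `downgradeOrigin` are hypothetical names, and only the origin-side half (one pinned index, one validator) is modeled.

```go
package main

import "fmt"

// OldFK is a hypothetical stand-in for the 19.1 index-level foreign key
// reference.
type OldFK struct {
	Name  string
	Table uint32
	Index uint32
}

// NewFK is a hypothetical stand-in for the 19.2 table-level foreign key.
// LegacyReferencedIndex plays the role of a "pinned" index, and Validator
// holds the copy of the old-style reference this key was upgraded from.
type NewFK struct {
	Name                  string
	ReferencedTable       uint32
	LegacyReferencedIndex uint32
	Validator             OldFK
}

// downgradeOrigin rebuilds the old-style reference from the new-style
// fields and refuses to return it unless it matches the validator exactly.
func downgradeOrigin(fk NewFK) (OldFK, error) {
	old := OldFK{
		Name:  fk.Name,
		Table: fk.ReferencedTable,
		Index: fk.LegacyReferencedIndex,
	}
	if old != fk.Validator {
		return OldFK{}, fmt.Errorf(
			"downgrade of fk %q produced %+v, but validator holds %+v",
			fk.Name, old, fk.Validator)
	}
	return old, nil
}
```

The exact-match comparison is the point of the sketch: any drift between the pinned index and the stored copy surfaces as an error instead of being written out.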

---------------------------------------------------------------------

Here is the "state diagram" of how all of the above works:

*19.1*

- everybody writes old-style fks, nobody understands new-style fks.

*19.2-mixed*: a new 19.2 node enters the cluster, but the cluster
              version is not yet flipped.

- 19.2 nodes always write legacy idx fields when synthesizing table
  descriptors.
- 19.2 nodes always assume the existence of legacy idx fields in all
  table descriptors.
- 19.2 nodes unconditionally upgrade all table descriptors from disk
  into the new format, preserving both the legacy idx fields and a special
  "validator" struct that is the entirety of the old-style foreign key.
- 19.2 nodes unconditionally downgrade all in-memory table descriptors
  to the old format, making sure the downgrade was correct by exactly
  matching the result to the special "validator" struct above.
- 19.1 nodes continue on their merry way. No new-style descriptors are
  ever on disk, so they don't know anything is up.

*19.2-final*: all nodes are 19.2, and the upgrade switch is flipped.

- 19.2 nodes behave identically to the above, except:
- 19.2 nodes do not downgrade in-memory table descriptors to the old
  format.

Release note: None

Co-authored-by: Lucy Zhang <lucy@cockroachlabs.com>
Co-authored-by: Jordan Lewis <jordan@cockroachlabs.com>
@jordanlewis jordanlewis requested a review from a team as a code owner August 15, 2019 22:33
@thoszhang thoszhang force-pushed the fk-backup-restore branch 3 times, most recently from dddf698 to 4ad6ca8 Compare August 16, 2019 04:07
thoszhang added a commit to jordanlewis/cockroach that referenced this pull request Aug 16, 2019
thoszhang added a commit to jordanlewis/cockroach that referenced this pull request Aug 16, 2019
@thoszhang thoszhang removed their request for review August 16, 2019 20:31
@thoszhang thoszhang changed the title sqlccl: backup and restore fk-upgrade-downgrade awareness sql, backupccl: backup and restore fk-upgrade-downgrade awareness Aug 16, 2019
craig bot pushed a commit that referenced this pull request Aug 19, 2019
39383: sql: use new foreign key representation r=lucy-zhang a=jordanlewis

This PR does the following salient things:

- change all in-memory foreign key operations to use the new foreign key
  representation, which is defined on table descriptor instead of index
  descriptor.
- change all reads of serialized table descriptor protos to
  unconditionally upgrade themselves from the old descriptor version
  into the new descriptor version.
- change all serialization paths of table descriptors to downgrade
  themselves from the new descriptor version to the old descriptor
  version *when the cluster version is less than 19.2*.

Note that this commit *is not intended to change* any actual foreign key
behavior, such as restrictions on dropping or flexibility in adding.

Note further that this commit imperils backup and restore on
mixed-version clusters for this branch: neither will work. This is a
known issue and will be resolved before 19.2 is released. See issue
#39474.

There are a couple of other relevant technical details.

New-style foreign keys contain 4 special fields that are designed to
help with this upgrade.

Two are the "pinned" indexes that the foreign key is related to. These
will be the same as the indexes that the foreign keys used to live on in
19.1 and before, if the foreign key was upgraded from an old version, or
will be synthesized using the same algorithm as before if being created
in 19.2-mixed or later.

The other two are "validator" fields that are exact copies of the old
19.1 foreign key representation that a particular foreign key was
upgraded from. These will *also* be synthesized by the creation of new
foreign keys in 19.2-mixed. These fields are used to validate that we
didn't do anything wrong when downgrading a new-style foreign key in the
mixed version state.

---------------------------------------------------------------------

Here is the "state diagram" of how all of the above works:

*19.1*

- everybody writes old-style fks, nobody understands new-style fks.

*19.2-mixed*: a new 19.2 node enters the cluster, but the cluster
              version is not yet flipped.

- 19.2 nodes always write legacy idx fields when synthesizing table
  descriptors.
- 19.2 nodes always assume the existence of legacy idx fields in all
  table descriptors.
- 19.2 nodes unconditionally upgrade all table descriptors from disk
  into the new format, preserving both the legacy idx fields and a special
  "validator" struct that is the entirety of the old-style foreign key.
- 19.2 nodes unconditionally downgrade all in-memory table descriptors
  to the old format, making sure the downgrade was correct by exactly
  matching the result to the special "validator" struct above.
- 19.1 nodes continue on their merry way. No new-style descriptors are
  ever on disk, so they don't know anything is up.

*19.2-final*: all nodes are 19.2, and the upgrade switch is flipped.

- 19.2 nodes behave identically to the above, except:
- 19.2 nodes do not downgrade in-memory table descriptors to the old
  format.

There's a second commit that adds mixed-version tests:

sql: add mixed-version logic tests for fk upgrade

This commit adds a new logic test configuration that runs logic tests
in the cluster configuration seen during a mixed 19.1/19.2 state.
Specifically, the "binary version" is 19.2 but the minimum cluster
version is 19.1 (even though, for the purposes of this logic test, there
are no actual 19.1 nodes anywhere).
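
The distinction between the binary version and the active cluster version can be sketched as follows. This is a hypothetical model written for illustration: `Version`, `Settings`, `topLevelFKs`, and `mustDowngrade` are assumed names, and the real cluster version machinery is considerably richer than this.

```go
package main

// Version is a hypothetical major.minor release version.
type Version struct {
	Major, Minor int
}

// Less reports whether v sorts strictly before o.
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	return v.Minor < o.Minor
}

// Settings pairs the version the node's binary supports with the cluster
// version that is currently in effect.
type Settings struct {
	Binary Version
	Active Version
}

// topLevelFKs is the (assumed) version at which table-level foreign keys
// may be written to disk.
var topLevelFKs = Version{Major: 19, Minor: 2}

// IsActive reports whether the cluster version has reached the feature.
func (s Settings) IsActive(feature Version) bool {
	return !s.Active.Less(feature)
}

// mustDowngrade reports whether descriptors still have to be written in
// the old format. It depends only on the active cluster version, never
// on the binary version.
func (s Settings) mustDowngrade() bool {
	return !s.IsActive(topLevelFKs)
}
```

With `Binary` at 19.2 and `Active` at 19.1, `mustDowngrade` returns true even though no 19.1 node exists, which is the situation the test configuration simulates.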

There are a couple of "known issue" logic tests that are disabled /
hobbled in the fk and schema_change_in_txn logic test configurations,
which necessitated splitting these tests into two nearly identical
configs that are run with and without the mixed-version configuration.

Pending the resolution of #39037, the auto-creation of origin indexes
for fks on empty tables doesn't work in mixed-version clusters.

Pending the resolution of #37712, validating a foreign key constraint
within a transaction containing another schema change on the same table
also doesn't work in mixed-version clusters.

Finally, there's also a small difference in the order of constraint
printing for show create table statements in mixed-version clusters.

Co-authored-by: Lucy Zhang <lucy@cockroachlabs.com>
Co-authored-by: Jordan Lewis <jordan@cockroachlabs.com>

Release note: None

Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
This commit handles all four cases that arise when writing and reading
table descriptors in a mixed 19.1/19.2 state.

We now:

1. upgrade the descriptors from disk while creating a backup
2. downgrade the descriptors from memory into the backup descriptor if
   we're in a mixed 19.1/19.2 state
3. upgrade the descriptors from a backup while preparing to restore
4. downgrade the descriptors from memory onto the disk while restoring
   a backup if we're in a mixed 19.1/19.2 state

Release note: None

Co-authored-by: Lucy Zhang <lucy@cockroachlabs.com>
Co-authored-by: Jordan Lewis <jordan@cockroachlabs.com>
craig bot pushed a commit that referenced this pull request Aug 20, 2019
39757: backupccl: fix backup/restore after FK table descriptor changes r=lucy-zhang a=lucy-zhang

This PR fixes the backup/restore cluster version incompatibility issues
introduced by the FK table descriptor representation upgrade. It forces all
mixed-version clusters to write backup descriptors with table descriptors that
are compatible with 19.1, and enables 19.2 nodes to read 19.1 backups
correctly. This is done by downgrading table descriptors (conditionally, based
on the cluster version) every time they are written to disk or written to a
backup manifest, and upgrading them when FKs need to be read or written.

Backup descriptors are upgraded using a `protoGetter` that reads table
descriptors from the backup descriptors themselves, taking the place of `Txn`,
when resolving cross-table references. To handle backup descriptors containing
tables that reference tables not in the backup, a new argument is added to
`maybeUpgradeForeignKeyRepresentation`. It allows the lookup process to skip
foreign keys that can't be restored because a table is missing.
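
The lookup described above might look roughly like the following Go sketch. It is only an approximation for illustration: `backupGetter`, `Table`, `FK`, and `resolveFKs` are hypothetical names, and the real `protoGetter` and `maybeUpgradeForeignKeyRepresentation` work on actual descriptor protos rather than these toy structs.

```go
package main

import "fmt"

// ID is a hypothetical table ID.
type ID uint32

// FK is a hypothetical foreign key that names the table it references.
type FK struct {
	Name            string
	ReferencedTable ID
}

// Table is a hypothetical, simplified table descriptor.
type Table struct {
	ID   ID
	Name string
	FKs  []FK
}

// backupGetter resolves table IDs against the tables stored in a backup,
// standing in for the transaction that would normally read them from KV.
type backupGetter map[ID]Table

// newBackupGetter indexes the backup's tables by ID.
func newBackupGetter(tables []Table) backupGetter {
	g := make(backupGetter, len(tables))
	for _, t := range tables {
		g[t.ID] = t
	}
	return g
}

// resolveFKs returns the foreign keys of t whose referenced table is
// present in the backup. When skipMissing is false, a reference to a
// table outside the backup is reported as an error instead.
func resolveFKs(g backupGetter, t Table, skipMissing bool) ([]FK, error) {
	var kept []FK
	for _, fk := range t.FKs {
		if _, ok := g[fk.ReferencedTable]; !ok {
			if skipMissing {
				continue
			}
			return nil, fmt.Errorf("table %q: fk %q references missing table %d",
				t.Name, fk.Name, fk.ReferencedTable)
		}
		kept = append(kept, fk)
	}
	return kept, nil
}
```

Passing `skipMissing` explicitly keeps the strict behavior available to callers that expect every referenced table to be resolvable.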

I mostly tested this by running one of the failing tpch benchmarks
(`tpchbench/tpchVec/nodes=3/cpu=4/sf=1`) to verify that it actually works, and
restoring the tpch fixture in a real mixed-version cluster (with both 19.1 and
19.2 nodes as the gateway node) and a cluster that had been upgraded from 19.1
to 19.2. I also ran the `backupccl` tests while forcing the cluster to be on
cluster version 19.1 to simulate the mixed-version state:
```
diff --git a/pkg/sql/create_table.go b/pkg/sql/create_table.go
index 077f394442..a7549b0e2a 100644
--- a/pkg/sql/create_table.go
+++ b/pkg/sql/create_table.go
@@ -631,7 +631,7 @@ func ResolveFK(
                LegacyReferencedIndex: legacyReferencedIndexID,
        }

-       if !settings.Version.IsActive(cluster.VersionTopLevelForeignKeys) {
+       if !settings.Version.IsActive(cluster.VersionTopLevelForeignKeys) || true {
                legacyUpgradedFromOriginReference := sqlbase.ForeignKeyReference{
                        Table:           target.ID,
                        Index:           legacyReferencedIndexID,
diff --git a/pkg/sql/sqlbase/structured.go b/pkg/sql/sqlbase/structured.go
index 5b7b507633..afc7d75be5 100644
--- a/pkg/sql/sqlbase/structured.go
+++ b/pkg/sql/sqlbase/structured.go
@@ -988,7 +988,7 @@ func maybeUpgradeForeignKeyRepOnIndex(
 func (desc *TableDescriptor) MaybeDowngradeForeignKeyRepresentation(
        ctx context.Context, clusterSettings *cluster.Settings,
 ) (bool, *TableDescriptor, error) {
-       downgradeUnnecessary := clusterSettings.Version.IsActive(cluster.VersionTopLevelForeignKeys)
+       downgradeUnnecessary := clusterSettings.Version.IsActive(cluster.VersionTopLevelForeignKeys) && false
        if downgradeUnnecessary {
                return false, desc, nil
        }
```

This PR is based on #39474.
Fixes #39753.

Release note: None

Co-authored-by: Lucy Zhang <lucy-zhang@users.noreply.github.com>
@jordanlewis jordanlewis deleted the fk-backup-restore branch August 25, 2019 02:25