Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: fix backup/restore after FK table descriptor changes #39757

Merged
merged 1 commit into from
Aug 20, 2019

Conversation

thoszhang
Copy link
Contributor

@thoszhang thoszhang commented Aug 20, 2019

This PR fixes the backup/restore cluster version incompatibility issues
introduced by the FK table descriptor representation upgrade. It forces all
mixed-version clusters to write backup descriptors with table descriptors that
are compatible with 19.1, and enables 19.2 nodes to read 19.1 backups
correctly. This is done by downgrading table descriptors (conditionally, based
on the cluster version) every time they are written to disk or written to a
backup manifest, and upgrading them when FKs need to be read or written.

Backup descriptors are upgraded using a protoGetter that reads table
descriptors from the backup descriptors themselves, taking the place of Txn,
when resolving cross-table references. To handle backup descriptors containg
tables that reference tables not in the backup, a new argument is added to
maybeUpgradeForeignKeyRepresentation that allows for skipping foreign keys
during this lookup process that can't be restored because a table is missing.

I mostly tested this by running one of the failing tpch benchmarks
(tpchbench/tpchVec/nodes=3/cpu=4/sf=1) to verify that it actually works, and
restoring the tpch fixture in a real mixed-version cluster (with both 19.1 and
19.2 nodes as the gateway node) and a cluster that had been upgraded from 19.1
to 19.2. I also ran the backupccl tests while forcing the cluster to be on
cluster version 19.1 to simulate the mixed-version state:

diff --git a/pkg/sql/create_table.go b/pkg/sql/create_table.go
index 077f394442..a7549b0e2a 100644
--- a/pkg/sql/create_table.go
+++ b/pkg/sql/create_table.go
@@ -631,7 +631,7 @@ func ResolveFK(
                LegacyReferencedIndex: legacyReferencedIndexID,
        }

-       if !settings.Version.IsActive(cluster.VersionTopLevelForeignKeys) {
+       if !settings.Version.IsActive(cluster.VersionTopLevelForeignKeys) || true {
                legacyUpgradedFromOriginReference := sqlbase.ForeignKeyReference{
                        Table:           target.ID,
                        Index:           legacyReferencedIndexID,
diff --git a/pkg/sql/sqlbase/structured.go b/pkg/sql/sqlbase/structured.go
index 5b7b507633..afc7d75be5 100644
--- a/pkg/sql/sqlbase/structured.go
+++ b/pkg/sql/sqlbase/structured.go
@@ -988,7 +988,7 @@ func maybeUpgradeForeignKeyRepOnIndex(
 func (desc *TableDescriptor) MaybeDowngradeForeignKeyRepresentation(
        ctx context.Context, clusterSettings *cluster.Settings,
 ) (bool, *TableDescriptor, error) {
-       downgradeUnnecessary := clusterSettings.Version.IsActive(cluster.VersionTopLevelForeignKeys)
+       downgradeUnnecessary := clusterSettings.Version.IsActive(cluster.VersionTopLevelForeignKeys) && false
        if downgradeUnnecessary {
                return false, desc, nil
        }

This PR is based on #39474.
Fixes #39753.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@thoszhang thoszhang force-pushed the backup-restore branch 2 times, most recently from 5a790f8 to 3947732 Compare August 20, 2019 02:41
Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn’t need to block this PR, but given how important on-disk compatibility is, I want to bump the idea of having checked in testdata backups from prior releases, so that we can actually unit test that basic restore and sanity check works against past backups.

@thoszhang
Copy link
Contributor Author

I opened #39772 to track adding cross-version RESTORE tests.

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: - I have a bunch of comments but none of them blocking.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @lucy-zhang)


pkg/ccl/backupccl/backup.go, line 1044 at r1 (raw file):

			for i, uri := range incrementalFrom {
				// We don't read the table descriptors from the backup descriptor, so
				// there's no need to upgrade the table descriptors.

semi-nit: Could we try a little harder to guarantee that nobody reads those TDs as the code migrates? For example, you could nil the TableDescriptors field on this struct to make sure that if somebody gets confused they actually notice this message and reason about what's going on.


pkg/ccl/backupccl/backup.go, line 1279 at r1 (raw file):

	var checkpointDesc *BackupDescriptor
	// We don't read the table descriptors from the backup descriptor, so
	// there's no need to upgrade the table descriptors.

ditto above.


pkg/ccl/backupccl/backup.go, line 1487 at r1 (raw file):

}

func maybeDowngradeTableDescsInBackupDescriptor(

Could you please add a godoc to this method that explains the deal with the "maybe"?


pkg/ccl/backupccl/backup.go, line 1507 at r1 (raw file):

}

func maybeUpgradeTableDescsInBackupDescriptors(

ditto Godoc.


pkg/ccl/backupccl/restore.go, line 1648 at r1 (raw file):

	// writing old-style descs in RestoreDetails (unless a job persists across
	// an upgrade?).
	if err := maybeUpgradeTableDescsInBackupDescriptors(ctx, backupDescs, true /* skipFKsWithNoMatchingTable */); err != nil {

Are we supposed to unconditionally skip stuff here? I thought there was a setting about it.


pkg/ccl/backupccl/show.go, line 73 at r1 (raw file):

			return err
		}
		if err := maybeUpgradeTableDescsInBackupDescriptors(ctx, []BackupDescriptor{desc}, true /*skipFKsWithNoMatchingTable*/); err != nil {

Please add a comment why this is true - why do we get to skip non-matching tables here?


pkg/ccl/cliccl/load.go, line 65 at r1 (raw file):

	// fields are added to the output, the table descriptors may need to be
	// upgraded.
	desc, err := backupccl.ReadBackupDescriptorFromURI(ctx, basepath, cluster.NoSettings)

ditto comment about preventing people from reading TD's, unless they need to here?

This PR fixes the backup/restore cluster version incompatibility issues
introduced by the FK table descriptor representation upgrade. It forces all
mixed-version clusters to write backup descriptors with table descriptors that
are compatible with 19.1, and enables 19.2 nodes to read 19.1 backups
correctly. This is done by downgrading table descriptors (conditionally, based
on the cluster version) every time they are written to disk or written to a
backup manifest, and upgrading them when FKs need to be read or written.

Backup descriptors are upgraded using a `protoGetter` that reads table
descriptors from the backup descriptors themselves, taking the place of `Txn`,
when resolving cross-table references. To handle backup descriptors containg
tables that reference tables not in the backup, a new argument is added to
`maybeUpgradeForeignKeyRepresentation` that allows for skipping foreign keys
during this lookup process that can't be restored because a table is missing.

I mostly tested this by running one of the failing tpch benchmarks
(`tpchbench/tpchVec/nodes=3/cpu=4/sf=1`) to verify that it actually works, and
restoring the tpch fixture in a real mixed-version cluster (with both 19.1 and
19.2 nodes as the gateway node) and a cluster that had been upgraded from 19.1
to 19.2. I also ran the `backupccl` tests while forcing the cluster to be on
cluster version 19.1 to simulate the mixed-version state:
```
diff --git a/pkg/sql/create_table.go b/pkg/sql/create_table.go
index 077f394442..a7549b0e2a 100644
--- a/pkg/sql/create_table.go
+++ b/pkg/sql/create_table.go
@@ -631,7 +631,7 @@ func ResolveFK(
                LegacyReferencedIndex: legacyReferencedIndexID,
        }

-       if !settings.Version.IsActive(cluster.VersionTopLevelForeignKeys) {
+       if !settings.Version.IsActive(cluster.VersionTopLevelForeignKeys) || true {
                legacyUpgradedFromOriginReference := sqlbase.ForeignKeyReference{
                        Table:           target.ID,
                        Index:           legacyReferencedIndexID,
diff --git a/pkg/sql/sqlbase/structured.go b/pkg/sql/sqlbase/structured.go
index 5b7b507633..afc7d75be5 100644
--- a/pkg/sql/sqlbase/structured.go
+++ b/pkg/sql/sqlbase/structured.go
@@ -988,7 +988,7 @@ func maybeUpgradeForeignKeyRepOnIndex(
 func (desc *TableDescriptor) MaybeDowngradeForeignKeyRepresentation(
        ctx context.Context, clusterSettings *cluster.Settings,
 ) (bool, *TableDescriptor, error) {
-       downgradeUnnecessary := clusterSettings.Version.IsActive(cluster.VersionTopLevelForeignKeys)
+       downgradeUnnecessary := clusterSettings.Version.IsActive(cluster.VersionTopLevelForeignKeys) && false
        if downgradeUnnecessary {
                return false, desc, nil
        }
```

Release note: None
Copy link
Contributor Author

@thoszhang thoszhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis)


pkg/ccl/backupccl/backup.go, line 1044 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

semi-nit: Could we try a little harder to guarantee that nobody reads those TDs as the code migrates? For example, you could nil the TableDescriptors field on this struct to make sure that if somebody gets confused they actually notice this message and reason about what's going on.

Okay, I realized that we do actually read the table descriptors since we need the table/index spans from past incremental backups, though we still don't read the FKs. This does work for now, but I think it will be safer for future code to just do the upgrade. I added a TODO for myself.


pkg/ccl/backupccl/backup.go, line 1279 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

ditto above.

This one is different, since we basically just read the backup descriptor to append more Files to it, ignoring the table descriptors. So we can't nil out the table descriptors, but I updated the comment to make it more clear.


pkg/ccl/backupccl/backup.go, line 1487 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Could you please add a godoc to this method that explains the deal with the "maybe"?

Done.


pkg/ccl/backupccl/backup.go, line 1507 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

ditto Godoc.

Done.


pkg/ccl/backupccl/restore.go, line 1648 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Are we supposed to unconditionally skip stuff here? I thought there was a setting about it.

We check that setting in doRestorePlan, which is sufficient. I expanded the doc comment for the function.


pkg/ccl/backupccl/show.go, line 73 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Please add a comment why this is true - why do we get to skip non-matching tables here?

Done.


pkg/ccl/cliccl/load.go, line 65 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

ditto comment about preventing people from reading TD's, unless they need to here?

We do read the table descriptors below, but only to get the IDs. I can add another comment where we actually extract the table descriptors individually (though the potential problem is more general than just the FK upgrade).

@thoszhang
Copy link
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request Aug 20, 2019
39757: backupccl: fix backup/restore after FK table descriptor changes r=lucy-zhang a=lucy-zhang

This PR fixes the backup/restore cluster version incompatibility issues
introduced by the FK table descriptor representation upgrade. It forces all
mixed-version clusters to write backup descriptors with table descriptors that
are compatible with 19.1, and enables 19.2 nodes to read 19.1 backups
correctly. This is done by downgrading table descriptors (conditionally, based
on the cluster version) every time they are written to disk or written to a
backup manifest, and upgrading them when FKs need to be read or written.

Backup descriptors are upgraded using a `protoGetter` that reads table
descriptors from the backup descriptors themselves, taking the place of `Txn`,
when resolving cross-table references. To handle backup descriptors containg
tables that reference tables not in the backup, a new argument is added to
`maybeUpgradeForeignKeyRepresentation` that allows for skipping foreign keys
during this lookup process that can't be restored because a table is missing.

I mostly tested this by running one of the failing tpch benchmarks
(`tpchbench/tpchVec/nodes=3/cpu=4/sf=1`) to verify that it actually works, and
restoring the tpch fixture in a real mixed-version cluster (with both 19.1 and
19.2 nodes as the gateway node) and a cluster that had been upgraded from 19.1
to 19.2. I also ran the `backupccl` tests while forcing the cluster to be on
cluster version 19.1 to simulate the mixed-version state:
```
diff --git a/pkg/sql/create_table.go b/pkg/sql/create_table.go
index 077f394442..a7549b0e2a 100644
--- a/pkg/sql/create_table.go
+++ b/pkg/sql/create_table.go
@@ -631,7 +631,7 @@ func ResolveFK(
                LegacyReferencedIndex: legacyReferencedIndexID,
        }

-       if !settings.Version.IsActive(cluster.VersionTopLevelForeignKeys) {
+       if !settings.Version.IsActive(cluster.VersionTopLevelForeignKeys) || true {
                legacyUpgradedFromOriginReference := sqlbase.ForeignKeyReference{
                        Table:           target.ID,
                        Index:           legacyReferencedIndexID,
diff --git a/pkg/sql/sqlbase/structured.go b/pkg/sql/sqlbase/structured.go
index 5b7b507633..afc7d75be5 100644
--- a/pkg/sql/sqlbase/structured.go
+++ b/pkg/sql/sqlbase/structured.go
@@ -988,7 +988,7 @@ func maybeUpgradeForeignKeyRepOnIndex(
 func (desc *TableDescriptor) MaybeDowngradeForeignKeyRepresentation(
        ctx context.Context, clusterSettings *cluster.Settings,
 ) (bool, *TableDescriptor, error) {
-       downgradeUnnecessary := clusterSettings.Version.IsActive(cluster.VersionTopLevelForeignKeys)
+       downgradeUnnecessary := clusterSettings.Version.IsActive(cluster.VersionTopLevelForeignKeys) && false
        if downgradeUnnecessary {
                return false, desc, nil
        }
```

This PR is based on #39474.
Fixes #39753.

Release note: None

Co-authored-by: Lucy Zhang <lucy-zhang@users.noreply.github.com>
@craig
Copy link
Contributor

craig bot commented Aug 20, 2019

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: restoring a 19.1 backup with FKs into master causes problems
5 participants