
Upgrade Downgrade testing for backups #9501

Merged (22 commits) on Jan 27, 2022

Conversation


@frouioui frouioui commented Jan 11, 2022

Description

This pull request adds the upgrade downgrade tests for the backup and restore functionalities of Vitess. The goal of the upgrade downgrade tests is to make sure Vitess functionalities continue working when certain components have different versions (version n for the current commit, and version n-1 for the previous release of Vitess).

Two new workflows are added:

  • Upgrade Downgrade Testing - Backups - E2E uses the already-existing end-to-end tests
  • Upgrade Downgrade Testing - Backups - Manual does the backup and restore while downgrading and upgrading VTTablet manually on a live cluster. The steps are detailed in this comment.

Upgrade Downgrade Testing - Backups - E2E

This test runs the end-to-end tests contained in the packages go/test/endtoend/backup/vtbackup and go/test/endtoend/backup/transform. It runs once with vttablet n-1 and vtbackup n, then a second time with vttablet n and vtbackup n-1.
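The two combinations can be sketched as a small matrix runner. This is an illustrative sketch only: the run_combo helper and the version labels are assumptions, not the workflow's actual script, and the real go test invocation is shown only in a comment.

```shell
# Run the backup e2e suites twice, swapping which binary is old vs new.
# "n" and "n-1" are labels for the current commit and the previous release.
run_combo() {
  vttablet_ver=$1
  vtbackup_ver=$2
  echo "e2e backup tests: vttablet=${vttablet_ver} vtbackup=${vtbackup_ver}"
  # The real workflow would put the matching binaries on PATH and then run:
  #   go test ./go/test/endtoend/backup/vtbackup/... ./go/test/endtoend/backup/transform/...
}

run_combo "n-1" "n"   # first pass: previous-release vttablet, current vtbackup
run_combo "n" "n-1"   # second pass: current vttablet, previous-release vtbackup
```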

Upgrade Downgrade Testing - Backups - Manual

Setup

  1. The test creates a sharded cluster with every component using version n.
  2. It inserts some data in every table.
  3. Proceeds to backup the whole cluster.
  4. Then it inserts one extra row in every table. The purpose is to make sure that when we restore the backup, the rows inserted after the backup are not present in the tables.
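The setup's row bookkeeping can be sketched as below; the starting count and the vtctlclient command shown in the comment are illustrative assumptions, not values from the test itself.

```shell
# Track the expected per-table row count through the setup phase.
rows=10                   # step 2: initial data inserted in every table (assumed count)
rows_in_backup=$rows      # step 3: the backup captures this count
                          #   (on a real cluster: vtctlclient Backup <tablet-alias>, per shard)
rows=$((rows + 1))        # step 4: one extra row inserted after the backup
echo "live=$rows backup=$rows_in_backup"
# A later restore must show $rows_in_backup rows, proving the post-backup row is gone.
```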

Downgrade (n-1 restore from n)

  1. After the setup, the test shuts down all vttablets and makes sure to remove the tablet directories from vtdataroot.
  2. Then it re-creates the tablets using the version n-1 of vttablet. The tablets restore from the backup we made.
  3. The test asserts that the correct number of rows is present in every table using a SELECT COUNT(*).
  4. The test inserts one more row in every table.
  5. It proceeds to backup the whole cluster again.
  6. Then the test inserts one more row in every table again; these rows should not be present the next time we restore from the backup we just made.
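The count assertion in step 3 can be sketched as follows. The mysql invocation in the comment, the table name, and the port are assumptions, and the actual count is stubbed so the sketch is self-contained.

```shell
# Assert that a restored tablet serves exactly the rows captured in the backup.
expected=10
# On a real cluster the actual count would come from the restored tablet, e.g.:
#   actual=$(mysql -h 127.0.0.1 -P 15306 -N -e 'SELECT COUNT(*) FROM customer.customer')
actual=10   # stubbed here so the sketch stands alone
if [ "$actual" -ne "$expected" ]; then
  echo "row count mismatch: got $actual, want $expected" >&2
  exit 1
fi
echo "row count ok"
```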

Upgrade (n restore from n-1)

  1. Once this is done, the test shuts down all the vttablets. Again, it makes sure to remove the tablet directories from vtdataroot.
  2. New vttablets are created by the test using the version n of vttablet. The tablets restore from the backup we made using vttablet n-1 in the Downgrade phase.
  3. The test then asserts that the number of rows in every table is correct.
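Putting the three phases together, the expected counts evolve as sketched below; the starting count of 10 is an assumed placeholder, not a value from the test.

```shell
# Expected row counts across the whole test, starting from an assumed 10 rows.
n=10                        # setup: initial insert
backup1=$n; n=$((n + 1))    # setup: first backup, then one extra row
n=$backup1; n=$((n + 1))    # downgrade: restore (extra row gone), then insert one row
backup2=$n; n=$((n + 1))    # downgrade: second backup, then one extra row
n=$backup2                  # upgrade: restore from the second backup
echo "final expected count: $n"
```

With the assumed starting count of 10, the upgrade-phase restore should find 11 rows per table: the post-backup rows from both phases are gone, while the rows inserted before each backup survive.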

Related Issue(s)

Checklist

  • Tests were added or are not required
  • Documentation was added or is not required

@frouioui frouioui force-pushed the upgrade-downgrade-backup branch 4 times, most recently from 36f6de6 to 56a94b4 on January 12, 2022 14:06
@frouioui frouioui force-pushed the upgrade-downgrade-backup branch 3 times, most recently from 3dfa9d0 to 4e03ddf on January 24, 2022 16:47
@frouioui frouioui force-pushed the upgrade-downgrade-backup branch 2 times, most recently from 9611b41 to 5c3ed49 on January 25, 2022 09:08
@frouioui frouioui marked this pull request as ready for review January 26, 2022 14:13
@frouioui frouioui requested review from deepthi and GuptaManan100 and removed request for rohit-nayak-ps January 26, 2022 14:13
@deepthi deepthi left a comment

This is very good work.
I'm just wondering if the cluster we are creating needs to be this complicated. We are running MoveTables and Reshard on that cluster before we even start the backup flow.
Can we not simply create the sharded cluster we want without going through all these workflows? A problem in any of them can fail the upgrade/downgrade test (false positive).
I'm willing to consider having this merged as-is if you can simplify it later.

Review comment on examples/local/backups/take_backups.sh (outdated, resolved)
@GuptaManan100 GuptaManan100 left a comment


Apart from the requested changes, everything else looks good to me!

@frouioui frouioui merged commit 3a3feb3 into vitessio:main Jan 27, 2022
@frouioui frouioui deleted the upgrade-downgrade-backup branch January 27, 2022 15:13
@frouioui (Member Author)

@deepthi @GuptaManan100 - Thank you both for the review. @deepthi, you are right: the setup of the cluster was inefficient. With commit da541cd I simplified the whole process, going from roughly 2-5 minutes to under 1 minute to set up the cluster, and with a process that is less prone to errors.

3 participants