Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Online DDL: evaluate REVERTibility of a migration #9232

Merged
merged 33 commits into from
Dec 9, 2021

Conversation

shlomi-noach
Copy link
Contributor

@shlomi-noach shlomi-noach commented Nov 15, 2021

Description

In this WIP PR we evaluate whether a migration is REVERTible. A migration may, for example, possibly be irrevertible if:

  • It drops a unique constraint (what happens if a user INSERTs a new row on the new table, which would create a conflict in the old schema?)
  • Id expands a data type (e.g. INT to BIGINT; what happens if a user INSERTs a value greater than INT on the new table? This value cannot be propagated back to the old table)
  • dropping a NOT NULL column with no DEFAULT value, then INSERTing any row on the new table. How can we populate the dropped column on the old table?
  • other?

We cannot say for certain that a migration is not revertible. It depends on how the user uses the new table in the interim.
However, assuming we cover all the cases, we can predict that a migration is in fact revertible. If we see nothing to block revertibility, then we have high confidence that we will make it.

As a side note, most cases illustrated above are IMO unlikely. It's unlikely that immediately after cut-over the user will suddenly populate a BIGINT value: their app needs to first realize it can populate such values. But of course we can't say for sure. We cannot predict an app's behavior in the generic case.

Related Issue(s)

Followup to #8495
Tracking: #6926

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

…s ewhen computing shared unique key or columns-covered unique key

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach shlomi-noach added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Query Serving release notes labels Nov 15, 2021
@shlomi-noach shlomi-noach self-assigned this Nov 15, 2021
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

shlomi-noach commented Nov 16, 2021

Examples of expanded data types, which we need to check for revertibility:

  • Any NOT NULL to NULLable
  • varchar(10)->varchar(20)
  • expansion/change of character set (e.g. going utf8 to utf8mb4)
  • Any increase in INT type (eg INT->BIGINT)
  • Any unsigned to signed integer (because -1 is an overflow)
  • Some signed to unsigned, where new datatype equals or greater than previous datatype
    • E.g. INT SIGNED to INT UNSIGNED, INT SIGNED to BIGINT UNSIGNED
    • But no problem with BIGINT SIGNED to INT UNSIGNED
  • Increase in BIT length
  • FLOAT to DOUBLE
  • Integer to FLOAT/DOUBLE
  • Increase in DECIMAL length
  • DATE->DATETIME, TIME->DATETIME
  • DATE->TIMESTAMP, TIME->TIMESTAMP
  • TIMESTAMP->DATE (because DATE can be before 1970-01-01)
  • TIMESTAMP->DATETIME
  • Temporal without fraction to temporal with fraction, or increase in fraction (e.g. TIMESTAMP to TIMESTAMP(3), TIMESTAMP(3) to TIMESTAMP(6))
  • Adding new ENUM/SET value

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…lt NULL?

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…ropped non-generated columns. Added unit test

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…rations

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

It occurs to me that a user may change a column's type from anything to anything, really. e.g. INT to VARCHAR, CHAR to DATETIME etc. We can't (won't?) review all possible transformations, just the most sensible ones (between numerics, between texts, etc.)

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…changes, that may affect revertibility

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@@ -0,0 +1,161 @@
/*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was refactored out of vrepl.go without code changes

@@ -0,0 +1,192 @@
/*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file was refactored out of vrepl.go with changes as noted below.

}

// GetSharedColumns returns the intersection of two lists of columns in same order as the first list
func GetSharedColumns(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is unchanged, refactored out of vrepl.go

}

// isExpandedColumn sees if target column has any value set/range that is impossible in source column. See GetExpandedColumns comment for examples
func isExpandedColumn(sourceColumn *Column, targetColumn *Column) (bool, string) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New functionality, and at the heart of this PR

// - BIGINT UNSIGNED -> INT SIGNED (negative values)
// - TIMESTAMP -> TIMESTAMP(3)
// etc.
func GetExpandedColumnNames(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new functionality

}

// GetNoDefaultColumnNames returns names of columns which have no default value, out of given list of columns
func GetNoDefaultColumnNames(columns *ColumnList) (names []string) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new functionality. Though small, it's at the heart of this PR

alterSchemaMigrationsTableRemovedUniqueNames,
alterSchemaMigrationsTableRemovedNoDefaultColNames,
alterSchemaMigrationsTableExpandedColNames,
alterSchemaMigrationsTableRevertibleNotes,
Copy link
Contributor Author

@shlomi-noach shlomi-noach Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new columns in schema_migrations table help us to understand better the nature of the schema change, and in particular inform us what may be blockers/issues to reverting the migration.

alterSchemaMigrationsTableRemovedUniqueNames = "ALTER TABLE _vt.schema_migrations add column removed_unique_key_names text NOT NULL"
alterSchemaMigrationsTableRemovedNoDefaultColNames = "ALTER TABLE _vt.schema_migrations add column dropped_no_default_column_names text NOT NULL"
alterSchemaMigrationsTableExpandedColNames = "ALTER TABLE _vt.schema_migrations add column expanded_column_names text NOT NULL"
alterSchemaMigrationsTableRevertibleNotes = "ALTER TABLE _vt.schema_migrations add column revertible_notes text NOT NULL"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new columns in schema_migrations table help us to understand better the nature of the schema change, and in particular inform us what may be blockers/issues to reverting the migration.

@shlomi-noach
Copy link
Contributor Author

Ready for review!

Notes notes for the reviewers:

  • Added endtoend tests in go/test/endtoend/onlineddl/revertible/onlineddl_revertible_test.go.
    This is a suite of tests. Each test has two table definitions: "from" and "to", and runs an actual migrations between the two table. We then validate the analysis of the change: which unique constraints were dropped, which columns dropped which had no default value, which column changed type into a borader scope.

  • go/vt/vttablet/onlineddl/vrepl/columns.go refactored out side of vrepl.go, and then also added the functions isExpandedColumn, GetExpandedColumnNames

  • go/vt/vttablet/onlineddl/vrepl/unique_key.go refactored out side of vrepl.go without code changes

  • likewise, tests were refactored outside vrepl_test.go.

@shlomi-noach shlomi-noach marked this pull request as ready for review November 22, 2021 07:22
Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nicely written PR (as usual). The online DDL stuff blows my mind.
A few questions/clarification requests inline.

@@ -54,7 +54,7 @@ var (
createTableRegexp = regexp.MustCompile(`(?s)(?i)(CREATE\s+TABLE\s+)` + "`" + `([^` + "`" + `]+)` + "`" + `(\s*[(].*$)`)
revertStatementRegexp = regexp.MustCompile(`(?i)^revert\s+([\S]*)$`)

enumValuesRegexp = regexp.MustCompile("(?i)^enum[(](.*)[)]$")
enumValuesRegexp = regexp.MustCompile("(?i)^(enum|set)[(](.*)[)]$")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change affect anything besides online DDL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double checking.

go/vt/schema/parser.go Outdated Show resolved Hide resolved
if err := e.updateSchemaAnalysis(ctx, onlineDDL.UUID,
len(v.addedUniqueKeys),
len(v.removedUniqueKeys),
strings.Join(removedUniqueKeyNames, ","),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to concatenate like this? MySQL lets you use commas in column names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right -- but this is going to be informational only. That is, we're not going to then re-parse this. This is about us being able to tell the user "there is a chance we can't revert this migration due to these columns: a,b,c". So in a way, I consider this to be free text.
However, you're right and why no make the extra effort of qualifying the names. I'll proceed to do that.

Comment on lines 2 to 5
Copyright 2016 GitHub Inc.
See https://github.com/github/gh-ost/blob/master/LICENSE
*/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been copied from gh-ost with no changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! It was copied from gh-ost and there have been changes. I'll append the Vitess copyright.

return true // good to go!
}

// GetSharedUniqueKeys returns the unique keys shared between the two source&target tables
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it seems redundant to say "two source&target". Just "source and target"?

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

shlomi-noach commented Dec 5, 2021

Addressed all review comments:

  • Copyright notices updated
  • Separate parsing for enum and for set data types
  • Qualifying column names in schema analysis columns
  • Typos & grammar

Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@deepthi deepthi merged commit 9cb60cd into vitessio:main Dec 9, 2021
@deepthi deepthi deleted the onlineddl-analyze-revert-ability branch December 9, 2021 02:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants