Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Online DDL: followups in multiple trajectories #6901

Merged
merged 16 commits into from
Nov 8, 2020

Conversation

shlomi-noach
Copy link
Contributor

@shlomi-noach shlomi-noach commented Oct 19, 2020

Adding functionality to online DDL:

  • TabletAlias identity, now persisted in backend _vt.schema_migrations table tablet column, indicating which tablet originated the migration. This will be useful when identifying, and recovering from, mid-migration failovers
  • retries column, counting the number of retries for a specific migration

Updated changes summary in followup comment

… an app by breaking its name to tokens

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…persist original tablet alias that executed the migration. Adding 'retries' counter column

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…the way to retrying a migration after failover

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…-updates

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

shlomi-noach commented Oct 22, 2020

This PR begins to accumulate multiple changes, so I'd like to get it reviewed now before it grows too much, and then followup on each of the changes in future PRs. Changes in this PR:

  • Online DDL tracks identify of tablet (as string, e.g. zone1-0000000100) which spawned the migration
    column: tablet
  • Online DDL tracks number of retries for a migration
    column: retries
  • Online DDL tracks whether a migration failed due to tablet failure
    column: tablet_failure
  • Online DDL executor identifies migrations which have failed due to tablet failure. For example, a master failover due to host crash fails the tablet and the migration. Executor identifies this situation and marks the migration both as failed as well as tablet_failure=1
  • Online DDL Executor automatically retries a migration under the following conditions:
    • the migration failed
    • the migration is marked as tablet_failure=1, ie. it was started by a different tablet
    • the migration has not been retried yet (this is a temporary safety latch, to make sure we don't go infinite or too-many retries)
  • Don't overwrites artifacts identities upon retry. Always append new artifacts. Unset cleanup_timestamp upon retry
    all this is done to avoid a situation where we retry a migration before the previously failed migration's artifacts have been cleaned up.
  • Online DDL identifies to throttler using app=... and with the values online-ddl:gh-ost:<uuid> or online-ddl:pt-osc:<uuid>
  • Initial support in throttler to throttling app by token. Such that we can throttle anything that is online-ddl or anything that is gh-ost or of course a specific migration by its UUID

@shlomi-noach shlomi-noach marked this pull request as ready for review October 22, 2020 07:59
@shlomi-noach shlomi-noach changed the title WIP: Online DDL schema updates Online DDL schema updates Oct 22, 2020
@shlomi-noach shlomi-noach requested a review from a team October 22, 2020 08:00
@shlomi-noach shlomi-noach changed the title Online DDL schema updates Online DDL: followups in multiple trajectories Oct 22, 2020
@shlomi-noach
Copy link
Contributor Author

Tracking issue: #6926

@shlomi-noach
Copy link
Contributor Author

This PR will be accompanied with a documentation PR, followup in future comment.

@shlomi-noach
Copy link
Contributor Author

Documentation in vitessio/website#571

Copy link
Contributor

@rohit-nayak-ps rohit-nayak-ps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…-updates

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

Question: this PR introduces a new column (multiple, actually) in schema_migrations table. As part of VExec, vtctld runs an INSERT, and uses a value for the new column. That INSERT gets transported to vttablet which, again, via VExec, executes it on the underlying table.

Now, the issue is what happens if someone upgrades their vtctld but not yet vttablet. Then vttablet sees a query with a new column, which it cannot execute.

I seem to recall we stated that we support partial upgrades.

I wonder how I should go about solving this problem.

@deepthi
Copy link
Member

deepthi commented Nov 2, 2020

Question: this PR introduces a new column (multiple, actually) in schema_migrations table. As part of VExec, vtctld runs an INSERT, and uses a value for the new column. That INSERT gets transported to vttablet which, again, via VExec, executes it on the underlying table.

Now, the issue is what happens if someone upgrades their vtctld but not yet vttablet. Then vttablet sees a query with a new column, which it cannot execute.

I seem to recall we stated that we support partial upgrades.

I wonder how I should go about solving this problem.

I think this pushes us in the direction of version-aware components. vtctld needs to know the version of the vttablets it is talking to and modify its behavior. OTOH we did say that online DDL is experimental in 8.0 so we could document this as an expected incompatibility. Though that just postpones the time when we would have to deal with it, so it might be a good idea to work out a canonical solution right now.

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…osc)

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
… and tablet does not expect tablet column in VExec; instead, tablet column is planted into the INSERT query

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

vttablet's VExec receiver is now capable of injecting new columns into insert statements. I've refactored the code to mitigate my concerns in #6901 (comment)

As of now, this PR does not change the query passed between vtctld and vttablet, and vttablet injects the missing column. Fortunately, the value of that column is known to vttablet (indeed, it is a tablet column, so as it stand, vttablet is the best authority for the value of that column).

This does not solve all possible futuristic problems: in the future we may actually want to inject more data on vtctld side, and will have to find a way to solve the Vexec compatibility issue. But for this PR this is enough, and paves the way for more flexibility.

@deepthi
Copy link
Member

deepthi commented Nov 5, 2020

vttablet's VExec receiver is now capable of injecting new columns into insert statements. I've refactored the code to mitigate my concerns in #6901 (comment)

As of now, this PR does not change the query passed between vtctld and vttablet, and vttablet injects the missing column. Fortunately, the value of that column is known to vttablet (indeed, it is a tablet column, so as it stand, vttablet is the best authority for the value of that column).

This does not solve all possible futuristic problems: in the future we may actually want to inject more data on vtctld side, and will have to find a way to solve the Vexec compatibility issue. But for this PR this is enough, and paves the way for more flexibility.

Oh, this is clever 💯

Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants