VReplication: Guard against unsafe _vt.vreplication writes #14797
Signed-off-by: Matt Lord <mattalord@gmail.com>
The assumptions related to stream migrations are incorrect here, as noted in my comments on those changes.
The fix for reverse streams in traffic_switcher is good.
The change for resharder makes the code clearer, but it is a no-op, since there are no other streams on the target at that time.
// everything in the test is shutting down and cleaning up. So we retry a few
// times to deal with that non-problematic and ephemeral issue.
var err error
retries := 3
If this is an issue in CI, is just a 3 second retry enough?
It's only been seen in the CI AFAIK, and so far 3 retries with a 1 second delay seems to have been enough (we've done essentially the same thing in several places now).
A couple of comments, otherwise looks great.
// statements if you want to bypass the safety checks that ensure you are
// being selective. The full comment directive looks like this:
// delete /*vt+ ALLOW_UNSAFE_VREPLICATION_WRITE */ from _vt.vreplication
const AllowUnsafeWriteCommentDirective = "ALLOW_UNSAFE_VREPLICATION_WRITE"
❤️
@@ -25,7 +25,7 @@ import (

 type iswitcher interface {
 	lockKeyspace(ctx context.Context, keyspace, action string) (context.Context, func(*error), error)
-	cancelMigration(ctx context.Context, sm *StreamMigrator)
+	cancelStreamMigrations(ctx context.Context, sm *StreamMigrator)
cancelMigration is a better name: this is not just cancelling the stream migrations, but also reverting writes, cleaning up reverse replication, etc.
Good point. Initially I was just going to rename the one that has a StreamMigrator receiver and got carried away... 😄 I'll change the ones using the switcher interface back as those are more generic.
This shows the user how to get around the guardrail if they really know what they're doing.
…on writes (#4038)
* cherry pick of 14797
* Address merge conflict
Signed-off-by: Matt Lord <mattalord@gmail.com>
Co-authored-by: Matt Lord <mattalord@gmail.com>
Description
There is no limit on the number of concurrent active workflows you can have in a keyspace. This means that any time we update the _vt.vreplication table we should uniquely identify a workflow using the id (unique stream for a unique workflow) or the workflow name. If we do not, then in various situations we can perform errant steps that can lead to workflow failure or even data corruption.
This PR fixes the few places where we were NOT already doing this (this part was actually extracted to #14826 for backports). The one exception is the Resharder/StreamMigrator, where we DO want to affect multiple workflows (as we move them from the active to the inactive shards for a Reshard).
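To make the rule above concrete, here is a deliberately naive sketch of a check that flags UPDATE and DELETE statements against _vt.vreplication that are not selective. Only the AllowUnsafeWriteCommentDirective constant and the /*vt+ ... */ comment form come from this PR's diff; the function isUnsafeWrite and the regular expressions are hypothetical illustrations and do not reflect how Vitess actually implements the check. A real implementation would most likely inspect a parsed statement rather than the raw string.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// AllowUnsafeWriteCommentDirective is the directive name shown in the PR diff.
const AllowUnsafeWriteCommentDirective = "ALLOW_UNSAFE_VREPLICATION_WRITE"

var (
	// Matches statements that begin with UPDATE or DELETE (case-insensitive).
	writeRE = regexp.MustCompile(`(?i)^\s*(update|delete)\b`)
	// Matches a WHERE clause that compares the id or workflow column.
	// This is string matching only, so it can be fooled by string literals.
	selectiveRE = regexp.MustCompile(`(?is)\bwhere\b.*\b(id|workflow)\b\s*(=|in\b)`)
)

// isUnsafeWrite is a hypothetical helper: it reports whether query is an
// UPDATE or DELETE on _vt.vreplication that neither filters on id/workflow
// nor carries the bypass comment directive.
func isUnsafeWrite(query string) bool {
	if !writeRE.MatchString(query) ||
		!strings.Contains(strings.ToLower(query), "_vt.vreplication") {
		return false // not a write to the table we are guarding
	}
	if strings.Contains(query, "/*vt+ "+AllowUnsafeWriteCommentDirective) {
		return false // the caller explicitly opted out of the guardrail
	}
	return !selectiveRE.MatchString(query)
}

func main() {
	queries := []string{
		"delete from _vt.vreplication",
		"delete /*vt+ ALLOW_UNSAFE_VREPLICATION_WRITE */ from _vt.vreplication",
		"update _vt.vreplication set state = 'Stopped' where id = 1",
		"delete from _vt.vreplication where workflow = 'wf1'",
		"update _vt.vreplication set state = 'Stopped' where db_name = 'vt_ks'",
	}
	for _, q := range queries {
		fmt.Printf("unsafe=%-5v %s\n", isUnsafeWrite(q), q)
	}
}
```

In this sketch, a filter on db_name alone is still treated as unsafe because several workflows can share a database, which is the situation the description warns about.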
This PR also adds a guardrail for UPDATE and DELETE statements to help prevent Vitess users and developers from accidentally introducing related bugs in the future. For example:
Related Issue(s)
Checklist