VReplication: Guard against unsafe _vt.vreplication writes #14797

mattlord · 2023-12-17T17:07:08Z

Description

There is no limit on the number of concurrent active workflows you can have in a keyspace. This means that any time we're updating the _vt.vreplication table we should uniquely identify a workflow using the id (unique stream for a unique workflow) or workflow name. If we do not, then in various situations we can perform errant steps that can lead to workflow failure or even data corruption.

This PR fixes the few places where we were NOT already doing this (this part was actually extracted to #14826 for backports). The one exception is the Resharder/StreamMigrator, where we DO want to affect multiple workflows (as we move them from the active to inactive shards for a Reshard).

This PR also adds a guardrail for UPDATE and DELETE statements to help prevent Vitess users and developers from accidentally introducing related bugs in the future. For example:

❯ vtctlclient VReplicationExec zone1-101 "delete from _vt.vreplicati" 2>/dev/null

WARNING: VReplicationExec is deprecated and will be removed in a future release. Please use 'Workflow -- <keyspace.workflow> <action>' instead.

VReplicationExec Error: rpc error: code = Unknown desc = TabletManager.VReplicationExec on zone1-0000000101: invalid table name: vreplicati

❯ vtctlclient VReplicationExec zone1-101 "delete from _vt.vreplication where state != 'Running'" 2>/dev/null

WARNING: VReplicationExec is deprecated and will be removed in a future release. Please use 'Workflow -- <keyspace.workflow> <action>' instead.

VReplicationExec Error: rpc error: code = Unknown desc = TabletManager.VReplicationExec on zone1-0000000100: unsafe WHERE clause in delete without the /*vt+ ALLOW_UNSAFE_VREPLICATION_WRITE */ comment directive:  where state != 'Running'; should be using = or in with at least one of the following columns: id, workflow
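The rule stated in that error message can be sketched roughly as follows. This is a minimal illustration only, not the code in this PR: the `predicate` type and `isSelective` function are hypothetical, and the sketch assumes the WHERE clause has already been broken down into conditions that are all AND-ed together.

```go
package main

import (
	"fmt"
	"strings"
)

// predicate is a hypothetical, simplified stand-in for one condition in a
// WHERE clause. The real check operates on the parsed SQL statement.
type predicate struct {
	Column   string
	Operator string
}

// isSelective reports whether at least one predicate uses = or in on the
// id or workflow column. It assumes all predicates are AND-ed together.
func isSelective(preds []predicate) bool {
	for _, p := range preds {
		col := strings.ToLower(p.Column)
		op := strings.ToLower(p.Operator)
		if (col == "id" || col == "workflow") && (op == "=" || op == "in") {
			return true
		}
	}
	return false
}

func main() {
	// where state != 'Running'
	unsafeWhere := []predicate{{Column: "state", Operator: "!="}}
	// where workflow = 'wf1' and state != 'Running'
	safeWhere := []predicate{
		{Column: "workflow", Operator: "="},
		{Column: "state", Operator: "!="},
	}
	fmt.Println(isSelective(unsafeWhere), isSelective(safeWhere))
}
```

A clause that only filters on `state` is rejected under this rule because it can match streams belonging to any workflow on the tablet.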

Related Issue(s)

Checklist

"Backport to:" labels have been added if this change should be back-ported to release branches
If this change is to be back-ported to previous releases, a justification is included in the PR description
Tests were added or are not required
Did the new or modified tests pass consistently locally and on CI?
Documentation was added or is not required

Signed-off-by: Matt Lord <mattalord@gmail.com>

vitess-bot · 2023-12-17T17:07:11Z


rohit-nayak-ps

The assumptions related to stream migrations are incorrect here, as commented around those changes.

The fix for reverse streams in traffic_switcher is good.

The change for resharder makes the code clearer, but it is a no-op, since there are no other streams on the target at that time.

rohit-nayak-ps · 2023-12-19T19:28:08Z

go/test/endtoend/vreplication/cluster_test.go

+	// everything in the test is shutting down and cleaning up. So we retry a few
+	// times to deal with that non-problematic and ephemeral issue.
+	var err error
+	retries := 3


If this is an issue in CI, is just a 3 second retry enough?

It's only been seen in the CI AFAIK, and so far 3 retries with a 1 second delay seem to have been enough (we've done essentially the same thing in several places now).

go/vt/vtctl/workflow/stream_migrator.go


rohit-nayak-ps

A couple of comments, otherwise looks great.

rohit-nayak-ps · 2023-12-21T10:52:41Z

go/vt/vttablet/tabletmanager/vreplication/controller_plan.go

+// statements if you want to bypass the safety checks that ensure you are
+// being selective. The full comment directive looks like this:
+// delete /*vt+ ALLOW_UNSAFE_VREPLICATION_WRITE */ from _vt.vreplication
+const AllowUnsafeWriteCommentDirective = "ALLOW_UNSAFE_VREPLICATION_WRITE"
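As a rough illustration of how a statement carrying this directive differs from one without it, the sketch below does a plain string scan for a `/*vt+ ... */` comment containing the directive name. The `hasUnsafeWriteDirective` function is hypothetical and is only an approximation for illustration; it is an assumption, not how the PR detects the directive, and a real check would work on the parsed statement.

```go
package main

import (
	"fmt"
	"strings"
)

// Directive name as shown in the diff above.
const AllowUnsafeWriteCommentDirective = "ALLOW_UNSAFE_VREPLICATION_WRITE"

// hasUnsafeWriteDirective is a hypothetical, string-based approximation: it
// finds the first /*vt+ ... */ comment in the query and reports whether the
// directive name appears in it as a whole word.
func hasUnsafeWriteDirective(query string) bool {
	const prefix = "/*vt+"
	start := strings.Index(query, prefix)
	if start < 0 {
		return false
	}
	end := strings.Index(query[start:], "*/")
	if end < 0 {
		return false
	}
	comment := query[start+len(prefix) : start+end]
	for _, field := range strings.Fields(comment) {
		if field == AllowUnsafeWriteCommentDirective {
			return true
		}
	}
	return false
}

func main() {
	withDirective := "delete /*vt+ ALLOW_UNSAFE_VREPLICATION_WRITE */ from _vt.vreplication"
	withoutDirective := "delete from _vt.vreplication where state != 'Running'"
	fmt.Println(hasUnsafeWriteDirective(withDirective), hasUnsafeWriteDirective(withoutDirective))
}
```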


go/vt/wrangler/vexec.go

rohit-nayak-ps · 2023-12-21T11:40:55Z

go/vt/vtctl/workflow/switcher_interface.go

@@ -25,7 +25,7 @@ import (

 type iswitcher interface {
 	lockKeyspace(ctx context.Context, keyspace, action string) (context.Context, func(*error), error)
-	cancelMigration(ctx context.Context, sm *StreamMigrator)
+	cancelStreamMigrations(ctx context.Context, sm *StreamMigrator)


cancelMigration is a better name: this is not just cancelling the stream migrations, but also reverting writes, cleaning up reverse replication etc.

Good point. Initially I was just going to rename the one that has a StreamMigrator receiver and got carried away... 😄 I'll change the ones using the switcher interface back as those are more generic.


…on writes (#4038) * cherry pick of 14797 * Address merge conflict Signed-off-by: Matt Lord <mattalord@gmail.com> --------- Signed-off-by: Matt Lord <mattalord@gmail.com> Co-authored-by: Matt Lord <mattalord@gmail.com>

Remove _vt.vreplication updates w/o id or workflow name

c1c2b15

Signed-off-by: Matt Lord <mattalord@gmail.com>

mattlord changed the title ~~VReplication: Remove _vt.vreplication updates w/o id or workflow name~~ VReplication: Remove vreplication updates w/o id or workflow name Dec 17, 2023

mattlord changed the title ~~VReplication: Remove vreplication updates w/o id or workflow name~~ VReplication: Remove _vt.vreplication updates w/o id or workflow name Dec 17, 2023

github-actions bot added this to the v19.0.0 milestone Dec 17, 2023

mattlord added Type: Bug, Component: VReplication, Backport to: release-16.0 and removed NeedsWebsiteDocsUpdate, NeedsIssue labels Dec 17, 2023

Fix unit tests

0507f64

Signed-off-by: Matt Lord <mattalord@gmail.com>

mattlord force-pushed the vrepl_uniq_update branch from a683e94 to 0507f64 Compare December 17, 2023 17:38

This was referenced Dec 18, 2023

Release of v18.0.2 #14754

Open

Release of v17.0.5 #14774

Open

Release of v16.0.7 #14775

Open

Replace one more unsafe predicate usage and fix unit tests

385e9fe

Signed-off-by: Matt Lord <mattalord@gmail.com>

mattlord force-pushed the vrepl_uniq_update branch from 803e571 to 385e9fe Compare December 18, 2023 16:28

mattlord removed NeedsBackportReason, NeedsDescriptionUpdate labels Dec 18, 2023

mattlord force-pushed the vrepl_uniq_update branch from 021b9ab to 88c6d8c Compare December 18, 2023 17:17

Correct remaining spots

eefef60

Signed-off-by: Matt Lord <mattalord@gmail.com>

mattlord force-pushed the vrepl_uniq_update branch 2 times, most recently from 7b4d520 to 914f35b Compare December 18, 2023 21:24

Prevent non-selective INSERT/UPDATE on vreplication table

2296b43

Signed-off-by: Matt Lord <mattalord@gmail.com>

rohit-nayak-ps requested changes Dec 19, 2023


mattlord mentioned this pull request Dec 19, 2023

VReplication: Update singular workflow in traffic switcher #14826

Merged

5 tasks

mattlord removed Backport to: release-16.0 labels Dec 19, 2023

mattlord force-pushed the vrepl_uniq_update branch 3 times, most recently from 76a6272 to af1bdc8 Compare December 20, 2023 01:27

Correct / revert changes specific to Reshard

b67b75f

Signed-off-by: Matt Lord <mattalord@gmail.com>

mattlord force-pushed the vrepl_uniq_update branch from af1bdc8 to b67b75f Compare December 20, 2023 01:50

mattlord changed the title ~~VReplication: Remove _vt.vreplication writes w/o id or workflow name~~ VReplication: Guard against _vt.vreplication writes w/o id or workflow name Dec 20, 2023

Move to a comment directive safety override

2f01db1

Signed-off-by: Matt Lord <mattalord@gmail.com>

mattlord changed the title ~~VReplication: Guard against _vt.vreplication writes w/o id or workflow name~~ VReplication: Guard against unsafe _vt.vreplication writes Dec 20, 2023

Minor changes after self review

3c5e4d0

Signed-off-by: Matt Lord <mattalord@gmail.com>

mattlord force-pushed the vrepl_uniq_update branch from 5f8b3e6 to 3c5e4d0 Compare December 20, 2023 06:51

mattlord requested a review from harshit-gangal December 20, 2023 06:54

mattlord added 2 commits December 20, 2023 02:12

Merge remote-tracking branch 'origin/main' into vrepl_uniq_update

e3a4fcf

Signed-off-by: Matt Lord <mattalord@gmail.com>

Comment nit

808308a

Signed-off-by: Matt Lord <mattalord@gmail.com>

rohit-nayak-ps reviewed Dec 21, 2023


mattlord added 4 commits December 21, 2023 20:34

Undo switcher interface func name changes

352d6be

Signed-off-by: Matt Lord <mattalord@gmail.com>

Address review comment

d8e7dbb

Signed-off-by: Matt Lord <mattalord@gmail.com>

Add comment directive to error message

0cf48fc

This shows the user how to get around the guardrail if they really know what they're doing. Signed-off-by: Matt Lord <mattalord@gmail.com>

Update err msg in unit tests

06bf409

Signed-off-by: Matt Lord <mattalord@gmail.com>

mattlord force-pushed the vrepl_uniq_update branch from 04cde19 to 06bf409 Compare December 22, 2023 03:14

rohit-nayak-ps approved these changes Dec 26, 2023


mattlord merged commit ab37170 into vitessio:main Dec 27, 2023
104 checks passed

mattlord deleted the vrepl_uniq_update branch December 27, 2023 01:11
