Retry should not happen when in dedicated connection #6523

harshit-gangal · 2020-07-31T22:37:47Z

We have checked for transaction ID and blocked any retry in wrapped queryservice. We should also do the same for reserved Connection.

Signed-off-by: Harshit Gangal harshit@planetscale.com

Signed-off-by: Harshit Gangal <harshit@planetscale.com>

sougou

Do you see a use case where this can be a problem? In the case of a transaction, the reason we don't retry is because DMLs are not idempotent. But it may be safe to retry a select in a reserved connection.

I'm thinking the more correct behavior is that we should disallow retries on anything that's non-idempotent, essentially DMLs, etc.

PS: unit_race has found a race.

harshit-gangal · 2020-08-10T10:19:33Z

Do you see a use case where this can be a problem? In the case of a transaction, the reason we don't retry is because DMLs are not idempotent. But it may be safe to retry a select in a reserved connection.

I'm thinking the more correct behavior is that we should disallow retries on anything that's non-idempotent, essentially DMLs, etc.

PS: unit_race has found a race.

Race is unrelated.

2020-07-31T22:46:14.9756806Z ==================
2020-07-31T22:46:14.9757199Z WARNING: DATA RACE
2020-07-31T22:46:14.9757473Z Write at 0x00c000481580 by goroutine 84:
2020-07-31T22:46:14.9758493Z   vitess.io/vitess/go/vt/vttablet/tabletmanager.(*TabletManager).Start()
2020-07-31T22:46:14.9803089Z       /home/runner/work/vitess/vitess/go/vt/vttablet/tabletmanager/tm_init.go:240 +0x377
2020-07-31T22:46:14.9803634Z   vitess.io/vitess/go/vt/vttablet/tabletmanager.TestStartDoesNotUpdateReplicationDataForTabletInWrongShard()
2020-07-31T22:46:14.9834613Z       /home/runner/work/vitess/vitess/go/vt/vttablet/tabletmanager/tm_init_test.go:404 +0x384
2020-07-31T22:46:14.9834991Z   testing.tRunner()
2020-07-31T22:46:14.9835313Z       /opt/hostedtoolcache/go/1.13.14/x64/src/testing/testing.go:909 +0x199
2020-07-31T22:46:14.9835490Z 
2020-07-31T22:46:14.9835782Z Previous read at 0x00c000481580 by goroutine 89:
2020-07-31T22:46:14.9836138Z   vitess.io/vitess/go/vt/vttablet/tabletmanager.(*TabletManager).rebuildKeyspace.func1()
2020-07-31T22:46:14.9836502Z       /home/runner/work/vitess/vitess/go/vt/vttablet/tabletmanager/tm_init.go:422 +0xf8
2020-07-31T22:46:14.9836869Z   vitess.io/vitess/go/vt/vttablet/tabletmanager.(*TabletManager).rebuildKeyspace()
2020-07-31T22:46:14.9837229Z       /home/runner/work/vitess/vitess/go/vt/vttablet/tabletmanager/tm_init.go:440 +0x48a
2020-07-31T22:46:14.9837437Z 
2020-07-31T22:46:14.9837701Z Goroutine 84 (running) created at:
2020-07-31T22:46:14.9837939Z   testing.(*T).Run()
2020-07-31T22:46:14.9838264Z       /opt/hostedtoolcache/go/1.13.14/x64/src/testing/testing.go:960 +0x651
2020-07-31T22:46:14.9838534Z   testing.runTests.func1()
2020-07-31T22:46:14.9838860Z       /opt/hostedtoolcache/go/1.13.14/x64/src/testing/testing.go:1202 +0xa6
2020-07-31T22:46:14.9839117Z   testing.tRunner()
2020-07-31T22:46:14.9839440Z       /opt/hostedtoolcache/go/1.13.14/x64/src/testing/testing.go:909 +0x199
2020-07-31T22:46:14.9839696Z   testing.runTests()
2020-07-31T22:46:14.9840015Z       /opt/hostedtoolcache/go/1.13.14/x64/src/testing/testing.go:1200 +0x521
2020-07-31T22:46:14.9840277Z   testing.(*M).Run()
2020-07-31T22:46:14.9840602Z       /opt/hostedtoolcache/go/1.13.14/x64/src/testing/testing.go:1117 +0x2ff
2020-07-31T22:46:14.9840858Z   main.main()
2020-07-31T22:46:14.9841111Z       _testmain.go:76 +0x223
2020-07-31T22:46:14.9841276Z 
2020-07-31T22:46:14.9841545Z Goroutine 89 (finished) created at:
2020-07-31T22:46:14.9841890Z   vitess.io/vitess/go/vt/vttablet/tabletmanager.(*TabletManager).createKeyspaceShard()
2020-07-31T22:46:14.9842253Z       /home/runner/work/vitess/vitess/go/vt/vttablet/tabletmanager/tm_init.go:392 +0x82d
2020-07-31T22:46:14.9842585Z   vitess.io/vitess/go/vt/vttablet/tabletmanager.(*TabletManager).Start()
2020-07-31T22:46:14.9842935Z       /home/runner/work/vitess/vitess/go/vt/vttablet/tabletmanager/tm_init.go:253 +0x572
2020-07-31T22:46:14.9843251Z   vitess.io/vitess/go/vt/vttablet/tabletmanager.newTestTM()
2020-07-31T22:46:14.9843605Z       /home/runner/work/vitess/vitess/go/vt/vttablet/tabletmanager/tm_init_test.go:423 +0x335
2020-07-31T22:46:14.9844188Z   vitess.io/vitess/go/vt/vttablet/tabletmanager.TestStartDoesNotUpdateReplicationDataForTabletInWrongShard()
2020-07-31T22:46:14.9844558Z       /home/runner/work/vitess/vitess/go/vt/vttablet/tabletmanager/tm_init_test.go:395 +0x12f
2020-07-31T22:46:14.9844891Z   testing.tRunner()
2020-07-31T22:46:14.9845189Z       /opt/hostedtoolcache/go/1.13.14/x64/src/testing/testing.go:909 +0x199
2020-07-31T22:46:14.9845437Z ==================
2020-07-31T22:46:14.9846548Z --- FAIL: TestStartDoesNotUpdateReplicationDataForTabletInWrongShard (0.00s)
2020-07-31T22:46:14.9848608Z ##[error]    testing.go:853: race detected during execution of test
2020-07-31T22:46:14.9874626Z E0731 22:46:14.963842   23359 tm_state.go:300] tablet record was taken over by another process: my address is localhost:1234, but record is owned by tab3:1234
2020-07-31T22:46:14.9875047Z E0731 22:46:14.964315   23359 tm_state.go:326] tablet record was taken over by another process: my address is localhost:1234, but record is owned by tab3:1234
2020-07-31T22:46:14.9875319Z FAIL
2020-07-31T22:46:14.9875604Z FAIL	vitess.io/vitess/go/vt/vttablet/tabletmanager	0.320s

harshit-gangal · 2020-08-12T07:17:10Z

switch vterrors.Code(err) {
case vtrpcpb.Code_UNAVAILABLE, vtrpcpb.Code_FAILED_PRECONDITION: 
     return true
}
return false

Currently the retry happens for these two error codes.
tabletserver returns Code_UNAVAILABLE for
https://github.com/vitessio/vitess/blob/master/go/vt/vttablet/tabletserver/tabletserver.go#L1288,L1289

case mysql.CRServerGone, mysql.ERServerShutdown:
	errCode = vtrpcpb.Code_UNAVAILABLE

I do not see these as transient errors for dedicated connections like transactional connection and reserved connection, because they will hit the same tablet again.

systay · 2020-09-01T06:05:36Z

Could we add a test or two around this logic?

systay

LGTM, except maybe missing tests

harshit-gangal · 2020-09-01T11:16:04Z

there are discovery gateway tests

Retry should not happen when in dedicated connection

92c27e2

Signed-off-by: Harshit Gangal <harshit@planetscale.com>

harshit-gangal added the Backport me! label Jul 31, 2020

harshit-gangal requested review from deepthi and sougou July 31, 2020 22:37

sougou reviewed Aug 2, 2020

View reviewed changes

harshit-gangal removed the Backport me! label Aug 10, 2020

systay approved these changes Sep 1, 2020

View reviewed changes

harshit-gangal merged commit 7f5dbdd into vitessio:master Sep 1, 2020

askdba added this to the v8.0 milestone Oct 5, 2020

setassociative mentioned this pull request Mar 5, 2021

Vitess v8.0 Release branch tinyspeck/vitess#194

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Retry should not happen when in dedicated connection #6523

Retry should not happen when in dedicated connection #6523

harshit-gangal commented Jul 31, 2020

sougou left a comment

harshit-gangal commented Aug 10, 2020

harshit-gangal commented Aug 12, 2020

systay commented Sep 1, 2020

systay left a comment

harshit-gangal commented Sep 1, 2020

Retry should not happen when in dedicated connection #6523

Retry should not happen when in dedicated connection #6523

Conversation

harshit-gangal commented Jul 31, 2020

sougou left a comment

Choose a reason for hiding this comment

harshit-gangal commented Aug 10, 2020

harshit-gangal commented Aug 12, 2020

systay commented Sep 1, 2020

systay left a comment

Choose a reason for hiding this comment

harshit-gangal commented Sep 1, 2020