Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry should not happen when in dedicated connection #6523

Merged
merged 1 commit into from
Sep 1, 2020

Conversation

harshit-gangal
Copy link
Member

We have checked for transaction ID and blocked any retry in wrapped queryservice. We should also do the same for reserved Connection.

Signed-off-by: Harshit Gangal harshit@planetscale.com

Signed-off-by: Harshit Gangal <harshit@planetscale.com>
Copy link
Contributor

@sougou sougou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you see a use case where this can be a problem? In the case of a transaction, the reason we don't retry is because DMLs are not idempotent. But it may be safe to retry a select in a reserved connection.

I'm thinking the more correct behavior is that we should disallow retries on anything that's non-idempotent, essentially DMLs, etc.

PS: unit_race has found a race.

@harshit-gangal
Copy link
Member Author

Do you see a use case where this can be a problem? In the case of a transaction, the reason we don't retry is because DMLs are not idempotent. But it may be safe to retry a select in a reserved connection.

I'm thinking the more correct behavior is that we should disallow retries on anything that's non-idempotent, essentially DMLs, etc.

PS: unit_race has found a race.

Race is unrelated.

2020-07-31T22:46:14.9756806Z ==================
2020-07-31T22:46:14.9757199Z WARNING: DATA RACE
2020-07-31T22:46:14.9757473Z Write at 0x00c000481580 by goroutine 84:
2020-07-31T22:46:14.9758493Z   vitess.io/vitess/go/vt/vttablet/tabletmanager.(*TabletManager).Start()
2020-07-31T22:46:14.9803089Z       /home/runner/work/vitess/vitess/go/vt/vttablet/tabletmanager/tm_init.go:240 +0x377
2020-07-31T22:46:14.9803634Z   vitess.io/vitess/go/vt/vttablet/tabletmanager.TestStartDoesNotUpdateReplicationDataForTabletInWrongShard()
2020-07-31T22:46:14.9834613Z       /home/runner/work/vitess/vitess/go/vt/vttablet/tabletmanager/tm_init_test.go:404 +0x384
2020-07-31T22:46:14.9834991Z   testing.tRunner()
2020-07-31T22:46:14.9835313Z       /opt/hostedtoolcache/go/1.13.14/x64/src/testing/testing.go:909 +0x199
2020-07-31T22:46:14.9835490Z 
2020-07-31T22:46:14.9835782Z Previous read at 0x00c000481580 by goroutine 89:
2020-07-31T22:46:14.9836138Z   vitess.io/vitess/go/vt/vttablet/tabletmanager.(*TabletManager).rebuildKeyspace.func1()
2020-07-31T22:46:14.9836502Z       /home/runner/work/vitess/vitess/go/vt/vttablet/tabletmanager/tm_init.go:422 +0xf8
2020-07-31T22:46:14.9836869Z   vitess.io/vitess/go/vt/vttablet/tabletmanager.(*TabletManager).rebuildKeyspace()
2020-07-31T22:46:14.9837229Z       /home/runner/work/vitess/vitess/go/vt/vttablet/tabletmanager/tm_init.go:440 +0x48a
2020-07-31T22:46:14.9837437Z 
2020-07-31T22:46:14.9837701Z Goroutine 84 (running) created at:
2020-07-31T22:46:14.9837939Z   testing.(*T).Run()
2020-07-31T22:46:14.9838264Z       /opt/hostedtoolcache/go/1.13.14/x64/src/testing/testing.go:960 +0x651
2020-07-31T22:46:14.9838534Z   testing.runTests.func1()
2020-07-31T22:46:14.9838860Z       /opt/hostedtoolcache/go/1.13.14/x64/src/testing/testing.go:1202 +0xa6
2020-07-31T22:46:14.9839117Z   testing.tRunner()
2020-07-31T22:46:14.9839440Z       /opt/hostedtoolcache/go/1.13.14/x64/src/testing/testing.go:909 +0x199
2020-07-31T22:46:14.9839696Z   testing.runTests()
2020-07-31T22:46:14.9840015Z       /opt/hostedtoolcache/go/1.13.14/x64/src/testing/testing.go:1200 +0x521
2020-07-31T22:46:14.9840277Z   testing.(*M).Run()
2020-07-31T22:46:14.9840602Z       /opt/hostedtoolcache/go/1.13.14/x64/src/testing/testing.go:1117 +0x2ff
2020-07-31T22:46:14.9840858Z   main.main()
2020-07-31T22:46:14.9841111Z       _testmain.go:76 +0x223
2020-07-31T22:46:14.9841276Z 
2020-07-31T22:46:14.9841545Z Goroutine 89 (finished) created at:
2020-07-31T22:46:14.9841890Z   vitess.io/vitess/go/vt/vttablet/tabletmanager.(*TabletManager).createKeyspaceShard()
2020-07-31T22:46:14.9842253Z       /home/runner/work/vitess/vitess/go/vt/vttablet/tabletmanager/tm_init.go:392 +0x82d
2020-07-31T22:46:14.9842585Z   vitess.io/vitess/go/vt/vttablet/tabletmanager.(*TabletManager).Start()
2020-07-31T22:46:14.9842935Z       /home/runner/work/vitess/vitess/go/vt/vttablet/tabletmanager/tm_init.go:253 +0x572
2020-07-31T22:46:14.9843251Z   vitess.io/vitess/go/vt/vttablet/tabletmanager.newTestTM()
2020-07-31T22:46:14.9843605Z       /home/runner/work/vitess/vitess/go/vt/vttablet/tabletmanager/tm_init_test.go:423 +0x335
2020-07-31T22:46:14.9844188Z   vitess.io/vitess/go/vt/vttablet/tabletmanager.TestStartDoesNotUpdateReplicationDataForTabletInWrongShard()
2020-07-31T22:46:14.9844558Z       /home/runner/work/vitess/vitess/go/vt/vttablet/tabletmanager/tm_init_test.go:395 +0x12f
2020-07-31T22:46:14.9844891Z   testing.tRunner()
2020-07-31T22:46:14.9845189Z       /opt/hostedtoolcache/go/1.13.14/x64/src/testing/testing.go:909 +0x199
2020-07-31T22:46:14.9845437Z ==================
2020-07-31T22:46:14.9846548Z --- FAIL: TestStartDoesNotUpdateReplicationDataForTabletInWrongShard (0.00s)
2020-07-31T22:46:14.9848608Z ##[error]    testing.go:853: race detected during execution of test
2020-07-31T22:46:14.9874626Z E0731 22:46:14.963842   23359 tm_state.go:300] tablet record was taken over by another process: my address is localhost:1234, but record is owned by tab3:1234
2020-07-31T22:46:14.9875047Z E0731 22:46:14.964315   23359 tm_state.go:326] tablet record was taken over by another process: my address is localhost:1234, but record is owned by tab3:1234
2020-07-31T22:46:14.9875319Z FAIL
2020-07-31T22:46:14.9875604Z FAIL	vitess.io/vitess/go/vt/vttablet/tabletmanager	0.320s

@harshit-gangal
Copy link
Member Author

switch vterrors.Code(err) {
case vtrpcpb.Code_UNAVAILABLE, vtrpcpb.Code_FAILED_PRECONDITION: 
     return true
}
return false

Currently the retry happens for these two error codes.
tabletserver returns Code_UNAVAILABLE for
https://github.com/vitessio/vitess/blob/master/go/vt/vttablet/tabletserver/tabletserver.go#L1288,L1289

case mysql.CRServerGone, mysql.ERServerShutdown:
	errCode = vtrpcpb.Code_UNAVAILABLE

I do not see these as transient errors for dedicated connections like transactional connection and reserved connection, because they will hit the same tablet again.

@systay
Copy link
Collaborator

systay commented Sep 1, 2020

Could we add a test or two around this logic?

Copy link
Collaborator

@systay systay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except maybe missing tests

@harshit-gangal
Copy link
Member Author

there are discovery gateway tests

@harshit-gangal harshit-gangal merged commit 7f5dbdd into vitessio:master Sep 1, 2020
@askdba askdba added this to the v8.0 milestone Oct 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants