kv/client: Fix incorrect behaviors when reconnecting the stream #499
What problem does this PR solve?
This PR fixes several bugs where the kv client did not handle reconnecting to TiKV correctly.
Bug 1:
1. In `dispatchRequests`, a request is sent to a region, and the region info is added to the `pendingRegions` map.
2. The region info is put into the `pendingRegions` map with a new requestID, and `dispatchRequests` tries to re-establish the stream.
3. The `receiveFromStream` goroutine notices that the stream is broken and closes all pending regions on this stream. The new pending region info is therefore treated as a dead request and closed.
4. However, the `dispatchRequests` goroutine is at that moment trying to reconnect and send a new request to that region.
4.1 If the reconnect succeeds and an event arrives from that region, no pending region info is found and the error `"received an event but neither pending region nor running region was found"` is thrown.
4.2 If the reconnect does not finish and no event arrives that quickly, `receiveFromStream` sends an error to `errCh` while closing the pending regions. The region's range is then unlocked, locked again, and another request is sent. The region may therefore be requested twice, and a `DuplicatedRequest` error occurs.
Bug 2:
When a region hits an error while trying to establish the stream, its region info may be left in the `pendingRegions` map when it retries. As a result, the region may be cancelled twice, which may cause an `unlocking an not locked range` panic.
What is changed and how it works?
Bug 1:
When reconnecting, we delete the stream from the stream map and also delete its `pendingRegions` from the `storePendingRegions` map (indexed by store address). After the deletion, the next time a request is sent to this store, a new map is created for the newly reconnected stream.
Bug 2:
When establishing the stream fails, delete the region info from `pendingRegions`.
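The two fixes can be illustrated with a minimal sketch. This is not the code from this PR: `streamRegistry`, `regionInfo`, `addPending`, `resetStore`, `removePending`, and `pendingCount` are hypothetical names chosen for illustration, and the real stream handle is replaced by an empty placeholder. It only shows the bookkeeping idea: dropping a store's stream together with its pending map so that a reconnect starts from a fresh map, and removing a single entry when establishing the stream fails.

```go
package main

import "sync"

// regionInfo is a hypothetical stand-in for the per-region request state.
type regionInfo struct {
	regionID uint64
}

// pendingRegions maps requestID to region info for a single stream.
type pendingRegions map[uint64]*regionInfo

// streamRegistry is a hypothetical holder of per-store state, keyed by store address.
type streamRegistry struct {
	mu                  sync.Mutex
	streams             map[string]struct{} // placeholder for real stream handles
	storePendingRegions map[string]pendingRegions
}

func newStreamRegistry() *streamRegistry {
	return &streamRegistry{
		streams:             make(map[string]struct{}),
		storePendingRegions: make(map[string]pendingRegions),
	}
}

// addPending records a request for the store at addr. If the store has no
// pending map (first use, or after resetStore), a fresh one is created.
func (r *streamRegistry) addPending(addr string, requestID uint64, info *regionInfo) {
	r.mu.Lock()
	defer r.mu.Unlock()
	p, ok := r.storePendingRegions[addr]
	if !ok {
		p = make(pendingRegions)
		r.storePendingRegions[addr] = p
		r.streams[addr] = struct{}{} // stands in for opening a new stream
	}
	p[requestID] = info
}

// resetStore (Bug 1) forgets both the stream and its pending map, and returns
// the detached old map. Requests added later land in a new map, so cleaning up
// the old stream cannot close them.
func (r *streamRegistry) resetStore(addr string) pendingRegions {
	r.mu.Lock()
	defer r.mu.Unlock()
	old := r.storePendingRegions[addr]
	delete(r.streams, addr)
	delete(r.storePendingRegions, addr)
	return old
}

// removePending (Bug 2) drops one request, for example after establishing the
// stream failed, so a retry does not leave a stale entry to be cancelled twice.
func (r *streamRegistry) removePending(addr string, requestID uint64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.storePendingRegions[addr], requestID) // no-op if the map is nil
}

// pendingCount reports how many requests are pending for the store at addr.
func (r *streamRegistry) pendingCount(addr string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.storePendingRegions[addr])
}
```

In this sketch, the goroutine that cleans up a broken stream would only iterate over the map returned by `resetStore`, while a request sent during the reconnect goes through `addPending` and ends up in a separate map.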
Check List
Tests
Related changes