Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv/client: Fix incorrect behaviors when reconnecting the stream #499

Merged
merged 3 commits into from
Apr 27, 2020

Conversation

MyonKeminta
Copy link
Contributor

@MyonKeminta MyonKeminta commented Apr 26, 2020

What problem does this PR solve?

This PR fixes some bugs that the kv client didn't handle TiKV reconnecting correctly.

Bug 1:

  1. In dispatchRequests, it sends a request to a region, and the region info is added to the pendingRegions map.
  2. Due to some reason like unstable network or TiKV crash, it failed to send the request, and tries to send it again.
  3. It adds the new pending region info to the pendingRegions map with a new requestID, and tries to re-establish the stream.
  4. The receiveFromStream goroutine noticed the stream is broken, and closes all pending regions on this stream. In this case, the new pending region info will be regarded as a dead request and will be closed. However, the dispatchRequests goroutine is just trying to reconnect and send a new request to that region.
    5.1 If the reconnecting succeeded and received an event from that region, no pending region info will be found and the error "received an event but neither pending region nor running region was found" will be thrown.
    5.2 If it didn't finish reconnecting and receive an Event so quickly, receiveFromStream will send an error to errCh when closing the pending regions, and then the region's range will be unlocked, and then locked again and send another request. So the region may be requested twice, and an DuplicatedRequest error will occur.

Bug 2:
When a region meets error when trying to establish the stream, it's region info may be left in the pendingRegions map when it retries. and as a result the region may be double-cancelled, which may cause unlocking an not locked range panic.

What is changed and how it works?

Bug 1:
When reconnecting, we deleting the stream from the stream map as well as deleting the pendingRegions from the storePendingRegions map (indexed by store addr). After deleting it, the next time an request is sent to this store, a new map will be created for the new reconnected stream.

Bug 2:
On establishing stream failed, delete the region info from pendingRegions

Check List

Tests

  • Manual test (add detailed scripts or steps below)

Related changes

@MyonKeminta MyonKeminta added type/bugfix This PR fixes a bug. component/kv-client TiKV kv log client component. labels Apr 26, 2020
@MyonKeminta MyonKeminta changed the title kv/client: Fix pending region incorrectly stopped when reconnecting the stream kv/client: Fix incorrect behavior when reconnecting the stream Apr 26, 2020
@MyonKeminta MyonKeminta changed the title kv/client: Fix incorrect behavior when reconnecting the stream kv/client: Fix incorrect behaviors when reconnecting the stream Apr 26, 2020
Copy link
Contributor

@amyangfei amyangfei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@amyangfei
Copy link
Contributor

/run-integration-tests

@amyangfei amyangfei added the LGT1 label Apr 26, 2020
@MyonKeminta
Copy link
Contributor Author

/run-integration-tests

Copy link
Contributor

@zier-one zier-one left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zier-one zier-one added LGT2 and removed LGT1 labels Apr 27, 2020
@zier-one zier-one merged commit a6ddaa4 into pingcap:master Apr 27, 2020
5kbpers pushed a commit to 5kbpers/ticdc that referenced this pull request Aug 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/kv-client TiKV kv log client component. type/bugfix This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants