Resuming ghostferry-copydb after interrupt fails due to missed TableMapEvent #156
Comments
A MySQL replication event for changing data is always started by sending a TableMapEvent (describing the table to be changed), followed by one or more RowsEvents (containing the data to be changed). If multiple consecutive RowsEvents are sent for the same table, the TableMapEvent is typically skipped (after sending it once). Resuming at a binlog position between the TableMapEvent and a RowsEvent is therefore not possible, as the binlog streamer would miss the table definition for the following DML statements. This commit tracks the position of the most recently processed TableMapEvent and uses it for resuming (skipping any events between the last write position and the resume position that were already processed). Change-Id: I2bef401ba4f1c0d5f50f177f48c2e866bb411f5d
I've cleaned up the PR a lot to make it more easily digestible. I'm still working on an integration test that reproduces the issue reliably in master (and shows it fixed in the branch, obviously).
OK, I've managed to add an integration test that reliably triggers the problem in master and passes with the fix. I'm not particularly proud of this test code, to be quite frank, as it relies on winning a race condition, but it seems stable enough with the chosen input values (I've extracted the important parameters into dedicated variables to make them more obvious and added a fair amount of documentation). Hopefully this demonstrates the problem a bit better. The latest PR is good from my side; looking forward to hearing feedback.
This is fixed by #168? |
The interrupt-resume feature as described in
https://shopify.github.io/ghostferry/master/copydbinterruptresume.html
works well, provided the interrupt does not happen while a batch of RowsEvent events is being processed.
A MySQL replication event for changing data is always started by sending a TableMapEvent (describing the table to be changed), followed by one or more RowsEvents (containing the data to be changed). If multiple consecutive RowsEvents are sent for the same table, the TableMapEvent is typically skipped (after sending it once).
Thus, if the interrupt happens after receiving the TableMapEvent but before receiving the last RowsEvent, ghostferry will try to resume from the last processed RowsEvent, causing the replication BinlogSyncer to crash with an invalid table id <table-ID>, no corresponding table map event error.
This is due to the following code:
Note that this is not a bug in the replication module: ghostferry points the syncer at a resume position after the TableMapEvent, and the replication code cannot satisfy this (short of ignoring the fact that it doesn't know the table ID).
Changing the replication library to ignore unknown table IDs is tempting but very tricky: we could cache the table IDs previously seen in ghostferry and "fill the gap" this way. Unfortunately, parsing the RowsEvent data relies on the table schema. And, conceptually, it also seems quite unclean.
Thus, IMO the correct approach is to make sure the syncer is not pointed at a location from which it cannot resume.
Ghostferry today only stores the resume position via its lastStreamedBinlogPosition property. In my variant of ghostferry (see #26), I have changed the BinlogStreamer class to keep track of an additional position: the lastResumeBinlogPosition. The idea is to analyze each event received from the syncer and check whether it is a RowsEvent. If not, the event position is considered safe to resume from and the property is updated; otherwise it is kept at its current value.
Thus, the resume position is a tuple of "last received/processed event" and "last position we can resume from".
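The bookkeeping for that tuple could look roughly like this. A minimal sketch with illustrative names (streamerState, observe), not ghostferry's actual API: every event advances the streamed position, but only non-RowsEvents advance the resume position.

```go
package main

import "fmt"

// streamerState tracks the two positions described above.
type streamerState struct {
	lastStreamedPos uint32 // position of the most recent event of any kind
	lastResumePos   uint32 // position of the most recent non-RowsEvent
}

// observe records an event's position; only non-RowsEvents are safe
// resume points, since a RowsEvent may depend on an earlier TableMapEvent.
func (s *streamerState) observe(isRowsEvent bool, pos uint32) {
	s.lastStreamedPos = pos
	if !isRowsEvent {
		s.lastResumePos = pos
	}
}

func main() {
	var s streamerState
	s.observe(false, 100) // TableMapEvent
	s.observe(true, 120)  // RowsEvent
	s.observe(true, 160)  // RowsEvent
	// Streamed up to 160, but a restart must go back to 100.
	fmt.Println(s.lastStreamedPos, s.lastResumePos) // 160 100
}
```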
Then, when resuming, we do not resume from lastStreamedBinlogPosition but from lastResumeBinlogPosition. Any event received in BinlogStreamer.Run is first checked: if its position is before lastStreamedBinlogPosition, it is skipped; otherwise it is queued to the eventListeners.
Happy to share my changes as a pull request. Unfortunately, they are currently somewhat entangled with my changes for ghostferry-replicatedb (see the other ticket), so before I untangle these two unrelated changes, I wanted to hear your opinions on the above, and whether I'm overlooking a simpler way of doing this.