Resuming ghostferry-copydb after interrupt fails due to missed TableMapEvent #156
Comments
A MySQL replication event for changing data is always started by sending a TableMapEvent (describing the table to be changed), followed by one or more RowsEvents (containing the data to be changed). If multiple consecutive RowsEvents are sent for the same table, the TableMapEvent is typically skipped (after sending it once). Resuming at a binlog position between the TableMapEvent and a RowsEvent is therefore not possible, as the binlog streamer would miss the table definition for the following DML statements. This commit tracks the position of the most recently processed TableMapEvent and uses it for resuming (skipping any events between the last write position and the resume position that were already processed). Change-Id: I2bef401ba4f1c0d5f50f177f48c2e866bb411f5d
I've cleaned up the PR a lot to make it more easily digestible. I'm still working on an integration test that reproduces the issue reliably in master (and shows it fixed in the branch, obviously).
OK, I've managed to add an integration test that reliably triggers the problem in master and passes with the fix. I'm not particularly proud of this test code, to be quite frank, as it relies on winning a race condition, but it seems stable enough with the chosen input values (I've extracted the important parameters into dedicated variables to make them more obvious and added a fair amount of documentation). Hopefully this demonstrates the problem a bit better. The latest PR is good from my side; looking forward to hearing feedback.
This is fixed by #168? |
The interrupt-resume feature as described in
https://shopify.github.io/ghostferry/master/copydbinterruptresume.html
works well, provided the interrupt does not happen while a batch of RowsEvent events is being processed.
A MySQL replication event for changing data is always started by sending a TableMapEvent (describing the table to be changed), followed by one or more RowsEvents (containing the data to be changed). If multiple consecutive RowsEvents are sent for the same table, the TableMapEvent is typically skipped (after sending it once).
Thus, if the interrupt happens after receiving the TableMapEvent but before receiving the last RowsEvent, ghostferry will try to resume from the last processed RowsEvent, causing the replication BinlogSyncer to crash with an invalid table id <table-ID>, no corresponding table map event error.
This is due to the following code:
Note that this is not a bug in the replication module: ghostferry points the syncer at a resume position after the TableMapEvent, and the replication code cannot satisfy this (short of ignoring the fact that it doesn't know the table ID).
Changing the replication library to ignore unknown table IDs is tempting but very tricky: we could cache the table IDs previously seen in ghostferry and "fill the gap" this way. Unfortunately, parsing the RowsEvent data relies on the table schema. And, conceptually, it also seems quite unclean.
Thus, IMO the correct approach is to make sure the syncer is not pointed at a location from which it cannot resume.
Ghostferry today only stores the resume position via its lastStreamedBinlogPosition property. In my variant of ghostferry (see #26), I have changed the BinlogStreamer class to keep track of an additional position: the lastResumeBinlogPosition. The idea is to analyze each event received from the syncer and check whether it is a RowsEvent. If not, the event position is considered safe to resume from and the property is updated; otherwise it is kept at its current value.
Thus, the resume position is a tuple of "last received/processed event" and "last position we can resume from".
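The bookkeeping for that tuple could look roughly like this. A minimal sketch with illustrative names (streamerState, observe), not ghostferry's actual API: every event advances the streamed position, but only non-RowsEvents advance the resume position.

```go
package main

import "fmt"

// streamerState tracks the two positions described above.
type streamerState struct {
	lastStreamedPos uint32 // position of the most recent event of any kind
	lastResumePos   uint32 // position of the most recent non-RowsEvent
}

// observe records an event's position; only non-RowsEvents are safe
// resume points, since a RowsEvent may depend on an earlier TableMapEvent.
func (s *streamerState) observe(isRowsEvent bool, pos uint32) {
	s.lastStreamedPos = pos
	if !isRowsEvent {
		s.lastResumePos = pos
	}
}

func main() {
	var s streamerState
	s.observe(false, 100) // TableMapEvent
	s.observe(true, 120)  // RowsEvent
	s.observe(true, 160)  // RowsEvent
	// Streamed up to 160, but a restart must go back to 100.
	fmt.Println(s.lastStreamedPos, s.lastResumePos) // 160 100
}
```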
Then, when resuming, we do not resume from lastStreamedBinlogPosition but from lastResumeBinlogPosition. Any event received in BinlogStreamer.Run is first checked: if its position is before lastStreamedBinlogPosition, it is skipped; otherwise it is queued to the eventListeners.
Happy to share my changes as a pull request. Unfortunately, they are currently somewhat entangled with my changes for ghostferry-replicatedb (see the other ticket), so before I untangle these two unrelated changes, I wanted to hear your opinions on the above, and whether I'm overlooking a simpler way of doing this.