DBZ-7050 Add automatic retry for snapshots #163

twthorn · 2023-10-17T22:37:05Z

Add automatic retry for snapshots.

Automatic Retry

As detailed in the table copy RFC, the last PK value in the VGTID is used to resume partially completed snapshots. We use this for our automatic retries by parsing the additional information that Vitess now sends in its latest version (i.e., including fixes like this).
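
To make the resume idea concrete, here is a minimal conceptual sketch, not the connector's or Vitess's actual code. It assumes a table with a single numeric primary key that is copied in PK order; the class and method names are hypothetical. Given the last PK recorded in the VGTID, only rows with a greater PK still need to be copied.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of resuming a table copy from the last sent PK.
final class ResumeFromLastPkSketch {

    // Returns the primary keys that still have to be copied.
    // A null lastPk means no row was sent yet, so the copy starts from the beginning.
    static List<Long> remaining(List<Long> sortedPks, Long lastPk) {
        if (lastPk == null) {
            return sortedPks;
        }
        return sortedPks.stream()
                .filter(pk -> pk > lastPk) // strictly greater: lastPk itself was already sent
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The copy was interrupted after PK 2; PKs 3 and 4 remain.
        System.out.println(remaining(List.of(1L, 2L, 3L, 4L), 2L));
    }
}
```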

One implementation decision I debated was whether to switch all of our VGTID parsing logic to the protobuf-provided JSON converters. This would simplify our codebase, but it would also couple the Debezium logic more tightly to Vitess/protobuf, so I decided against it.

GTID -> VGTID config

Previous configs had the option vitess.gtid, which was originally added when vitess.shard only allowed a single shard. With the changes to support multiple shards in #135, we made GTID a CSV as well. To easily test the automatic snapshot retry behavior, it is useful to also specify the last PK values in the config. Additionally, the GTID CSV adds unnecessary complexity by supporting yet another format for specifying GTID(s). To resolve both issues, I replaced it with a vitess.vgtid config, a JSON-parseable string that we use to initialize the VGTID for a VStream. Since a VGTID is an array of keyspace/shard/gtid/lastpks entries, it lets the user specify GTID(s) for any subset of shards, and even the PKs from which to resume snapshots. This is what the integration tests use to test the snapshot resume behavior.
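
As a rough illustration of what such a value could look like, the sketch below assembles a vitess.vgtid string for two shards. The class and method names are hypothetical, the keyspace, shard and GTID values are made up, and the JSON field names (keyspace, shard, gtid, table_p_ks) are an assumption based on the offset format discussed in this PR, not a statement of the connector's documented schema.

```java
import java.util.Properties;

// Hypothetical helper that builds a vitess.vgtid value by hand.
final class VgtidConfigSketch {

    // Builds one shard entry of the VGTID JSON array with no last PKs.
    // Values are assumed not to need JSON escaping; field names are assumed.
    static String shardEntry(String keyspace, String shard, String gtid) {
        return String.format(
                "{\"keyspace\":\"%s\",\"shard\":\"%s\",\"gtid\":\"%s\",\"table_p_ks\":[]}",
                keyspace, shard, gtid);
    }

    // Joins shard entries into the JSON array string used as the config value.
    static String vgtidJson(String... entries) {
        return "[" + String.join(",", entries) + "]";
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Made-up keyspace, shards and GTIDs, for illustration only.
        props.setProperty("vitess.vgtid", vgtidJson(
                shardEntry("commerce", "-80", "MySQL56/hypothetical-uuid:1-100"),
                shardEntry("commerce", "80-", "MySQL56/hypothetical-uuid:1-200")));
        System.out.println(props.getProperty("vitess.vgtid"));
    }
}
```

A real setup would more likely serialize the value with a JSON library than with string formatting; the point here is only the shape of the array.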

twthorn · 2023-10-18T15:50:37Z

@jpechane Can you review this when you get the chance? Thank you!

HenryCaiHaiying · 2023-10-19T02:07:03Z

src/main/java/io/debezium/connector/vitess/VitessConnectorConfig.java

@@ -464,28 +467,34 @@ public List<String> getShard() {
        return getConfig().getStrings(SHARD, CSV_DELIMITER);
    }

-    public List<String> getGtid() {
+    public String getGtid() {


Should the method be named getVGtid?

HenryCaiHaiying · 2023-10-19T02:10:21Z

src/main/java/io/debezium/connector/vitess/VitessConnectorConfig.java


-    public static final Field GTID = Field.create(VITESS_CONFIG_GROUP_PREFIX + "gtid")
-            .withDisplayName("gtid")
+    public static final Field VGTID = Field.create(VITESS_CONFIG_GROUP_PREFIX + "vgtid")


Will the config name change from gtid to vgtid cause any regressions?


jpechane

@twthorn Thanks for the PR. I left a few minor comments. One question for you: I assumed that incomplete snapshots would be resumed automatically, so I'd expect additional data to be written to or read from offsets. Is this the case?
Also, for testing, there should be a test where the connector is started, stopped in the middle of a snapshot, and started again. There should be no lost records or duplicates.

jpechane · 2023-10-19T09:28:55Z

pom.xml

@@ -245,6 +245,11 @@
            <type>test-jar</type>
            <scope>test</scope>
        </dependency>
+        <dependency>


Could you please place this dependency version into dependencyManagement and configure it via a property?
Also, I'd recommend aligning the version with the one pulled in by the grpc client.

jpechane · 2023-10-19T10:13:02Z

src/main/java/io/debezium/connector/vitess/TablePrimaryKeys.java

+
+import binlogdata.Binlogdata;
+
+public class TablePrimaryKeys {


Could you please add JavaDoc to this class?

jpechane · 2023-10-19T10:16:30Z

src/main/java/io/debezium/connector/vitess/VitessConnectorConfig.java

@@ -37,8 +40,7 @@
 */
 public class VitessConnectorConfig extends RelationalDatabaseConnectorConfig {

-    public static final List<String> EMPTY_GTID_LIST = List.of(Vgtid.EMPTY_GTID);
-    public static final List<String> DEFAULT_GTID_LIST = List.of(Vgtid.CURRENT_GTID);
+    public static final String DEFAULT_GTID = Vgtid.CURRENT_GTID;


Should this be moved to the Vgtid class?

jpechane · 2023-10-19T10:17:29Z

src/main/java/io/debezium/connector/vitess/VitessConnectorConfig.java


-    public static final Field GTID = Field.create(VITESS_CONFIG_GROUP_PREFIX + "gtid")
-            .withDisplayName("gtid")
+    public static final Field VGTID = Field.create(VITESS_CONFIG_GROUP_PREFIX + "vgtid")


Yes, it will. The existing setting needs to be deprecated rather than removed: both settings should be available, and a WARN should be written to the log when the deprecated one is used.
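
A minimal sketch of that deprecation pattern follows. It is not the connector's implementation: the class name and the precedence rule (the new setting wins when both are present) are assumptions, and java.util.logging is used only to keep the example self-contained.

```java
import java.util.Properties;
import java.util.logging.Logger;

// Hypothetical resolver that keeps the deprecated vitess.gtid setting working.
final class DeprecatedGtidSketch {

    private static final Logger LOGGER = Logger.getLogger(DeprecatedGtidSketch.class.getName());

    static final String VGTID_KEY = "vitess.vgtid";
    static final String DEPRECATED_GTID_KEY = "vitess.gtid";

    // Prefers the new setting, falls back to the deprecated one with a warning,
    // and returns null when neither is configured.
    static String resolve(Properties config) {
        String vgtid = config.getProperty(VGTID_KEY);
        if (vgtid != null) {
            return vgtid;
        }
        String gtid = config.getProperty(DEPRECATED_GTID_KEY);
        if (gtid != null) {
            LOGGER.warning("Setting '" + DEPRECATED_GTID_KEY + "' is deprecated; use '"
                    + VGTID_KEY + "' instead");
        }
        return gtid;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(DEPRECATED_GTID_KEY, "legacy-value");
        // Logs a warning and prints the deprecated value.
        System.out.println(resolve(config));
    }
}
```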

twthorn · 2023-10-19T22:09:42Z

I assumed that incomplete snapshots would be resumed automatically, so I'd expect additional data to be written to or read from offsets. Is this the case?

Correct, additional data will be read from and written to offsets. There is now an additional field, table_p_ks, that is always read from and written to offsets. When there is no table copy phase, its value is an empty list. When there is a table copy phase, it contains the last sent primary keys of the table (example). So the overhead ranges from marginally more data to an additional nested JSON array.

It is also worth noting that rolling forward works, and I added a test showing this (we can read offsets that lack the table_p_ks field). For rolling back, however, if offsets were written with the newer code, the user may need to manually overwrite the offset data to remove the table_p_ks field and make it parseable, because Jackson fails on unknown fields by default. One option I considered was omitting the table_p_ks field when it is an empty list, but that would make the VGTID format inconsistent, so I opted against it.
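
The roll-forward case can be sketched as follows, assuming the offset entry for a shard has already been parsed into a map. The class and method names are hypothetical and this is not the connector's parsing code; it only shows a missing table_p_ks field being treated as an empty list.

```java
import java.util.List;
import java.util.Map;

// Hypothetical reader for one already-parsed shard entry of a stored offset.
final class OffsetCompatSketch {

    // Returns the last PKs stored in the entry, or an empty list when the
    // entry was written by older code that did not know the field.
    static List<?> tableLastPks(Map<String, ?> shardOffset) {
        Object value = shardOffset.get("table_p_ks");
        if (value instanceof List) {
            return List.copyOf((List<?>) value);
        }
        return List.of();
    }

    public static void main(String[] args) {
        // Offset written by older code: no table_p_ks field, so this prints [].
        System.out.println(tableLastPks(Map.of("keyspace", "ks", "shard", "-80")));
    }
}
```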

HenryCaiHaiying

lgtm

jpechane · 2023-10-20T11:42:31Z

src/test/java/io/debezium/connector/vitess/VitessConnectorIT.java

+
+    @Test
+    public void testSnapshotLargeTable() throws Exception {
+        TestHelper.executeDDL("vitess_create_tables.ddl");


@twthorn Could you please store all the records captured and assert at the end that all are present and there are neither gaps nor duplicates?
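
One way such a check could look is sketched below. It assumes the test table has contiguous numeric primary keys and that the captured records have been reduced to their PK values; the class and method names are hypothetical and are not part of the connector's test helpers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for asserting a snapshot has neither gaps nor duplicates.
final class SnapshotAssertSketch {

    // Returns a description of every duplicate ID and every ID missing from
    // the inclusive range [first, last]; an empty list means the capture is complete.
    static List<String> findProblems(List<Long> capturedIds, long first, long last) {
        List<String> problems = new ArrayList<>();
        Set<Long> seen = new HashSet<>();
        for (Long id : capturedIds) {
            if (!seen.add(id)) {
                problems.add("duplicate id " + id);
            }
        }
        for (long id = first; id <= last; id++) {
            if (!seen.contains(id)) {
                problems.add("missing id " + id);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // ID 1 was captured twice and ID 2 never arrived.
        System.out.println(findProblems(List.of(1L, 1L, 3L), 1, 3));
    }
}
```

In a test, the returned list would be asserted to be empty after the connector has been restarted mid-snapshot and has finished copying.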

jpechane · 2023-10-20T11:43:17Z

@twthorn LGTM. I left one comment requesting the hardening of a test. Otherwise we are good to go.


twthorn · 2023-10-23T15:33:32Z

@jpechane Thanks for the changes! Please let me know if there's anything else I can do on the PR

jpechane · 2023-10-24T09:22:28Z

@twthorn Applied, thanks a lot

twthorn added 4 commits October 17, 2023 18:27

DBZ-7050 Add automatic retry for snapshots

e476215

DBZ-7050 Fix formatting

2c4f0fe

DBZ-7050 Fix testCopyTableAndRestart tableInclude test config

c72e9ee

DBZ-7050 Fix style

36254fc

HenryCaiHaiying reviewed Oct 19, 2023


jpechane reviewed Oct 19, 2023


DBZ-7050 Add more tests, add deprecated config, other fixes

4cd1ae9

DBZ-7050 Fix formatting

2160072

twthorn requested review from jpechane and HenryCaiHaiying October 20, 2023 00:47

HenryCaiHaiying approved these changes Oct 20, 2023


jpechane reviewed Oct 20, 2023


DBZ-7050 Assert on record primary key values for mid snapshot recovery test

5e3ba25

twthorn requested a review from jpechane October 20, 2023 20:21

DBZ-7050 Additional asserts; rename method

8432077

jpechane merged commit e164d6f into debezium:main Oct 24, 2023
3 checks passed
