This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

syncer: support multiple ddls in single sharding part #177

Merged
merged 40 commits into from
Jun 28, 2019

Conversation

amyangfei
Contributor

@amyangfei amyangfei commented Jun 18, 2019

What problem does this PR solve?

Support running sharding DDLs in sequence in each sharding part, as in the following example:

S1: D1 D2                      D3
S2:            D1 D2 D3
S3:                            D1 D2 D3

What is changed and how it works?

  • First, we introduce some new concepts

    • sequence sharding: in a shard DDL scenario, each shard executes multiple DDLs in the same sequence, as in the example above
    • active DDL: in sequence sharding, we can separate shard DDLs into two groups based on whether the DDL has been synchronized to the downstream; the DDL in the un-synced group with the smallest binlog position is the current active DDL
    • active index: sort the shard DDLs by binlog position in ascending order; the array index of the active DDL is the active index
    • ShardingMeta: storage for sequence sharding information. Each sharding group has exactly one ShardingMeta and vice versa (one-to-one correspondence). ShardingMeta stores the active index, the global shard DDL sequence, and the shard DDL sequence of each upstream source table
  • Second, let's look at the synchronization strategy

    • we have two sharding stages, the same as the original design:

      • sequentially read binlog events from the global stream, referred to as the un-sync stage
      • sequentially read binlog events from the sharding re-sync stream, referred to as the re-sync stage
    • for DML

                     before table checkpoint   before active DDL   after active DDL
      un-sync stage  not sync                  sync                not sync
      re-sync stage  not sync                  sync                not sync
    • for DDL

      • we first try to add the DDL to ShardingMeta and check whether it is the active DDL, using the following steps:
        1. if the DDL already exists in the source sequence, only check whether it is the active DDL
        2. otherwise, add the DDL to its related source sequence
        3. if it is a new DDL in the global sequence, add it to the global sequence as well
        4. check that the source sequence is a prefix of the global sequence; if not, return an error
    • after adding the DDL to ShardingMeta, we process the sharding group in the original fashion

  • Next we will focus on error handling

    • ShardingMeta is kept in memory and can be persisted into the downstream TiDB. It is persisted at the same time the checkpoint is flushed: the ShardingMeta persistence SQL and the checkpoint update run in the same transaction. With this mechanism, we achieve the following goals:
      • if the sync-unit exits before the first shard DDL is synced (partial shard DDLs processed, or the shard DDL failed to execute downstream, or it executed downstream but FlushCheckpoint failed), then the global checkpoint has not been flushed to the first shard DDL position and no ShardingMeta has been persisted. After the DM-worker/sync-unit restarts, ShardingMeta is reconstructed from the meta DB.
      • if the sync-unit exits before a shard DDL (not the first in the sequence) is synced, the ShardingMeta stored in the downstream meta DB was updated in the last sharding round, and the active index remains the current active index.
      • if the sync-unit exits after a shard DDL is synced, the activeIdx of ShardingMeta has moved forward to the next one and been persisted to the downstream TiDB. After the sync-unit restarts, ShardingMeta is reconstructed from the meta DB.
  • What is not supported

    • different shard DDL sequences across shards. DM will pause the task with an error, and this is difficult to recover from.
    • adding or deleting a shard during sequence sharding. If this happens, we can filter out all binlog events of the added or deleted shard so that sequence sharding runs successfully, and then try to recover the data of the filtered shard.
  • TODO:

    • add more unit tests for sharding_group.go

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has exported function/method change
  • Has interface methods change
  • Has persistent data change

Side effects

  • Increased code complexity

Related changes

  • Need to update the documentation
  • Need to be included in the release note

@amyangfei amyangfei added priority/important Major change, requires approval from ≥2 primary reviewers status/WIP This PR is still work in progress type/enhancement Performance improvement or refactoring labels Jun 18, 2019
@codecov

codecov bot commented Jun 18, 2019

Codecov Report

Merging #177 into master will decrease coverage by 0.4647%.
The diff coverage is 33.75%.

@@               Coverage Diff               @@
##             master      #177        +/-   ##
===============================================
- Coverage   55.9038%   55.439%   -0.4648%     
===============================================
  Files           122       122                
  Lines         14516     13826       -690     
===============================================
- Hits           8115      7665       -450     
+ Misses         5591      5389       -202     
+ Partials        810       772        -38

@amyangfei amyangfei force-pushed the sharding-ddl-refactor branch 2 times, most recently from a814184 to be7f9d3 Compare June 18, 2019 03:43
@amyangfei
Contributor Author

/run-all-tests

@amyangfei
Contributor Author

/run-all-tests

@amyangfei amyangfei added status/PTAL This PR is ready for review. Add this label back after committing new changes and removed status/WIP This PR is still work in progress labels Jun 19, 2019
@amyangfei
Contributor Author

PTAL @GregoryIan @csuzhangxc

return false
}

// NextShardingDDLFirstPos returns the first binlog position of next sharding DDL in sequence
Collaborator

current or next?

Contributor Author

the next shard DDL, also known as the active DDL

IsSchemaOnly bool // whether is a schema (database) only DDL TODO: zxc add schema-level syncing support later

sourceID string // associate dm-worker source ID
schema string // schema name, set through task config
Collaborator

storageSchema and storageTable?

@@ -295,6 +303,8 @@ func (sg *ShardingGroup) UnresolvedTables() [][]string {
sg.RLock()
defer sg.RUnlock()

// TODO: if we have sharding ddl sequence, and partial ddls synced, we treat
Collaborator

@IANTHEREAL IANTHEREAL Jun 22, 2019

Description is not accurate enough

Member

Then, we can't forward the checkpoint of tables?

@@ -360,18 +397,26 @@ func UnpackTableID(id string) (string, string) {
type ShardingGroupKeeper struct {
sync.RWMutex
groups map[string]*ShardingGroup // target table ID -> ShardingGroup
cfg *config.SubTaskConfig

schema string
Collaborator

ditto

@@ -412,7 +473,9 @@ func (k *ShardingGroupKeeper) ResetGroups() {
k.RLock()
defer k.RUnlock()
for _, group := range k.groups {
group.Lock()
Collaborator

put group.Lock into function of group?

sg.sources[source] = false
}
}

return isResolving, sg.remain <= 0, sg.remain, nil
return false, sg.remain <= 0, sg.remain, nil
Member

can we support replicating CREATE TABLE statement for a fresh group now?

Contributor Author

what does "fresh group" mean?

Member

a group that does not exist before the "create table" statement

Contributor Author

did we support it before this PR?

Contributor Author

@amyangfei amyangfei Jun 27, 2019

the logic doesn't change; if a table is created in a new sharding group, we just create a new group directly

dm/syncer/sharding_group.go

Lines 419 to 423 in 2ff99b5

if schemaGroup, ok := k.groups[schemaID]; !ok {
	k.groups[schemaID] = NewShardingGroup(k.cfg.SourceID, k.shardMetaSchema, k.shardMetaTable, sourceIDs, meta, true)
} else {
	schemaGroup.Merge(sourceIDs)
}

Member

the false (needShardingHandle) in this return will ignore the statement? then no table will be created in the downstream, right?

Contributor Author

not tested; if so, the old code has this bug too

Contributor Author

it seems strange to have no tables in one shard. For example:

  1. we have two upstreams, neither of which has any shard tables yet, so no target table exists in the downstream either. Then one upstream creates a shard table, applies sharding handling, and sends DDLInfo to DM-master. But DM-master has to wait for the other DM-worker for this CREATE TABLE DDL. It seems we can't use the common shard flow for CREATE TABLE

Member

en, seems a bug 😢

@amyangfei
Contributor Author

/run-all-tests

Member

@csuzhangxc csuzhangxc left a comment

LGTM

@csuzhangxc csuzhangxc added status/LGT2 Two reviewers already commented LGTM, ready for merge and removed status/LGT1 One reviewer already commented LGTM labels Jun 28, 2019
@amyangfei
Contributor Author

/run-all-tests

2 similar comments
@IANTHEREAL
Collaborator

/run-all-tests

@mahjonp

mahjonp commented Jun 28, 2019

/run-all-tests

@amyangfei amyangfei merged commit 41be755 into pingcap:master Jun 28, 2019
@amyangfei amyangfei deleted the sharding-ddl-refactor branch June 28, 2019 09:50
lichunzhu pushed a commit to lichunzhu/dm that referenced this pull request Apr 6, 2020