This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

syncer: support multiple ddls in single sharding part #177

Merged
merged 40 commits into from
Jun 28, 2019

Conversation

amyangfei
Contributor

@amyangfei amyangfei commented Jun 18, 2019

What problem does this PR solve?

Support running sharding DDLs in sequence in each sharding part, as in the following example:

S1: D1 D2                      D3
S2:            D1 D2 D3
S3:                            D1 D2 D3

What is changed and how it works?

  • First, we introduce some new concepts

    • sequence sharding: in a shard DDL scenario, each shard executes multiple DDLs in the same sequence, as in the example above
    • active DDL: in sequence sharding, we can separate shard DDLs into two groups based on whether the DDL has been synchronized to the downstream; the DDL in the un-synced group with the smallest binlog position is the current active DDL
    • active index: sort the shard DDLs by binlog position in ascending order; the array index of the active DDL is the active index
    • ShardingMeta: storage for sequence sharding information. Each sharding group has exactly one ShardingMeta and vice versa (one-to-one correspondence). ShardingMeta stores the active index, the global shard DDL sequence, and the shard DDL sequence of each upstream source table
  • Second, let's look at the synchronization strategy

    • we have two sharding stages, the same as the original design:

      • sequentially read binlog events from the global stream, referred to as the un-sync stage
      • sequentially read binlog events from the sharding re-sync stream, referred to as the re-sync stage
    • for DML

                     before table checkpoint   before active DDL   after active DDL
      un-sync stage  not sync                  sync                not sync
      re-sync stage  not sync                  sync                not sync
    • for DDL

      • we first try to add the DDL to ShardingMeta and check whether it is the active DDL, using the following steps:
        1. if the DDL already exists in the source sequence, only check whether it is the active DDL
        2. otherwise, add the DDL to its related source sequence
        3. if it is a new DDL in the global sequence, add it to the global sequence as well
        4. check that the source sequence is a prefix of the global sequence; if not, return an error
    • after adding the DDL to ShardingMeta, we process the sharding group in the original fashion

  • Next we will focus on error handling

    • ShardingMeta is kept in memory and can be persisted into the downstream TiDB. It is persisted at the same time the checkpoint is flushed: the ShardingMeta persistence SQL and the checkpoint update run in the same transaction. With this mechanism, we achieve the following goals:
      • if the sync-unit exits before the first shard DDL is synced (partial shard DDLs processed, or the shard DDL failed to execute downstream, or it executed downstream but FlushCheckpoint failed), then the global checkpoint has not been flushed to the first shard DDL position and no ShardingMeta has been persisted. After the DM-worker/sync-unit restarts, ShardingMeta is reconstructed from the meta DB.
      • if the sync-unit exits before a shard DDL (not the first in the sequence) is synced, the ShardingMeta stored in the downstream meta DB was updated in the last sharding round, and the active index remains the current active index.
      • if the sync-unit exits after a shard DDL is synced, the activeIdx of ShardingMeta has moved forward to the next one and been persisted to the downstream TiDB. After the sync-unit restarts, ShardingMeta is reconstructed from the meta DB.
  • What is not supported

    • different shard DDL sequences across shards. DM will pause the task with an error, and this is difficult to recover from.
    • adding or deleting a shard during sequence sharding. If this happens, we can filter out all binlog events of the added or deleted shard so that sequence sharding runs successfully, and then try to recover the data of the filtered shard.
  • TODO:

    • add more unit tests for sharding_group.go

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has exported function/method change
  • Has interface methods change
  • Has persistent data change

Side effects

  • Increased code complexity

Related changes

  • Need to update the documentation
  • Need to be included in the release note

@amyangfei amyangfei added priority/important Major change, requires approval from ≥2 primary reviewers status/WIP This PR is still work in progress type/enhancement Performance improvement or refactoring labels Jun 18, 2019
@codecov

codecov bot commented Jun 18, 2019

Codecov Report

Merging #177 into master will decrease coverage by 0.4647%.
The diff coverage is 33.75%.

@@               Coverage Diff               @@
##             master      #177        +/-   ##
===============================================
- Coverage   55.9038%   55.439%   -0.4648%     
===============================================
  Files           122       122                
  Lines         14516     13826       -690     
===============================================
- Hits           8115      7665       -450     
+ Misses         5591      5389       -202     
+ Partials        810       772        -38

@amyangfei amyangfei force-pushed the sharding-ddl-refactor branch 2 times, most recently from a814184 to be7f9d3 Compare June 18, 2019 03:43
@amyangfei
Contributor Author

/run-all-tests

@amyangfei
Contributor Author

/run-all-tests

@amyangfei amyangfei added status/PTAL This PR is ready for review. Add this label back after committing new changes and removed status/WIP This PR is still work in progress labels Jun 19, 2019
@amyangfei
Contributor Author

PTAL @GregoryIan @csuzhangxc

return false
}

// NextShardingDDLFirstPos returns the first binlog position of next sharding DDL in sequence
Collaborator

current or next?

Contributor Author

the next shard DDL, also known as the active DDL

IsSchemaOnly bool // whether is a schema (database) only DDL TODO: zxc add schema-level syncing support later

sourceID string // associate dm-worker source ID
schema string // schema name, set through task config
Collaborator

storageSchema and storageTable?

@@ -295,6 +303,8 @@ func (sg *ShardingGroup) UnresolvedTables() [][]string {
sg.RLock()
defer sg.RUnlock()

// TODO: if we have sharding ddl sequence, and partial ddls synced, we treat
Collaborator

@IANTHEREAL IANTHEREAL Jun 22, 2019

Description is not accurate enough

Member

Then, we can't forward the checkpoint of tables?

@@ -360,18 +397,26 @@ func UnpackTableID(id string) (string, string) {
type ShardingGroupKeeper struct {
sync.RWMutex
groups map[string]*ShardingGroup // target table ID -> ShardingGroup
cfg *config.SubTaskConfig

schema string
Collaborator

ditto

@@ -412,7 +473,9 @@ func (k *ShardingGroupKeeper) ResetGroups() {
k.RLock()
defer k.RUnlock()
for _, group := range k.groups {
group.Lock()
Collaborator

put group.Lock into function of group?

sg.sources[source] = false
}
}

return isResolving, sg.remain <= 0, sg.remain, nil
return false, sg.remain <= 0, sg.remain, nil
Member

can we support replicating CREATE TABLE statement for a fresh group now?

Contributor Author

what does "fresh group" mean?

Member

a group that does not exist before the "create table" statement

Contributor Author

did we support it before this PR?

Contributor Author

@amyangfei amyangfei Jun 27, 2019

the logic doesn't change; if a table is created in a new sharding group, we just create a new group directly

dm/syncer/sharding_group.go

Lines 419 to 423 in 2ff99b5

if schemaGroup, ok := k.groups[schemaID]; !ok {
	k.groups[schemaID] = NewShardingGroup(k.cfg.SourceID, k.shardMetaSchema, k.shardMetaTable, sourceIDs, meta, true)
} else {
	schemaGroup.Merge(sourceIDs)
}

Member

the false (needShardingHandle) in this return will ignore the statement? then no table will be created in the downstream, right?

Contributor Author

not tested; if so, the old code has this bug too

Contributor Author

it seems strange to have no tables in one shard. For example:

  1. we have two upstreams, neither of which has any shard tables yet, so no target table exists in the downstream either. Then one upstream creates a shard table, applies sharding handling, and sends DDLInfo to DM-master. But DM-master has to wait for the other DM-worker for this CREATE TABLE DDL. It seems we can't use the common shard flow for CREATE TABLE

Member

en, seems a bug 😢

@amyangfei
Contributor Author

/run-all-tests

Member

@csuzhangxc csuzhangxc left a comment

LGTM

@csuzhangxc csuzhangxc added status/LGT2 Two reviewers already commented LGTM, ready for merge and removed status/LGT1 One reviewer already commented LGTM labels Jun 28, 2019
@amyangfei
Contributor Author

/run-all-tests

2 similar comments
@IANTHEREAL
Collaborator

/run-all-tests

@mahjonp

mahjonp commented Jun 28, 2019

/run-all-tests

@amyangfei amyangfei merged commit 41be755 into pingcap:master Jun 28, 2019
@amyangfei amyangfei deleted the sharding-ddl-refactor branch June 28, 2019 09:50
lichunzhu pushed a commit to lichunzhu/dm that referenced this pull request Apr 6, 2020