Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TiCDC Bug Report] lost a table when rename the table in the upstream #3607

Closed
zier-one opened this issue Nov 24, 2021 · 9 comments
Closed

Comments

@zier-one
Copy link
Contributor

zier-one commented Nov 24, 2021

What did you do?

  1. create a changefeed.toml:
[filter]
rules = ['test.tt']
  1. create a changefeed with the changefeed.toml:
    $ cdc cli changefeed create --sink-uri="blackhole://" --config changefeed.toml
  2. create table tt and insert a row in the upstream:
MySQL [test]> create table tt ( id int primary key );
Query OK, 0 rows affected (0.11 sec)

MySQL [test]> insert into tt value (1);
Query OK, 1 row affected (0.01 sec)
  1. create table tt_2 and insert some rows in the upstream:
MySQL [test]> create table tt_2 ( id int primary key );
Query OK, 0 rows affected (0.10 sec)

MySQL [test]> insert into tt_2 value (1);
Query OK, 1 row affected (0.02 sec)

MySQL [test]> insert into tt_2 value (2);
Query OK, 1 row affected (0.00 sec)
  1. add a column for table tt_2 :
MySQL [test]> alter table tt_2 add column a int ;
Query OK, 0 rows affected (0.26 sec)
  1. drop table tt:
MySQL [test]> drop table tt;
Query OK, 0 rows affected (0.24 sec)
  1. rename tt_2 to tt:
MySQL [test]> rename table tt_2 to tt;
Query OK, 0 rows affected (0.09 sec)
  1. insert some rows to the new tt:
MySQL [test]> insert into tt value (3,1);
Query OK, 1 row affected (0.01 sec)

MySQL [test]> insert into tt value (4,1);
Query OK, 1 row affected (0.01 sec)

What did you expect to see?

TiCDC will capture all the changelogs from table tt, whether this table has been renamed or not.

What did you see instead?

The program only captures table tt which have not been renamed.

the BlockHole log:

# grep BlockHoleSink cdc-*/ticdc.log  | grep -v FlushRowChangedEvents | grep -v Checkpoint
cdc-2/ticdc.log:[2021/11/24 17:06:47.590 +08:00] [DEBUG] [black_hole.go:69] ["BlockHoleSink: DDL Event"] [ddl="{\"StartTs\":429324974450016261,\"CommitTs\":429324974450016264,\"TableInfo\":{\"Schema\":\"test\",\"Table\":\"tt\",\"TableID\":45,\"ColumnInfo\":[{\"Name\":\"id\",\"Type\":3}]},\"PreTableInfo\":null,\"Query\":\"create table tt ( id int primary key )\",\"Type\":3}"]
cdc-2/ticdc.log:[2021/11/24 17:06:58.527 +08:00] [DEBUG] [black_hole.go:41] ["BlockHoleSink: EmitRowChangedEvents"] [row="{\"start-ts\":429324977529946113,\"commit-ts\":429324977529946114,\"row-id\":1,\"table\":{\"db-name\":\"test\",\"tbl-name\":\"tt\",\"tbl-id\":45,\"is-partition\":false},\"table-info-version\":429324974450016264,\"replica-id\":0,\"columns\":[{\"name\":\"id\",\"type\":3,\"flag\":11,\"value\":1}],\"pre-columns\":null}"]
cdc-2/ticdc.log:[2021/11/24 17:09:08.735 +08:00] [DEBUG] [black_hole.go:69] ["BlockHoleSink: DDL Event"] [ddl="{\"StartTs\":429325011241664520,\"CommitTs\":429325011241664522,\"TableInfo\":{\"Schema\":\"test\",\"Table\":\"tt_2\",\"TableID\":47,\"ColumnInfo\":[{\"Name\":\"id\",\"Type\":3}]},\"PreTableInfo\":null,\"Query\":\"create table tt_2 ( id int primary key )\",\"Type\":3}"]
cdc-2/ticdc.log:[2021/11/24 17:10:03.820 +08:00] [DEBUG] [black_hole.go:69] ["BlockHoleSink: DDL Event"] [ddl="{\"StartTs\":429325025856192513,\"CommitTs\":429325025856192515,\"TableInfo\":{\"Schema\":\"test\",\"Table\":\"tt_2\",\"TableID\":47,\"ColumnInfo\":[{\"Name\":\"id\",\"Type\":3},{\"Name\":\"a\",\"Type\":3}]},\"PreTableInfo\":{\"Schema\":\"test\",\"Table\":\"tt_2\",\"TableID\":47,\"ColumnInfo\":[{\"Name\":\"id\",\"Type\":3}]},\"Query\":\"alter table tt_2 add column a int\",\"Type\":5}"]
cdc-2/ticdc.log:[2021/11/24 17:10:14.836 +08:00] [DEBUG] [black_hole.go:69] ["BlockHoleSink: DDL Event"] [ddl="{\"StartTs\":429325028687609857,\"CommitTs\":429325028687609859,\"TableInfo\":{\"Schema\":\"test\",\"Table\":\"tt\",\"TableID\":45,\"ColumnInfo\":[{\"Name\":\"id\",\"Type\":3}]},\"PreTableInfo\":{\"Schema\":\"test\",\"Table\":\"tt\",\"TableID\":45,\"ColumnInfo\":[{\"Name\":\"id\",\"Type\":3}]},\"Query\":\"drop table tt\",\"Type\":4}"]
cdc-2/ticdc.log:[2021/11/24 17:10:24.592 +08:00] [DEBUG] [black_hole.go:69] ["BlockHoleSink: DDL Event"] [ddl="{\"StartTs\":429325031479181318,\"CommitTs\":429325031479181319,\"TableInfo\":{\"Schema\":\"test\",\"Table\":\"tt\",\"TableID\":47,\"ColumnInfo\":[{\"Name\":\"id\",\"Type\":3},{\"Name\":\"a\",\"Type\":3}]},\"PreTableInfo\":{\"Schema\":\"test\",\"Table\":\"tt_2\",\"TableID\":47,\"ColumnInfo\":[{\"Name\":\"id\",\"Type\":3},{\"Name\":\"a\",\"Type\":3}]},\"Query\":\"rename table tt_2 to tt\",\"Type\":14}"]

and there is another question:
why the DDLs from tt_2 are inputted to the BlackHole sink? they should be filtered.

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

v4.0.14

TiCDC version (execute cdc version):

v4.0.14
@zier-one zier-one added type/bug The issue is confirmed as a bug. area/ticdc Issues or PRs related to TiCDC. labels Nov 24, 2021
@dbakit
Copy link

dbakit commented Dec 10, 2021

The version v5.1.0 still has this bug...

@dbakit
Copy link

dbakit commented Dec 10, 2021

the reason could be found in cdc/entry/schema_storage.go: schemaSnapshot.handleDDL, kv change logs from TiKV to capturer, are handled by table_id, not the table_name, so in this case, the new table tt_2 never be captured, and no wonder the subsequent changes log not sync to downstream

	case timodel.ActionRenameTable:
		// first drop the table
		err := s.dropTable(job.TableID)
		if err != nil {
			return errors.Trace(err)
		}
		// create table
		err = s.createTable(getWrapTableInfo(job))
		if err != nil {
			return errors.Trace(err)
		}

What's more, if you execute statement rename table tt to tt_bak, this statement will be ignore by TiCDC, and would not sync to downstrem. You can find logs in cdc.log:

 [INFO] [mysql.go:179] ["DDL event ignored"] [query="rename table tt to tt_bak"] [startTs=429685626626113538] [commitTs=429685626626113548]

Because TiCDC uses new table name tt_bak to match the filter rulers. So in order to backup old data in table, you need to set the rulers contains all backup table name, original table name, tmp table name. So your config should be [rules = ['test.tt', 'test.tt_bak', 'test.tt2'].

func (s *mysqlSink) EmitDDLEvent(ctx context.Context, ddl *model.DDLEvent) error {
	if s.filter.ShouldIgnoreDDLEvent(ddl.StartTs, ddl.Type, ddl.TableInfo.Schema, ddl.TableInfo.Table) {
		log.Info(
			"DDL event ignored",
			zap.String("query", ddl.Query),
			zap.Uint64("startTs", ddl.StartTs),
			zap.Uint64("commitTs", ddl.CommitTs),
		)
		return cerror.ErrDDLEventIgnored.GenWithStackByArgs()
	}
	s.statistics.AddDDLCount()
	err := s.execDDLWithMaxRetries(ctx, ddl)
	return errors.Trace(err)
}

@hicqu
Copy link
Contributor

hicqu commented Feb 7, 2022

I have tested 3 situations:

rename upstream tt to tt1 with filter.rules = ["test.tt"]

What happens:

  1. The DDL is ignored by CDC so that the downstream table keeps being tt instead of tt1;
  2. Subsequent DMLs on upstream tt1 are not synchronized to the downstream table.

rename upstream tt1 to tt with filter.rules = ["test.tt"], and tt1 also exists in the downstream

What happens:

  1. The DDL is synchronized to the sink so that both tt1s in upstream and downstream are renamed to tt;
  2. Subsequent DMSs on upstream tt are synchronized to the downstream table.

rename upstream tt1 to tt with filter.rules = ["test.tt"], but tt1 doesn't exist in the downstream

  1. The DDL is synchronized to the sink but fails because there is no tt1 in the downstream database;
  2. Subsequent DMLs on upstream tt can't be synchronized to downstream.

@leoppro what do you think for the behaviors?

@hicqu
Copy link
Contributor

hicqu commented Feb 11, 2022

I have re-tested rename table tt1 to tt with CDC v4.0.14 and 5.3. Here is the result:

  • for v4.0.14, subsequent DMLs can't be synchronized to downstream, it's a bug.
  • for 5.3, subsequent DMLs can be synchronized to downstream correctly.

@nongfushanquan
Copy link
Contributor

what's about 5.2.4 and 5.1.4?

@nongfushanquan
Copy link
Contributor

/unassign @hicqu
/assign @asddongmen

@asddongmen
Copy link
Contributor

asddongmen commented Sep 2, 2022

After #6446 be merged, rename table 'tt_2' to 'tt' will encounter an error and the changefeed will enter error state.
TiCDC refuses to do such thing to prevent potential data loss.

@nongfushanquan
Copy link
Contributor

/close

@ti-chi-bot
Copy link
Member

@nongfushanquan: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants