After filtering some DDL events and manually fixing downstream, the tracker can't track the table structure #5272
Comments
Root cause (I only read the source code and did not check the actual behaviour):

The first time an error happens, it is the downstream error "Error 1054: Unknown column...". At this point genSQL has succeeded and the DML job has been added to the queue, so the TableInfo in the in-memory table checkpoint is filled with the downstream table structure. After the error, checkpoint.Rollback rolls the in-memory checkpoint back to the flushed checkpoint, which has a nil TableInfo, and the schema tracker resets the table structure.

When the task is resumed (or auto-resumed), neither the table checkpoint nor the schema tracker contains the TableInfo, so the downstream table structure is used. This time, the schema tracker first loads the downstream table structure, and soon afterwards genSQL fails with the error "Column count doesn't match value count". Note that this time the TableInfo is not saved to the in-memory table checkpoint, but the table checkpoint still exists, because it was created when the first error happened and was not dropped by a DROP TABLE.

Then, in checkpoint.Rollback, because the in-memory table checkpoint has a nil TableInfo, the schema tracker is not reset. The following logic does not drop the table from the schema tracker either, because the in-memory table checkpoint exists.

To me, this is caused by the TableInfo in the schema tracker being inconsistent with the in-memory table checkpoint. We can fix it when we refine the code.
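The second half of the sequence above can be illustrated with a small, self-contained Go model. This is only a sketch of the described behaviour, not DM's actual code: the `syncer` struct, its fields, the `alwaysReset` flag, and the column names `c1`..`c3` are all hypothetical, and the "always reset" branch is just one conceivable remedy, not necessarily what the real fix does. The model starts from the state after the first rollback: a table checkpoint entry exists with a nil TableInfo, and the tracker is empty.

```go
package main

import "fmt"

// TableInfo is a stand-in for a table structure: only the column names.
type TableInfo struct{ Columns []string }

// syncer is a hypothetical, heavily simplified model of the state involved.
type syncer struct {
	tracker    map[string]*TableInfo // schema tracker: table -> structure
	memCP      map[string]*TableInfo // in-memory table checkpoints; an entry may hold nil
	downstream map[string]*TableInfo // structure the downstream currently has
}

// genSQL uses the tracked structure, loading it from downstream when missing.
// The structure is saved to the in-memory checkpoint only on success.
func (s *syncer) genSQL(table string, values []int) error {
	ti, ok := s.tracker[table]
	if !ok {
		ti = s.downstream[table]
		s.tracker[table] = ti
	}
	if len(ti.Columns) != len(values) {
		return fmt.Errorf("Column count doesn't match value count: %d (columns) vs %d (values)",
			len(ti.Columns), len(values))
	}
	s.memCP[table] = ti
	return nil
}

// rollback models the described behaviour: the tracker is reset only when the
// in-memory checkpoint holds a non-nil TableInfo. alwaysReset is a hypothetical
// switch that resets the tracker unconditionally.
func (s *syncer) rollback(alwaysReset bool) {
	for table, ti := range s.memCP {
		if ti != nil || alwaysReset {
			s.memCP[table] = nil     // flushed checkpoint has no TableInfo
			delete(s.tracker, table) // tracker will reload on next use
		}
	}
}

// run replays: resume and fail, roll back, fix downstream manually, resume again.
func run(alwaysReset bool) (first, second error) {
	const table = "test.test1"
	s := &syncer{
		tracker:    map[string]*TableInfo{},
		memCP:      map[string]*TableInfo{table: nil}, // left over from the first error
		downstream: map[string]*TableInfo{table: {Columns: []string{"c1", "c2", "c3"}}},
	}
	row := []int{1, 2, 3, 4} // upstream row already includes c4

	first = s.genSQL(table, row) // tracker loads the 3-column downstream structure
	s.rollback(alwaysReset)
	// Manual fix: add column c4 in downstream.
	s.downstream[table] = &TableInfo{Columns: []string{"c1", "c2", "c3", "c4"}}
	second = s.genSQL(table, row)
	return first, second
}

func main() {
	_, err := run(false)
	fmt.Println("conditional reset:", err)
	_, err = run(true)
	fmt.Println("unconditional reset:", err)
}
```

In this model the conditional reset leaves the 3-column structure in the tracker, so the second attempt fails with the same column-count error even after downstream is fixed. With the unconditional reset, the tracker reloads the 4-column structure and the second attempt succeeds.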
Cannot reproduce on current master. There is another bug, though: after auto-resume on the first error, the DML is skipped too, and the global point is larger than the table point.
If you check out the test part of my PR, it is expected to fail; please upload the log for the above case. If the table is skipped, its table checkpoint may not be updated, but the DML should not be lost.
/assign gmhdbjd
/unassign lance6716
Fixed by #5273
What did you do?
1. `task`, `test.test1` in upstream
2. `alter table test1 add column c4 int;` in upstream
3. `create table test2 (c int primary key);` in upstream, or wait 30s, to flush checkpoints
4. `insert into test1`. Now the task will report an error because downstream doesn't have column `c4`
5. `alter table test1 add column c4 int;` in downstream
What did you expect to see?
task goes on
What did you see instead?
gen insert sqls failed, sourceTable: `test`.`test1`, targetTable: `test`.`test1`: Column count doesn't match value count: 3 (columns) vs 4 (values)
Versions of the cluster
DM version (run `dmctl -V` or `dm-worker -V` or `dm-master -V`): at least v5.4.0
current status of DM cluster (execute `query-status <task-name>` in dmctl): (paste current status of DM cluster here)