[TI-CDC] Using canal JSON to output to Kafka, different primary key types produce inconsistent results #6269
Comments
I think that's to be expected. You can check out https://docs.pingcap.com/tidb/v5.0/clustered-indexes. For the int primary key table, you can try:

```sql
CREATE TABLE `int_id_table` (
  `int_id` int(11) NOT NULL,
  `var1` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`int_id`) /*T![clustered_index] NONCLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
```

I think this will produce an update event rather than an insert event plus a delete event.
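For reference, the linked doc also shows how to check whether a table's primary key is clustered; a minimal sketch, assuming both tables live in schema `test`:

```sql
-- TIDB_PK_TYPE is CLUSTERED or NONCLUSTERED (TiDB v5.0+)
SELECT table_name, tidb_pk_type
FROM information_schema.tables
WHERE table_schema = 'test'
  AND table_name IN ('int_id_table', 'varchar_id_table');
```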
I transferred this issue from
You mean Flink can't handle update events?
Flink inserted the updated data, but did not delete the data before the update.
Got it. I'll test it.
I tested it. It seems to work.

```sql
CREATE TABLE topic_test
(
    varchar_id varchar(32) primary key,
    var2 varchar(255)
) WITH (
    'connector' = 'kafka',
    'topic' = 'ticdc-test',
    'properties.bootstrap.servers' = 'kafka:9092',
    'properties.group.id' = 'testGroup',
    'scan.startup.mode' = 'earliest-offset',
    'format' = 'canal-json',
    'canal-json.ignore-parse-errors' = 'true'
);

select *
from topic_test;
```
(Attachment: Screen.Recording.2022-07-18.at.2.51.57.PM.mov)
I also tested it with:

```sql
insert into varchar_id_table values ('2', '2');
update varchar_id_table set varchar_id = '4' where varchar_id = '2';
```

This also works well.
Maybe I don't understand what you mean. Can you explain why you think "update" doesn't work for you?
When the update does not involve the primary key, it works normally: Flink uses the upsert method to overwrite the data. But when the primary key changes, the upsert uses the after content of the update log, so it only writes the updated row downstream without deleting the before row. This is why I expect TiCDC to split the event into a delete event and an insert event, as it does for int primary keys.
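To illustrate, here is an abridged, hypothetical canal-json UPDATE event for a primary-key change (field values are invented, and the exact envelope depends on the TiCDC version):

```json
{
  "database": "test",
  "table": "varchar_id_table",
  "pkNames": ["varchar_id"],
  "isDdl": false,
  "type": "UPDATE",
  "data": [{ "varchar_id": "4", "var2": "2" }],
  "old":  [{ "varchar_id": "2", "var2": "2" }]
}
```

An upsert sink keyed on `varchar_id` writes the row with key `'4'` but never issues a delete for key `'2'`, which is exactly the missing-delete behavior described above.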
This is more about Flink's internal mechanism and the behavior of the JDBC sink. You can set
But please be aware of this bug. #6198
I tried to use the changefeed config to set the enable-old-value configuration, but it doesn't seem to work.
My config:
My create command:
Is my configuration wrong?
emmm, sorry, I forgot that when you use the canal-json protocol, you have to use the old value.
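For context, a minimal sketch of what such a changefeed setup looks like (the PD address, topic name, and file names below are assumptions, not the reporter's actual values):

```toml
# changefeed.toml: canal-json requires old value to stay enabled
enable-old-value = true
```

```shell
cdc cli changefeed create \
  --pd="http://pd0:2379" \
  --sink-uri="kafka://kafka:9092/ticdc-test?protocol=canal-json" \
  --config=changefeed.toml
```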
At present, I can only avoid synchronization tasks for varchar primary keys. Although this is caused by Flink's mechanism, I still hope TiCDC can unify the output behavior through a configuration parameter; a consistent experience is very useful for users. By the way, I have also tried the TiCDC connector from the Flink Chinese community for this kind of work, but I found it is not stable enough. I have raised those problems in the relevant communities as well.
Can you try the
The project has no plan to upgrade TiDB for the time being. I'll try it if I get a chance.
I'll close this issue. If you have any other questions, please feel free to reopen it.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
1. Deploy TiDB and TiCDC.
2. Create a table with an int type primary key.
3. Create another table with a varchar type primary key.
4. Create a TiCDC replication task that outputs the change logs of the two tables to Kafka.
5. Insert data into both tables, then update their primary keys (see the SQL sketch below).
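A minimal SQL sketch of steps 2, 3, and 5 (the exact DDL is not shown in the report; table and column names follow those mentioned earlier in the thread, and whether each primary key is clustered depends on the clustered-index setting covered in the doc linked above):

```sql
CREATE TABLE int_id_table (
    int_id int(11) NOT NULL,
    var1   varchar(255) DEFAULT NULL,
    PRIMARY KEY (int_id)
);

CREATE TABLE varchar_id_table (
    varchar_id varchar(32) NOT NULL,
    var2       varchar(255) DEFAULT NULL,
    PRIMARY KEY (varchar_id)
);

INSERT INTO int_id_table VALUES (2, '2');
INSERT INTO varchar_id_table VALUES ('2', '2');

-- update the primary keys
UPDATE int_id_table SET int_id = 4 WHERE int_id = 2;
UPDATE varchar_id_table SET varchar_id = '4' WHERE varchar_id = '2';
```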
2. What did you expect to see? (Required)
The change log for a primary key update is split into a delete log and an insert log, which is convenient for downstream processing with Flink.
3. What did you see instead (Required)
The change log for the int type primary key is split, but the change log for the varchar type primary key is not.
4. What is your TiDB version? (Required)
```
Release Version: v5.4.0
Edition: Community
Git Commit Hash: 55f3b24c1c9f506bd652ef1d162283541e428872
Git Branch: heads/refs/tags/v5.4.0
UTC Build Time: 2022-01-25 08:39:26
GoVersion: go1.16.4
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
```