
bug: JDBC sink lost update #12026

Closed
chenzl25 opened this issue Sep 1, 2023 · 3 comments · Fixed by #12458
Labels: type/bug (Something isn't working)
Milestone: release-1.2

chenzl25 (Contributor) commented Sep 1, 2023

Describe the bug

In RisingWave:

create table jdbc_table (id int, v int);

CREATE SINK s_sink FROM jdbc_table WITH (
    connector='jdbc',
    jdbc.url='jdbc:mysql://xxxx',
    table.name='jdbc_table',
    primary_key='id',
    type='upsert'
);

insert into jdbc_table select i, i from generate_series(1, 10000) i;

-- Lost update: about 5000 rows go missing
update jdbc_table set id = id + 1;

Table definition in MySQL:

 CREATE TABLE `jdbc_table` (
  `id` int NOT NULL,
  `v` int DEFAULT NULL,
  PRIMARY KEY (`id`)
)

Even if we set streaming_parallelism = 1, we would still hit this issue, because the sink executor issues a delete + insert pair for each row individually, instead of first deleting all the before rows and then inserting all the after rows.
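
To make this concrete, here is a hypothetical interleaving (a sketch, not an actual log) of the statements the upsert JDBC sink could send to MySQL for two adjacent source rows, assuming it issues a delete by the sink primary key followed by an insert for each changed row:

-- `update jdbc_table set id = id + 1` turns row (1, 1) into (2, 1) and row (2, 2) into (3, 2)
DELETE FROM jdbc_table WHERE id = 1;   -- before-image of row (1, 1)
INSERT INTO jdbc_table VALUES (2, 1);  -- after-image (2, 1)
DELETE FROM jdbc_table WHERE id = 2;   -- before-image of row (2, 2): also wipes the (2, 1) just inserted
INSERT INTO jdbc_table VALUES (3, 2);  -- after-image (3, 2)

Whether the delete for id = 2 lands before or after the insert of (2, 1) depends on the arbitrary row order within the chunk, which is why roughly half of the rows end up missing.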

Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

No response

Additional context

No response

chenzl25 added the type/bug (Something isn't working) label on Sep 1, 2023
github-actions bot added this to the release-1.2 milestone on Sep 1, 2023
chenzl25 (Contributor, Author) commented Sep 1, 2023

Another case shows that even updating non-pk columns (from the MySQL sink table's perspective) can cause data loss.

create table jdbc_table (id int, v int);
create materialized view v as select distinct on(id, v) id, v from jdbc_table;


 CREATE SINK s_sink FROM v WITH (
    connector='jdbc',
    jdbc.url='jdbc:mysql://xxx',
    table.name='jdbc_table',
    primary_key='id',
    type='upsert'
);

insert into jdbc_table select i, i from generate_series(1, 10000) i;

-- Data loss
update jdbc_table set v = v + 1;

BugenZhao (Member) commented:

> Even if we set streaming_parallelism = 1, we would still hit this issue, because the sink executor issues a delete + insert pair for each row individually, instead of first deleting all the before rows and then inserting all the after rows.

Just FYI, we banned updates on primary key columns in #8569 as a precaution. However, since we allow users to specify the primary key columns for sinks, that does not cover the issue here.

tabVersion changed the title from "bug: JDBC sink lost upadte" to "bug: JDBC sink lost update" on Sep 4, 2023
st1page (Contributor) commented Sep 4, 2023

This is because the stream key is different from the user-defined primary key columns of the sink.

stream key: a,b 
sink pk: a

original:
(1,1) -> (1,2)
(1,2) -> (1,3)

mv fragment 1:
delete (1,1) 

mv fragment 2:
insert (1,2)
delete (1,2)

mv fragment 3:
insert (1,3)

merge to sink fragment:
insert (1,3)
insert (1,2)
delete (1,2)
delete (1,1) 

A solution is to do additional compaction in the sink executor per barrier (a sketch of the effect follows below):

  1. Compact all the changes with the stream key.
  2. Sink all the delete events first, and then sink all the insert events.
     Why is it correct to reorder the delete events in this second phase? Because after compacting on the stream key, any two events with the same user-defined sink pk must have different stream keys, so a delete event never targets a record inserted in the same barrier under our internal streaming SQL semantics.
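
As a minimal illustration of the two steps applied to the example above (the MySQL table t with primary key a is hypothetical), the merged events insert (1,3), insert (1,2), delete (1,2), delete (1,1) would be sunk as:

-- Hypothetical sink table: CREATE TABLE t (a INT PRIMARY KEY, b INT);
-- Step 1, compact on the stream key (a, b): insert (1,2) and delete (1,2) cancel out,
--         leaving only delete (1,1) and insert (1,3).
-- Step 2, sink all deletes first, then all inserts:
DELETE FROM t WHERE a = 1;            -- from delete (1,1)
INSERT INTO t (a, b) VALUES (1, 3);   -- from insert (1,3)
-- Final state of t: (1, 3), matching the expected (1,1) -> (1,2) -> (1,3).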
