cdc: only backfill tables which experience schema changes #43896

Closed
ajwerner opened this issue Jan 13, 2020 · 6 comments · Fixed by #55135
Labels
A-cdc Change Data Capture

Comments

@ajwerner
Contributor

Describe the problem

When a schema change alters the logical layout of a table, namely a column addition or removal, we send a backfill of all of the rows in the table. Currently we always do this backfill on such schema changes, though #31213 tracks making that optional. A changefeed can watch multiple tables at a time, but the logic to perform a backfill does not distinguish which spans need to be backfilled: when a backfill occurs, it backfills all of the rows from all of the tables in the changefeed.

To Reproduce

CREATE TABLE a (i INT PRIMARY KEY);
CREATE TABLE b (i INT PRIMARY KEY);
INSERT INTO a VALUES (1), (2);
INSERT INTO b VALUES (3), (4);

Now create a changefeed:

EXPERIMENTAL CHANGEFEED FOR a, b WITH updated;

First we'll see the initial values:

b,[3],"{""after"": {""i"": 3}, ""updated"": ""1578881344671904804.0000000000""}"
b,[4],"{""after"": {""i"": 4}, ""updated"": ""1578881344671904804.0000000000""}"
a,[1],"{""after"": {""i"": 1}, ""updated"": ""1578881344671904804.0000000000""}"
a,[2],"{""after"": {""i"": 2}, ""updated"": ""1578881344671904804.0000000000""}"

Now add a new column to a:

ALTER TABLE a ADD COLUMN i2 INT DEFAULT 5;

Now we'll see the writes of the backfill (#35738):

a,[1],"{""after"": {""i"": 1}, ""updated"": ""1578881373508597199.0000000000""}"
a,[2],"{""after"": {""i"": 2}, ""updated"": ""1578881373508597199.0000000000""}"

Then, after the resolved timestamp for the schema change passes, we'll see the backfill of not just a but also b. This is the issue:

b,[3],"{""after"": {""i"": 3}, ""updated"": ""1578881373522897821.0000000000""}"
b,[4],"{""after"": {""i"": 4}, ""updated"": ""1578881373522897821.0000000000""}"
a,[1],"{""after"": {""i"": 1, ""i2"": 5}, ""updated"": ""1578881373522897821.0000000000""}"
a,[2],"{""after"": {""i"": 2, ""i2"": 5}, ""updated"": ""1578881373522897821.0000000000""}"

Expected behavior

I'd expect to only see the backfill of a.
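
The expected behavior can be illustrated with a minimal Go sketch. It is not the changefeed implementation: `TableID`, `Span`, and `spansToBackfill` are hypothetical names, and the key strings are made up for illustration. The idea is only that the set of watched spans gets filtered down to those belonging to tables whose layout changed, so that b is skipped when only a gains a column.

```go
package main

import "fmt"

// TableID and Span are simplified, hypothetical stand-ins for the real
// table descriptor ID and key span types.
type TableID int

type Span struct {
	Table      TableID
	Start, End string // illustrative keys, not real encoded keys
}

// spansToBackfill returns only the watched spans that belong to tables
// whose logical layout changed. Spans of unchanged tables are omitted.
func spansToBackfill(watched []Span, changed map[TableID]bool) []Span {
	var out []Span
	for _, sp := range watched {
		if changed[sp.Table] {
			out = append(out, sp)
		}
	}
	return out
}

func main() {
	const tableA, tableB TableID = 1, 2
	watched := []Span{
		{Table: tableA, Start: "a/min", End: "a/max"},
		{Table: tableB, Start: "b/min", End: "b/max"},
	}
	// Only table a had a column added.
	changed := map[TableID]bool{tableA: true}
	for _, sp := range spansToBackfill(watched, changed) {
		fmt.Printf("backfill table %d: [%s, %s)\n", sp.Table, sp.Start, sp.End)
	}
}
```

With the inputs above, only the span for table a is selected for the backfill.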

@ajwerner
Contributor Author

ajwerner commented Mar 9, 2020

If we did this, one thing we might want to do is checkpoint the progress of tables other than the one being backfilled. It might take a new feature in the span frontier to ask: has the frontier for any complete table's spans moved? Could be good.
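
One way to picture the per-table frontier check suggested above is the following rough Go sketch. It does not use the real span frontier API: `tableFrontier`, `SpanKey`, `tableMin`, and `Forward` are hypothetical, and timestamps are plain integers. A table's frontier is taken to be the minimum resolved timestamp over all of its spans, and `Forward` reports whether that minimum moved.

```go
package main

import "fmt"

// TableID and SpanKey are simplified, hypothetical identifiers.
type TableID int

type SpanKey struct {
	Table TableID
	Start string
}

// tableFrontier tracks a resolved timestamp per span. It is a toy
// model for illustration, not the real span frontier.
type tableFrontier struct {
	resolved map[SpanKey]int64
}

func newTableFrontier(spans []SpanKey) *tableFrontier {
	f := &tableFrontier{resolved: make(map[SpanKey]int64)}
	for _, sp := range spans {
		f.resolved[sp] = 0
	}
	return f
}

// tableMin returns the lowest resolved timestamp across all spans of
// table t, and false if the table has no tracked spans.
func (f *tableFrontier) tableMin(t TableID) (int64, bool) {
	var lowest int64
	found := false
	for sp, ts := range f.resolved {
		if sp.Table != t {
			continue
		}
		if !found || ts < lowest {
			lowest, found = ts, true
		}
	}
	return lowest, found
}

// Forward advances one span and reports whether the frontier of the
// whole table that span belongs to moved as a result.
func (f *tableFrontier) Forward(sp SpanKey, ts int64) bool {
	cur, ok := f.resolved[sp]
	if !ok || ts <= cur {
		return false
	}
	before, _ := f.tableMin(sp.Table)
	f.resolved[sp] = ts
	after, _ := f.tableMin(sp.Table)
	return after > before
}

func main() {
	b1, b2 := SpanKey{Table: 2, Start: "b1"}, SpanKey{Table: 2, Start: "b2"}
	f := newTableFrontier([]SpanKey{{Table: 1, Start: "a1"}, b1, b2})
	fmt.Println(f.Forward(b1, 10)) // b2 still lags, so table 2 has not moved
	fmt.Println(f.Forward(b2, 7))  // every span of table 2 is now past 0
}
```

In this model, a table's progress could be checkpointed whenever `Forward` returns true for one of its spans, independently of a backfill running on another table.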

@cjireland

Hi @ajwerner, any update on this ticket? Thanks!

@HonoreDB
Contributor

HonoreDB commented Oct 1, 2020

@cjireland, I've opened a PR to close this; it will likely be merged tomorrow or next week.

@cjireland

Lovely, thanks @HonoreDB

In what version do you think it will be released?

@ajwerner
Contributor Author

ajwerner commented Oct 2, 2020

I'd think we'd call this a bug, in which case it'd be eligible for backport, so 20.2.1 and 20.1.x whenever that happens.

@cjireland

Thanks @ajwerner

@craig craig bot closed this as completed in #55135 Oct 9, 2020