Conversation

@danielcweeks (Contributor)

Kafka Connect events for written data currently include the name of the target table but don't capture the table UUID. Since the coordinator loads the table at commit time and then processes the events, this can result in a number of issues if the table was dropped/moved and recreated, including:

  • mismatched data file schema (field ids/types don't match the new table)
  • path mismatches where referenced data files are outside of the new table location
  • other potential metadata mismatches like referenced data files that no longer exist or invalid row ids

This PR validates that the UUID of the table used when constructing the writer is consistent with the target table UUID at commit time.
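The validation described above can be sketched as follows. This is illustrative only, with hypothetical names (`expectedUuid` captured when the writer was constructed, `commitTableUuid` read from the table loaded at commit time), not the actual PR code:

```java
// Illustrative sketch, not the actual PR code.
public class CommitValidation {

    // Compare the table UUID captured when the writer was constructed
    // against the UUID of the table loaded at commit time. A mismatch
    // means the table was dropped and recreated, so the pending events
    // must not be committed to the new table.
    static boolean uuidMatches(String expectedUuid, String commitTableUuid) {
        return expectedUuid != null && expectedUuid.equals(commitTableUuid);
    }
}
```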

Comment on lines 236 to 237

if (expectedUuid != null && !expectedUuid.equals(payloadTableUuid)) {
Contributor:

Instead of skipping, should we fail instead? Otherwise, could it lead to data loss, since the offsets will be committed?

Contributor Author:

This is one of the difficult things with Kafka Connect and errors like this. If we fail, the events remain in the control topic. The only options at that point are to clear the control topic or move the offsets forward, but doing that can result in losing commit data for other tables, since events for multiple tables are all intermixed.

I don't think there's really a safe recovery here. If the data needs to be recovered, you would create a new consumer group for the control topic and reset the consumer offsets back to before these events.
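The recovery path described above could look roughly like this with the standard Kafka tooling. This is an ops sketch against a live cluster; the bootstrap server, group, topic, and timestamp are all placeholders for your deployment:

```shell
# Sketch: replay control-topic events from before the bad commits by
# pointing a NEW consumer group at an earlier position.
# All names and values below are placeholders.
kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --group iceberg-coordinator-recovery \
  --topic control-iceberg \
  --reset-offsets \
  --to-datetime 2026-01-09T00:00:00.000 \
  --execute
```

The coordinator would then be restarted using the new consumer group so it re-reads and re-commits the surviving events.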

Contributor:

+1 to skipping; we also skip if the table name is not found. I feel we could move this check up to right after the table load and verify that the UUID matches there.

Contributor Author:

@bryanck has a good point here. The envelopes are already collected by TableReference before reaching this point. All we need to do is ensure that the latest loaded table is consistent with the table reference used to partition the messages.

Comment on lines +212 to +216
LOG.warn(
"Skipping commits to table {} due to target table mismatch. Expected: {} Received: {}",
tableIdentifier,
table.uuid(),
tableReference.uuid());
Contributor:

Do we need to return here? It seems like we are just logging.

Contributor:

Good catch, we need to return here.

Contributor:

I'll open a PR to address these two issues; I approved a little too soon.

return;
}

if (!Objects.equals(table.uuid(), tableReference.uuid())) {
Contributor:

I wonder about the case where, say, someone does a library upgrade and there are events without the UUID in the tableReference. What would happen in that case?

Contributor:

The events will be skipped. We should probably only check the UUID if the ref UUID is not null, even though that isn't ideal.
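The backward-compatible behavior suggested here could look like this. A sketch with hypothetical names, not the actual code; only events that actually carry a UUID are validated:

```java
import java.util.Objects;

// Illustrative sketch of the suggested backward-compatible check.
public class BackCompatCheck {

    // refUuid is null for events produced by older writers that did not
    // record the table UUID; those cannot be validated, so accept them.
    static boolean accept(String tableUuid, String refUuid) {
        if (refUuid == null) {
            return true; // pre-upgrade event: nothing to compare
        }
        return Objects.equals(tableUuid, refUuid);
    }
}
```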

@danielcweeks danielcweeks merged commit d85f8a8 into apache:main Jan 9, 2026
14 checks passed
@bryanck (Contributor) commented Jan 9, 2026

I opened #15011 to fix a couple of things @singhpk234 pointed out.
