Debezium heartbeat not identified correctly #41647

Closed
rodireich opened this issue Jul 11, 2024 · 6 comments · Fixed by #42431
Labels
area/connectors Connector related issues autoteam cdc connectors/source/postgres needs-triage team/db-dw-sources Backlog for Database and Data Warehouse Sources team type/bug Something isn't working


@rodireich
Contributor

rodireich commented Jul 11, 2024

Connector Name

source-postgres

Connector Version

3.4.24

What step the error happened?

None

Relevant information

This was found in relation to #41622.
The way we identify Debezium heartbeat messages and exclude them from processing as change events is broken.
Essentially, we treat any incoming Debezium event that has no source node as a heartbeat.
The internal format of heartbeats may have changed: after applying the fix from #41622 on an affected workspace, the added logs show a heartbeat packet that we attempt to process as a change event, causing an error.
See the example below.

Relevant log output

{"op":"m","ts_ms":1720724527278,"source":{"version":"2.6.2.Final","connector":"postgresql","name":"xxxx","ts_ms":1720670367584,"snapshot":null,"db":"xxxx","sequence":"[\"7360842577240\",\"7360842578440\"]","ts_us":1720670367584260,"ts_ns":1720670367584260000,"schema":"","table":"","txId":3570146,"lsn":7360842578440,"xmin":null},"message":{"prefix":"datastream","content":"Y2RjIGhlYXJ0YmVhdA=="}}

Note that a source node is present, so the event is not recognized as a heartbeat.
The value of content is the base64 encoding of "cdc heartbeat".
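To make the failure concrete, here is a minimal Python sketch that applies the detection rule described above to the logged event (trimmed to the relevant fields) and decodes its content. The function names are hypothetical, and the check assumes the old rule simply tests for a missing source node, as this issue describes; it is not the connector's actual code.

```python
import base64
import json

# The event from the log above, trimmed to the fields used here.
EVENT = json.loads(
    '{"op": "m", "source": {"connector": "postgresql", "schema": "", "table": ""},'
    ' "message": {"prefix": "datastream", "content": "Y2RjIGhlYXJ0YmVhdA=="}}'
)


def is_heartbeat_legacy(event: dict) -> bool:
    """Hypothetical reconstruction of the old rule: no source node means heartbeat."""
    return event.get("source") is None


def decode_message_content(event: dict) -> str:
    """Decode the base64 content of a Debezium logical decoding message event."""
    return base64.b64decode(event["message"]["content"]).decode("utf-8")


if __name__ == "__main__":
    print(is_heartbeat_legacy(EVENT))     # False: source is present, so not a heartbeat
    print(decode_message_content(EVENT))  # cdc heartbeat
```

Because the rule returns False, the event falls through to change-event processing, which matches the error seen in the logs.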

The correct fix is probably to filter out "op": "m" events, which denote a Debezium message event rather than a change event.
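A minimal Python sketch of the proposed filter, assuming events arrive as Debezium JSON strings like the one logged above. The generator name `change_events` is hypothetical; the op codes listed in the comment are Debezium's standard values for the PostgreSQL connector.

```python
import json
from typing import Iterable, Iterator

# Debezium "op" codes: c=create, u=update, d=delete, r=read (snapshot),
# t=truncate, m=message (generic logical decoding message).
MESSAGE_OP = "m"


def change_events(raw_events: Iterable[str]) -> Iterator[dict]:
    """Yield only change events, skipping logical decoding message events (op == "m")."""
    for raw in raw_events:
        event = json.loads(raw)
        if event.get("op") == MESSAGE_OP:
            # Generic WAL messages (e.g. from pg_logical_emit_message) carry no row data.
            continue
        yield event


if __name__ == "__main__":
    stream = [
        '{"op": "m", "message": {"prefix": "datastream", "content": "Y2RjIGhlYXJ0YmVhdA=="}}',
        '{"op": "c", "after": {"id": 1}}',
    ]
    print([e["op"] for e in change_events(stream)])  # ['c']
```

Filtering on op is independent of how heartbeats are detected, so message events are dropped even if they happen to carry a source node.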

Contribute

  • Yes, I want to contribute
@rodireich
Contributor Author

This is the format of heartbeats we parse successfully:

[
    key={"serverName":"postgres"}, value={"ts_ms":1721681211169}, 
    sourceRecord=SourceRecord{
        sourcePartition={server=postgres}, 
        sourceOffset={lsn=1102464446256, 
        txId=3992408, ts_usec=1721681179734934}} 
    ConnectRecord{
        topic='__debezium-heartbeat.postgres', 
        kafkaPartition=0, 
        key=Struct{serverName=postgres}, keySchema=Schema{io.debezium.connector.common.ServerNameKey:STRUCT}, value=Struct{ts_ms=1721681211169}, valueSchema=Schema{io.debezium.connector.common.Heartbeat:STRUCT}, 
        timestamp=null, 
        headers=ConnectHeaders(headers=)}]

The LSN value is taken from SourceRecord/sourceOffset/lsn.
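As an illustration of a sturdier heartbeat check, the Python sketch below identifies a heartbeat by the topic prefix and value schema name seen in the dump above, then reads the LSN from sourceOffset. The `Record` class is a hypothetical stand-in for Kafka Connect's SourceRecord, and the "__debezium-heartbeat." prefix is assumed from this dump; it may differ if the heartbeat topic is configured differently.

```python
from dataclasses import dataclass, field
from typing import Optional

# Values observed in the heartbeat dump above (assumed defaults).
HEARTBEAT_TOPIC_PREFIX = "__debezium-heartbeat."
HEARTBEAT_SCHEMA = "io.debezium.connector.common.Heartbeat"


@dataclass
class Record:
    """Hypothetical stand-in for a Kafka Connect SourceRecord (only the fields used here)."""
    topic: str
    value_schema_name: Optional[str]
    source_offset: dict = field(default_factory=dict)


def is_heartbeat(record: Record) -> bool:
    # Match on the heartbeat topic prefix or the heartbeat value schema name,
    # rather than on the absence of a source node.
    return (record.topic.startswith(HEARTBEAT_TOPIC_PREFIX)
            or record.value_schema_name == HEARTBEAT_SCHEMA)


def heartbeat_lsn(record: Record) -> Optional[int]:
    """Return the LSN from sourceOffset for a heartbeat record, else None."""
    if not is_heartbeat(record):
        return None
    lsn = record.source_offset.get("lsn")
    return int(lsn) if lsn is not None else None


if __name__ == "__main__":
    hb = Record(
        topic="__debezium-heartbeat.postgres",
        value_schema_name=HEARTBEAT_SCHEMA,
        source_offset={"lsn": 1102464446256, "txId": 3992408, "ts_usec": 1721681179734934},
    )
    print(is_heartbeat(hb), heartbeat_lsn(hb))  # True 1102464446256
```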

@rodireich
Contributor Author

rodireich commented Jul 22, 2024

The cause of the difference in format is not clear: a newer Postgres 15 in the test environment sends the old heartbeat format,
while an older Postgres 14 gives us the new format, all with the same Debezium 2.6.2.

@rodireich
Contributor Author

rodireich commented Jul 22, 2024

I think I understand what goes on here:
The unexpected message bearing "op": "m" (m for message) is not actually a Debezium heartbeat but a message that another product inserted into the WAL.
According to the Debezium documentation, an "op": "m" event is described as follows:

A message event signals that a generic logical decoding message has been inserted directly into the WAL typically with the pg_logical_emit_message function

Looking at the contents of the message, we see:

{
  "prefix":"datastream",
  "content":"Y2RjIGhlYXJ0YmVhdA=="      ←--- "cdc heartbeat" base64 encoded 
}

I verified in the Debezium code that the heartbeat structure has not changed.
Datastream is a CDC service on GCP, so the customer is likely running Datastream alongside our connector, and Datastream most likely writes these WAL entries for its own heartbeats.

@rodireich
Contributor Author

What we should do is:

  1. Filter out incoming Debezium events with "op": "m", as they are never record change events.
  2. See if we can improve our heartbeat parsing. It works correctly, but the implementation could be better.

rodireich linked a pull request on Jul 23, 2024 that will close this issue.
@rodireich
Contributor Author

The fix is effective for the workspace we've been tracking this issue on.
They're able to sync again.
