🐛 Source Postgres: fix CDC OOM issue #5304
fixed test
"type": "string",
"description": "A pgoutput logical replication slot.",
"description": "A logical decoding plug-in installed on the PostgreSQL server. Please use `pgoutput` plug-in by default.\nFor more information when to use `wal2json` plug-in read <a href=\"https://docs.airbyte.io/integrations/sources/postgres\">Postgres Source</a> docs.",
@irynakruk `pgoutput` plug-in is used by default.
For the next sentence, you could state the key points from that article so the description will be self-contained.
@yaroslav-hrytsaienko changed to:
A logical decoding plug-in installed on the PostgreSQL server. `pgoutput` plug-in is used by default.\nIf replication table contains a lot of big jsonb values it is recommended to use `wal2json` plug-in. For more information about `wal2json` plug-in read <a href=\"https://docs.airbyte.io/integrations/sources/postgres\">Postgres Source</a> docs.
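The behaviour in this description (use `pgoutput` unless the user picks `wal2json`) could be wired into the Debezium properties roughly as follows. This is only a sketch: `plugin.name` is the Debezium Postgres connector property for the logical decoding plug-in, while the `PluginSelection` class, the `config` map, and its `"plugin"` key are hypothetical stand-ins for the connector's real configuration handling, which is not shown in this thread.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not the connector's actual code.
class PluginSelection {
  static final String DEFAULT_PLUGIN = "pgoutput";

  // `config` stands in for the parsed source configuration;
  // the "plugin" key name is an assumption made for this sketch.
  static Properties pluginProperties(Map<String, String> config) {
    Properties props = new Properties();
    // Fall back to pgoutput when the user did not choose a plug-in.
    props.setProperty("plugin.name", config.getOrDefault("plugin", DEFAULT_PLUGIN));
    return props;
  }
}
```

With an empty configuration the helper yields `pgoutput`; passing `"plugin" -> "wal2json"` switches the decoder, which is the option recommended here for tables with many large jsonb values.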
/test connector=connectors/source-postgres
Looks nice! Just 1 question
@@ -155,6 +155,9 @@ protected Properties getDebeziumProperties() {
props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
props.setProperty("offset.storage.file.filename", offsetManager.getOffsetFilePath().toString());
props.setProperty("offset.flush.interval.ms", "1000"); // todo: make this longer
// default values from debezium CommonConnectorConfig
Why do we need this change?
The default Debezium queue size and batch size for MySQL and MS SQL are 8192 and 2048 respectively, so we use queue size = 10000 on our side. However, the default values for Postgres are much higher (20240 and 10240), and when these properties are not set on our side, an OOM is much more likely. Since the lower values are the defaults in CommonConnectorConfig, we can use them as defaults on our side too.
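For illustration, pinning those defaults explicitly might look like the sketch below. The added lines of the diff are not visible in this thread, so this is an assumption about their shape: `max.queue.size` and `max.batch.size` are the Debezium property names for the queue and batch limits, the values 8192 and 2048 are the ones quoted above, and the `DebeziumQueueDefaults` class is hypothetical.

```java
import java.util.Properties;

// Hypothetical helper, not the connector's actual code.
class DebeziumQueueDefaults {
  // Values quoted in the comment above as the CommonConnectorConfig defaults.
  static final String MAX_QUEUE_SIZE = "8192";
  static final String MAX_BATCH_SIZE = "2048";

  // Set the limits explicitly so the larger Postgres-specific
  // defaults (20240 and 10240) do not apply.
  static Properties withQueueDefaults(Properties props) {
    props.setProperty("max.queue.size", MAX_QUEUE_SIZE);
    props.setProperty("max.batch.size", MAX_BATCH_SIZE);
    return props;
  }
}
```

Setting both values keeps the batch size below the queue size, and a smaller queue bounds how many change events (including large JSON payloads) are buffered in memory at once.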
/test connector=connectors/source-postgres
/publish connector=connectors/source-postgres
What
CDC OOM issue: a table with many columns, a few of which are JSON columns containing large JSON blobs, causes an out-of-memory error.
How
During investigation it was found that switching to the `wal2json` plugin resolves the issue. The solution is described below:
- `wal2json` can be selected during source configuration.
- `pgoutput` stays as the default plugin.
Recommended reading order
PostgresSource.java
PostgresCdcProperties.java
PostgresUtils.java
spec.json
postgres.md
Pre-merge Checklist
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
README.md
docs/integrations/<source or destination>/<name>.md
including changelog. See changelog example