Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Source Postgres: fix CDC OOM issue #5304

Merged
merged 10 commits into from
Aug 19, 2021
Merged

Conversation

irynakruk
Copy link
Contributor

@irynakruk irynakruk commented Aug 10, 2021

What

CDC OOM issue: table with lots of columns and few of them being JSON columns containing big json blobs cause an error.

How

During investigation it was found out that changing to wal2json plugin helps to solve the issue.
The solution described below:

  1. Add possibility to select wal2json during source configuration.
  2. pgoutput stays as a default plugin

Recommended reading order

  1. PostgresSource.java
  2. PostgresCdcProperties.java
  3. PostgresUtils.java
  4. spec.json
  5. postgres.md
  6. tests

Pre-merge Checklist

Updating a connector

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions
  • Connector version bumped like described here

@irynakruk irynakruk self-assigned this Aug 10, 2021
@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Aug 10, 2021
@irynakruk irynakruk changed the title 🐛 Source Postgres: fixed CDC OOM issue 🐛 Source Postgres: fix CDC OOM issue Aug 10, 2021
fixed test
@irynakruk irynakruk linked an issue Aug 10, 2021 that may be closed by this pull request
"type": "string",
"description": "A pgoutput logical replication slot.",
"description": "A logical decoding plug-in installed on the PostgreSQL server. Please use `pgoutput` plug-in by default.\nFor more information when to use `wal2json` plug-in read <a href=\"https://docs.airbyte.io/integrations/sources/postgres\">Postgres Source</a> docs.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@irynakruk pgoutput plug-in is used by default.
For the next sentence you can specify a key objections which are in that article so the description wil be selfcontained

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yaroslav-hrytsaienko changed to:

A logical decoding plug-in installed on the PostgreSQL server. pgoutput plug-in is used by default.\nIf replication table contains a lot of big jsonb values it is recommended to use wal2json plug-in. For more information about wal2json plug-in read <a href="https://docs.airbyte.io/integrations/sources/postgres\">Postgres Source docs.

@etsybaev etsybaev self-requested a review August 16, 2021 12:47
@irynakruk irynakruk marked this pull request as ready for review August 16, 2021 13:58
@irynakruk
Copy link
Contributor Author

irynakruk commented Aug 16, 2021

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1135783340
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1135783340

@jrhizor jrhizor temporarily deployed to more-secrets August 16, 2021 14:20 Inactive
Copy link
Contributor

@subodh1810 subodh1810 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice! Just 1 question

@@ -155,6 +155,9 @@ protected Properties getDebeziumProperties() {
props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
props.setProperty("offset.storage.file.filename", offsetManager.getOffsetFilePath().toString());
props.setProperty("offset.flush.interval.ms", "1000"); // todo: make this longer
// default values from debezium CommonConnectorConfig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default queue size and batch size of the debezium for MySql and MS Sql is 8192 and 2048 accordingly, so we use queue size = 10000 on our side. However, the default values for Postgres is much higher (20240 and 10240) and when this properties are missing on our side, there is much more chances to get OOM. Since those values are default in CommonConnectorConfig, we can use it as a default on out side too.

@irynakruk
Copy link
Contributor Author

irynakruk commented Aug 18, 2021

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1142829171
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1142829171

@jrhizor jrhizor temporarily deployed to more-secrets August 18, 2021 10:56 Inactive
@irynakruk
Copy link
Contributor Author

irynakruk commented Aug 18, 2021

/publish connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1143051468
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1143051468

@jrhizor jrhizor temporarily deployed to more-secrets August 18, 2021 12:13 Inactive
@irynakruk irynakruk merged commit 954d7cc into master Aug 19, 2021
@irynakruk irynakruk deleted the irynakruk/4746-postgres-oom branch August 19, 2021 11:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Postgres CDC OOM issue
5 participants