From 4d6b68abb124d771bcfe7cf9623229ba9bb562ac Mon Sep 17 00:00:00 2001
From: evantahler
Date: Mon, 25 Mar 2024 10:13:38 -0700
Subject: [PATCH 1/2] [docs] update pg destination warnings

---
 docs/integrations/destinations/postgres.md | 118 +++++++++++++--------
 1 file changed, 71 insertions(+), 47 deletions(-)

diff --git a/docs/integrations/destinations/postgres.md b/docs/integrations/destinations/postgres.md
index 05f569052b90..05c656e0ee16 100644
--- a/docs/integrations/destinations/postgres.md
+++ b/docs/integrations/destinations/postgres.md
@@ -4,10 +4,18 @@ This page guides you through the process of setting up the Postgres destination
 
 :::caution
 
-Postgres, while an excellent relational database, is not a data warehouse.
-
-1. Postgres is likely to perform poorly with large data volumes. Even postgres-compatible destinations (e.g. AWS Aurora) are not immune to slowdowns when dealing with large writes or updates over ~500GB. Especially when using normalization with `destination-postgres`, be sure to monitor your database's memory and CPU usage during your syncs. It is possible for your destination to 'lock up', and incur high usage costs with large sync volumes.
-2. Postgres column size limitations are likley to cause colisions when used as a destination reciving data from highly-nested and flattened sources.
+Postgres, while an excellent relational database, is not a data warehouse.
+
+1. Postgres is likely to perform poorly with large data volumes. Even postgres-compatible
+   destinations (e.g. AWS Aurora) are not immune to slowdowns when dealing with large writes or
+   updates over ~500GB. Especially when using normalization with `destination-postgres`, be sure to
+   monitor your database's memory and CPU usage during your syncs. It is possible for your
+   destination to 'lock up', and incur high usage costs with large sync volumes.
+2. Postgres column [name length limitations](https://www.postgresql.org/docs/current/limits.html)
+   are likely to cause collisions when used as a destination receiving data from highly-nested and
+   flattened sources, e.g. `{63 byte name}_a` and `{63 byte name}_b` will both be truncated to
+   `{63 byte name}` which causes postgres to throw an error that a duplicate column name was
+   specified.
 
 :::
 
@@ -23,11 +31,15 @@ used by default. Other than that, you can proceed with the open-source instructi
 You'll need the following information to configure the Postgres destination:
 
 - **Host** - The host name of the server.
-- **Port** - The port number the server is listening on. Defaults to the PostgreSQL™ standard port number (5432).
+- **Port** - The port number the server is listening on. Defaults to the PostgreSQL™ standard port
+  number (5432).
 - **Username**
 - **Password**
-- **Default Schema Name** - Specify the schema (or several schemas separated by commas) to be set in the search-path. These schemas will be used to resolve unqualified object names used in statements executed over this connection.
-- **Database** - The database name. The default is to connect to a database with the same name as the user name.
+- **Default Schema Name** - Specify the schema (or several schemas separated by commas) to be set in
+  the search-path. These schemas will be used to resolve unqualified object names used in statements
+  executed over this connection.
+- **Database** - The database name. The default is to connect to a database with the same name as
+  the user name.
 - **JDBC URL Params** (optional)
 
 [Refer to this guide for more details](https://jdbc.postgresql.org/documentation/use/#connecting-to-the-database)
@@ -64,17 +76,18 @@ synced data from Airbyte.
 
 ## Naming Conventions
 
-From [Postgres SQL Identifiers syntax](https://www.postgresql.org/docs/9.0/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS):
+From
+[Postgres SQL Identifiers syntax](https://www.postgresql.org/docs/9.0/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS):
 
 - SQL identifiers and key words must begin with a letter \(a-z, but also letters with diacritical
   marks and non-Latin letters\) or an underscore \(\_\).
 - Subsequent characters in an identifier or key word can be letters, underscores, digits \(0-9\), or
   dollar signs \($\).
 
-  Note that dollar signs are not allowed in identifiers according to the SQL standard,
-  so their use might render applications less portable. The SQL standard will not define a key word
-  that contains digits or starts or ends with an underscore, so identifiers of this form are safe
-  against possible conflict with future extensions of the standard.
+  Note that dollar signs are not allowed in identifiers according to the SQL standard, so their use
+  might render applications less portable. The SQL standard will not define a key word that contains
+  digits or starts or ends with an underscore, so identifiers of this form are safe against possible
+  conflict with future extensions of the standard.
 
 - The system uses no more than NAMEDATALEN-1 bytes of an identifier; longer names can be written in
   commands, but they will be truncated. By default, NAMEDATALEN is 64 so the maximum identifier
@@ -85,12 +98,13 @@ From [Postgres SQL Identifiers syntax](https://www.postgresql.org/docs/9.0/sql-s
   still applies.
 - Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to
   lower case.
-- In order to make your applications portable and less error-prone, use consistent quoting with each name (either always quote it or never quote it).
+- In order to make your applications portable and less error-prone, use consistent quoting with each
+  name (either always quote it or never quote it).
 
 :::info
 
-Airbyte Postgres destination will create raw tables and schemas using the Unquoted
-identifiers by replacing any special characters with an underscore. All final tables and their corresponding
+Airbyte Postgres destination will create raw tables and schemas using the Unquoted identifiers by
+replacing any special characters with an underscore. All final tables and their corresponding
 columns are created using Quoted identifiers preserving the case sensitivity.
 
 :::
@@ -98,48 +112,57 @@ columns are created using Quoted identifiers preserving the case sensitivity.
 **For Airbyte Cloud:**
 
 1. [Log into your Airbyte Cloud](https://cloud.airbyte.com/workspaces) account.
-2. In the left navigation bar, click **Destinations**. In the top-right corner, click **new destination**.
-3. On the Set up the destination page, enter the name for the Postgres connector
-   and select **Postgres** from the Destination type dropdown.
+2. In the left navigation bar, click **Destinations**. In the top-right corner, click **new
+   destination**.
+3. On the Set up the destination page, enter the name for the Postgres connector and select
+   **Postgres** from the Destination type dropdown.
 4. Enter a name for your source.
-5. For the **Host**, **Port**, and **DB Name**, enter the hostname, port number, and name for your Postgres database.
-6. List the **Default Schemas**.
-   :::note
-   The schema names are case sensitive. The 'public' schema is set by default. Multiple schemas may be used at one time. No schemas set explicitly - will sync all of existing.
-   :::
-7. For **User** and **Password**, enter the username and password you created in [Step 1](#step-1-optional-create-a-dedicated-read-only-user).
-8. For Airbyte Open Source, toggle the switch to connect using SSL. For Airbyte Cloud uses SSL by default.
+5. For the **Host**, **Port**, and **DB Name**, enter the hostname, port number, and name for your
+   Postgres database.
+6. List the **Default Schemas**. :::note The schema names are case sensitive. The 'public' schema is
+   set by default. Multiple schemas may be used at one time. No schemas set explicitly - will sync
+   all of existing. :::
+7. For **User** and **Password**, enter the username and password you created in
+   [Step 1](#step-1-optional-create-a-dedicated-read-only-user).
+8. For Airbyte Open Source, toggle the switch to connect using SSL. For Airbyte Cloud uses SSL by
+   default.
 9. For SSL Modes, select:
    - **disable** to disable encrypted communication between Airbyte and the source
    - **allow** to enable encrypted communication only when required by the source
    - **prefer** to allow unencrypted communication only when the source doesn't support encryption
-   - **require** to always require encryption. Note: The connection will fail if the source doesn't support encryption.
-   - **verify-ca** to always require encryption and verify that the source has a valid SSL certificate
+   - **require** to always require encryption. Note: The connection will fail if the source doesn't
+     support encryption.
+   - **verify-ca** to always require encryption and verify that the source has a valid SSL
+     certificate
    - **verify-full** to always require encryption and verify the identity of the source
-10. To customize the JDBC connection beyond common options, specify additional supported [JDBC URL parameters](https://jdbc.postgresql.org/documentation/head/connect.html) as key-value pairs separated by the symbol & in the **JDBC URL Parameters (Advanced)** field.
+10. To customize the JDBC connection beyond common options, specify additional supported
+    [JDBC URL parameters](https://jdbc.postgresql.org/documentation/head/connect.html) as key-value
+    pairs separated by the symbol & in the **JDBC URL Parameters (Advanced)** field.
 
     Example: key1=value1&key2=value2&key3=value3
 
-    These parameters will be added at the end of the JDBC URL that the AirByte will use to connect to your Postgres database.
+    These parameters will be added at the end of the JDBC URL that the AirByte will use to connect
+    to your Postgres database.
 
-    The connector now supports `connectTimeout` and defaults to 60 seconds. Setting connectTimeout to 0 seconds will set the timeout to the longest time available.
+    The connector now supports `connectTimeout` and defaults to 60 seconds. Setting connectTimeout
+    to 0 seconds will set the timeout to the longest time available.
 
-    **Note:** Do not use the following keys in JDBC URL Params field as they will be overwritten by Airbyte:
-    `currentSchema`, `user`, `password`, `ssl`, and `sslmode`.
+    **Note:** Do not use the following keys in JDBC URL Params field as they will be overwritten by
+    Airbyte: `currentSchema`, `user`, `password`, `ssl`, and `sslmode`.
 
-    :::warning
-    This is an advanced configuration option. Users are advised to use it with caution.
+    :::warning This is an advanced configuration option. Users are advised to use it with caution.
     :::
 
 11. For SSH Tunnel Method, select:
 
     - **No Tunnel** for a direct connection to the database
-    - **SSH Key Authentication** to use an RSA Private as your secret for establishing the SSH tunnel
+    - **SSH Key Authentication** to use an RSA Private as your secret for establishing the SSH
+      tunnel
     - **Password Authentication** to use a password as your secret for establishing the SSH tunnel
 
-    :::warning
-    Since Airbyte Cloud requires encrypted communication, select **SSH Key Authentication** or **Password Authentication** if you selected **disable**, **allow**, or **prefer** as the **SSL Mode**; otherwise, the connection will fail.
-    :::
+    :::warning Since Airbyte Cloud requires encrypted communication, select **SSH Key
+    Authentication** or **Password Authentication** if you selected **disable**, **allow**, or
+    **prefer** as the **SSL Mode**; otherwise, the connection will fail. :::
 
 12. Click **Set up destination**.
 
@@ -159,22 +182,23 @@ following[ sync modes](https://docs.airbyte.com/cloud/core-concepts#connection-s
 
 ### Output Schema (Raw Tables)
 
-Each stream will be mapped to a separate raw table in Postgres. The default schema in which the raw tables are
-created is `airbyte_internal`. This can be overridden in the configuration.
-Each table will contain 3 columns:
+Each stream will be mapped to a separate raw table in Postgres. The default schema in which the raw
+tables are created is `airbyte_internal`. This can be overridden in the configuration. Each table
+will contain 3 columns:
 
 - `_airbyte_raw_id`: a uuid assigned by Airbyte to each event that is processed. The column type in
   Postgres is `VARCHAR`.
 - `_airbyte_extracted_at`: a timestamp representing when the event was pulled from the data source.
   The column type in Postgres is `TIMESTAMP WITH TIME ZONE`.
-- `_airbyte_loaded_at`: a timestamp representing when the row was processed into final table.
-  The column type in Postgres is `TIMESTAMP WITH TIME ZONE`.
-- `_airbyte_data`: a json blob representing with the event data. The column type in Postgres
-  is `JSONB`.
+- `_airbyte_loaded_at`: a timestamp representing when the row was processed into final table. The
+  column type in Postgres is `TIMESTAMP WITH TIME ZONE`.
+- `_airbyte_data`: a json blob representing with the event data. The column type in Postgres is
+  `JSONB`.
 
 ### Final Tables Data type mapping
+
 | Airbyte Type | Postgres Type |
-|:---------------------------|:-------------------------|
+| :------------------------- | :----------------------- |
 | string | VARCHAR |
 | number | DECIMAL |
 | integer | BIGINT |
@@ -197,7 +221,7 @@ Now that you have set up the Postgres destination connector, check out the follo
 ## Changelog
 
 | Version | Date | Pull Request | Subject |
-|:--------|:-----------|:-----------------------------------------------------------|:----------------------------------------------------------------------------------------------------|
+| :------ | :--------- | :--------------------------------------------------------- | :-------------------------------------------------------------------------------------------------- |
 | 2.0.4 | 2024-03-07 | [\#35899](https://github.com/airbytehq/airbyte/pull/35899) | Adopt CDK 0.23.18; Null safety check in state parsing |
 | 2.0.3 | 2024-03-01 | [\#35528](https://github.com/airbytehq/airbyte/pull/35528) | Adopt CDK 0.23.11; Use Migration framework |
 | 2.0.2 | 2024-03-01 | [\#35760](https://github.com/airbytehq/airbyte/pull/35760) | Mark as certified, add PSQL exception to deinterpolator |

From 2c3b7298a78210f4e1fd30fdb04c5bfc5fca2ac4 Mon Sep 17 00:00:00 2001
From: evantahler
Date: Mon, 25 Mar 2024 10:15:06 -0700
Subject: [PATCH 2/2] fix callouts

---
 docs/integrations/destinations/postgres.md | 29 ++++++++++++++++------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/docs/integrations/destinations/postgres.md b/docs/integrations/destinations/postgres.md
index 05c656e0ee16..655cc3e42fea 100644
--- a/docs/integrations/destinations/postgres.md
+++ b/docs/integrations/destinations/postgres.md
@@ -119,9 +119,15 @@ columns are created using Quoted identifiers preserving the case sensitivity.
 4. Enter a name for your source.
 5. For the **Host**, **Port**, and **DB Name**, enter the hostname, port number, and name for your
    Postgres database.
-6. List the **Default Schemas**. :::note The schema names are case sensitive. The 'public' schema is
-   set by default. Multiple schemas may be used at one time. No schemas set explicitly - will sync
-   all of existing. :::
+6. List the **Default Schemas**.
+
+:::note
+
+The schema names are case sensitive. The 'public' schema is set by default. Multiple schemas may be
+used at one time. No schemas set explicitly - will sync all of existing.
+
+:::
+
 7. For **User** and **Password**, enter the username and password you created in
    [Step 1](#step-1-optional-create-a-dedicated-read-only-user).
 8. For Airbyte Open Source, toggle the switch to connect using SSL. For Airbyte Cloud uses SSL by
@@ -150,8 +156,11 @@ columns are created using Quoted identifiers preserving the case sensitivity.
     **Note:** Do not use the following keys in JDBC URL Params field as they will be overwritten by
     Airbyte: `currentSchema`, `user`, `password`, `ssl`, and `sslmode`.
 
-    :::warning This is an advanced configuration option. Users are advised to use it with caution.
-    :::
+:::warning
+
+This is an advanced configuration option. Users are advised to use it with caution.
+
+:::
 
 11. For SSH Tunnel Method, select:
 
@@ -160,9 +169,13 @@ columns are created using Quoted identifiers preserving the case sensitivity.
       tunnel
     - **Password Authentication** to use a password as your secret for establishing the SSH tunnel
 
-    :::warning Since Airbyte Cloud requires encrypted communication, select **SSH Key
-    Authentication** or **Password Authentication** if you selected **disable**, **allow**, or
-    **prefer** as the **SSL Mode**; otherwise, the connection will fail. :::
+:::warning
+
+Since Airbyte Cloud requires encrypted communication, select **SSH Key Authentication** or
+**Password Authentication** if you selected **disable**, **allow**, or **prefer** as the **SSL
+Mode**; otherwise, the connection will fail.
+
+:::
 
 12. Click **Set up destination**.
 
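
The caution callout reworded in the first patch says that `{63 byte name}_a` and `{63 byte name}_b` both truncate to `{63 byte name}` and produce a duplicate-column error. The sketch below shows one way to check a list of flattened column names for that collision before a sync. It is not part of the Airbyte connector: the helper names `truncate_identifier` and `find_truncation_collisions` are hypothetical, and the truncation is only an approximation of Postgres behaviour. It assumes the default `NAMEDATALEN` of 64 (so 63 usable bytes) and UTF-8 encoded names.

```python
from collections import defaultdict

# Postgres keeps at most NAMEDATALEN - 1 bytes of an identifier.
# 63 assumes the default NAMEDATALEN of 64 quoted in the docs above.
MAX_IDENTIFIER_BYTES = 63


def truncate_identifier(name: str, limit: int = MAX_IDENTIFIER_BYTES) -> str:
    """Approximate Postgres identifier truncation (hypothetical helper).

    Cuts the UTF-8 encoding at `limit` bytes and drops any multi-byte
    character that the cut would split.
    """
    encoded = name.encode("utf-8")
    if len(encoded) <= limit:
        return name
    return encoded[:limit].decode("utf-8", errors="ignore")


def find_truncation_collisions(column_names, limit: int = MAX_IDENTIFIER_BYTES):
    """Group distinct column names that share the same truncated form."""
    groups = defaultdict(list)
    # dict.fromkeys removes exact duplicates while keeping input order.
    for name in dict.fromkeys(column_names):
        groups[truncate_identifier(name, limit)].append(name)
    return {short: names for short, names in groups.items() if len(names) > 1}


# Example mirroring the callout: two names that differ only after byte 63.
prefix = "x" * 63
columns = [prefix + "_a", prefix + "_b", "short_name"]
collisions = find_truncation_collisions(columns)

for short, names in collisions.items():
    print(f"{len(names)} columns collapse to {short!r}: {names}")
```

Names are compared without case folding because the docs state that final-table columns are created with quoted identifiers, which stay case-sensitive but are still subject to the length limit.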