[docs] update pg destination warnings #36454

Merged · 2 commits · Mar 25, 2024
118 changes: 71 additions & 47 deletions docs/integrations/destinations/postgres.md
This page guides you through the process of setting up the Postgres destination

:::caution

Postgres, while an excellent relational database, is not a data warehouse.

1. Postgres is likely to perform poorly with large data volumes. Even postgres-compatible
destinations (e.g. AWS Aurora) are not immune to slowdowns when dealing with large writes or
updates over ~500GB. Especially when using normalization with `destination-postgres`, be sure to
monitor your database's memory and CPU usage during your syncs. It is possible for your
destination to 'lock up', and incur high usage costs with large sync volumes.
2. Postgres column [name length limitations](https://www.postgresql.org/docs/current/limits.html)
are likely to cause collisions when used as a destination receiving data from highly-nested and
flattened sources, e.g. `{63 byte name}_a` and `{63 byte name}_b` will both be truncated to
`{63 byte name}` which causes postgres to throw an error that a duplicate column name was
specified.

:::
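The name-length collision described in the caution above can be sketched in a few lines of Python. This is illustrative only, not connector code: the `truncate_identifier` helper is hypothetical, and it assumes Postgres's default `NAMEDATALEN` of 64, which leaves 63 bytes per identifier.

```python
NAMEDATALEN = 64  # Postgres default; identifiers keep NAMEDATALEN - 1 bytes

def truncate_identifier(name: str) -> str:
    """Truncate an identifier to 63 bytes, as Postgres does with long names."""
    return name.encode("utf-8")[: NAMEDATALEN - 1].decode("utf-8", errors="ignore")

base = "x" * 63  # a flattened column name already at the 63-byte limit
col_a = truncate_identifier(base + "_a")
col_b = truncate_identifier(base + "_b")

# Both flattened names truncate to the same 63-byte identifier, so Postgres
# would reject the second column as a duplicate.
print(col_a == col_b)  # True
```

Keeping flattened column names well under 63 bytes (or renaming long nested fields at the source) avoids this class of failure.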

used by default. Other than that, you can proceed with the open-source instructions.
You'll need the following information to configure the Postgres destination:

- **Host** - The host name of the server.
- **Port** - The port number the server is listening on. Defaults to the PostgreSQL™ standard port
number (5432).
- **Username**
- **Password**
- **Default Schema Name** - Specify the schema (or several schemas separated by commas) to be set in
the search-path. These schemas will be used to resolve unqualified object names used in statements
executed over this connection.
- **Database** - The database name. The default is to connect to a database with the same name as
the user name.
- **JDBC URL Params** (optional)

[Refer to this guide for more details](https://jdbc.postgresql.org/documentation/use/#connecting-to-the-database)
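As a rough sketch of how these settings combine, a Postgres JDBC-style URL is the host, port, and database joined into `jdbc:postgresql://host:port/database`, with any extra parameters appended as a query string. The helper below is hypothetical — the connector assembles its URL internally in Java — but shows the shape of the result:

```python
from urllib.parse import urlencode

def build_jdbc_url(host: str, port: int, database: str, params: dict = None) -> str:
    """Assemble a Postgres JDBC URL; extra params become a query string."""
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    if params:
        url += "?" + urlencode(params)
    return url

print(build_jdbc_url("db.example.com", 5432, "analytics", {"connectTimeout": "60"}))
# jdbc:postgresql://db.example.com:5432/analytics?connectTimeout=60
```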
synced data from Airbyte.

## Naming Conventions

From
[Postgres SQL Identifiers syntax](https://www.postgresql.org/docs/9.0/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS):

- SQL identifiers and key words must begin with a letter \(a-z, but also letters with diacritical
marks and non-Latin letters\) or an underscore \(\_\).
- Subsequent characters in an identifier or key word can be letters, underscores, digits \(0-9\), or
dollar signs \($\).

Note that dollar signs are not allowed in identifiers according to the SQL standard, so their use
might render applications less portable. The SQL standard will not define a key word that contains
digits or starts or ends with an underscore, so identifiers of this form are safe against possible
conflict with future extensions of the standard.

- The system uses no more than NAMEDATALEN-1 bytes of an identifier; longer names can be written in
  commands, but they will be truncated. By default, NAMEDATALEN is 64 so the maximum identifier
  length is 63 bytes.
- Quoted identifiers can contain any character, except the character with code zero. This allows
  constructing table or column names that would otherwise not be possible, such as ones containing
  spaces or ampersands. The length limitation still applies.
- Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to
lower case.
- In order to make your applications portable and less error-prone, use consistent quoting with each
name (either always quote it or never quote it).
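The folding and quoting rules above can be modeled in a few lines. This is a simplified sketch with a hypothetical helper name — it ignores diacritics and non-Latin letters, which Postgres also accepts:

```python
import re

# Unquoted identifiers: letter or underscore, then letters/digits/underscores/$.
UNQUOTED = re.compile(r"^[a-z_][a-z0-9_$]*$")

def fold_identifier(name: str) -> str:
    """Return the name as Postgres would resolve it."""
    if name.startswith('"') and name.endswith('"') and len(name) >= 2:
        return name[1:-1]   # quoted: case preserved
    folded = name.lower()   # unquoted: folded to lower case
    if not UNQUOTED.match(folded):
        raise ValueError(f"invalid unquoted identifier: {name!r}")
    return folded

print(fold_identifier("UserName"))    # username
print(fold_identifier('"UserName"'))  # UserName
```

This is why `UserName` and `"UserName"` name two different columns: the first resolves to `username`, the second keeps its case.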

:::info

Airbyte Postgres destination will create raw tables and schemas using the Unquoted
identifiers by replacing any special characters with an underscore. All final tables and their corresponding
The Airbyte Postgres destination will create raw tables and schemas using unquoted identifiers,
replacing any special characters with an underscore. All final tables and their corresponding
columns are created using quoted identifiers, preserving case sensitivity.

:::

**For Airbyte Cloud:**

1. [Log into your Airbyte Cloud](https://cloud.airbyte.com/workspaces) account.
2. In the left navigation bar, click **Destinations**. In the top-right corner, click **new
destination**.
3. On the Set up the destination page, enter the name for the Postgres connector and select
**Postgres** from the Destination type dropdown.
4. Enter a name for your destination.
5. For the **Host**, **Port**, and **DB Name**, enter the hostname, port number, and name for your
Postgres database.
6. List the **Default Schemas**.
   :::note
   The schema names are case sensitive. The 'public' schema is set by default. Multiple schemas may
   be used at one time. If no schemas are set explicitly, all existing schemas will be synced.
   :::
7. For **User** and **Password**, enter the username and password you created in
[Step 1](#step-1-optional-create-a-dedicated-read-only-user).
8. For Airbyte Open Source, toggle the switch to connect using SSL. Airbyte Cloud uses SSL by
   default.
9. For SSL Modes, select:
- **disable** to disable encrypted communication between Airbyte and the source
- **allow** to enable encrypted communication only when required by the source
- **prefer** to allow unencrypted communication only when the source doesn't support encryption
- **require** to always require encryption. Note: The connection will fail if the source doesn't
support encryption.
- **verify-ca** to always require encryption and verify that the source has a valid SSL
certificate
- **verify-full** to always require encryption and verify the identity of the source
10. To customize the JDBC connection beyond common options, specify additional supported
[JDBC URL parameters](https://jdbc.postgresql.org/documentation/head/connect.html) as key-value
pairs separated by the symbol & in the **JDBC URL Parameters (Advanced)** field.

Example: key1=value1&key2=value2&key3=value3

These parameters will be added at the end of the JDBC URL that Airbyte uses to connect to your
Postgres database.

The connector now supports `connectTimeout` and defaults to 60 seconds. Setting connectTimeout
to 0 seconds will set the timeout to the longest time available.

**Note:** Do not use the following keys in the JDBC URL Params field, as they will be overwritten
by Airbyte: `currentSchema`, `user`, `password`, `ssl`, and `sslmode`.

:::warning
This is an advanced configuration option. Users are advised to use it with caution.
:::

11. For SSH Tunnel Method, select:

- **No Tunnel** for a direct connection to the database
- **SSH Key Authentication** to use an RSA private key as your secret for establishing the SSH
  tunnel
- **Password Authentication** to use a password as your secret for establishing the SSH tunnel

:::warning
Since Airbyte Cloud requires encrypted communication, select **SSH Key Authentication** or
**Password Authentication** if you selected **disable**, **allow**, or **prefer** as the
**SSL Mode**; otherwise, the connection will fail.
:::

12. Click **Set up destination**.
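The note in step 10 about reserved JDBC keys can be sketched as a small filter over the `key1=value1&key2=value2` string. This is illustrative only, not the connector's actual parsing logic:

```python
# Keys Airbyte sets itself; user-supplied values for these would be overwritten,
# so this sketch drops them up front.
RESERVED_KEYS = {"currentSchema", "user", "password", "ssl", "sslmode"}

def parse_jdbc_params(raw: str) -> dict:
    """Parse 'key1=value1&key2=value2' pairs, dropping reserved keys."""
    params = {}
    for pair in filter(None, raw.split("&")):
        key, _, value = pair.partition("=")
        if key not in RESERVED_KEYS:
            params[key] = value
    return params

print(parse_jdbc_params("connectTimeout=60&sslmode=require&tcpKeepAlive=true"))
# {'connectTimeout': '60', 'tcpKeepAlive': 'true'}
```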

Expand All @@ -159,22 +182,23 @@ following[ sync modes](https://docs.airbyte.com/cloud/core-concepts#connection-s

### Output Schema (Raw Tables)

Each stream will be mapped to a separate raw table in Postgres. The default schema in which the raw
tables are created is `airbyte_internal`. This can be overridden in the configuration. Each table
will contain 4 columns:

- `_airbyte_raw_id`: a uuid assigned by Airbyte to each event that is processed. The column type in
Postgres is `VARCHAR`.
- `_airbyte_extracted_at`: a timestamp representing when the event was pulled from the data source.
The column type in Postgres is `TIMESTAMP WITH TIME ZONE`.
- `_airbyte_loaded_at`: a timestamp representing when the row was processed into the final table.
  The column type in Postgres is `TIMESTAMP WITH TIME ZONE`.
- `_airbyte_data`: a JSON blob containing the event data. The column type in Postgres is `JSONB`.
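The raw-table layout above can be illustrated with a hypothetical helper that assembles one raw row. The real connector writes these columns via JDBC, not Python; this sketch only mirrors the column list:

```python
import json
import uuid
from datetime import datetime, timezone

def make_raw_row(event: dict) -> dict:
    """Assemble one raw-table row with the Airbyte metadata columns."""
    return {
        "_airbyte_raw_id": str(uuid.uuid4()),                 # VARCHAR: per-event UUID
        "_airbyte_extracted_at": datetime.now(timezone.utc),  # TIMESTAMP WITH TIME ZONE
        "_airbyte_loaded_at": None,  # set once the row is typed into the final table
        "_airbyte_data": json.dumps(event),                   # JSONB: the event payload
    }

row = make_raw_row({"id": 1, "name": "widget"})
print(sorted(row))
```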

### Final Tables Data type mapping

| Airbyte Type | Postgres Type |
| :------------------------- | :----------------------- |
| string | VARCHAR |
| number | DECIMAL |
| integer | BIGINT |
Now that you have set up the Postgres destination connector, check out the following:
## Changelog

| Version | Date | Pull Request | Subject |
| :------ | :--------- | :--------------------------------------------------------- | :-------------------------------------------------------------------------------------------------- |
| 2.0.4 | 2024-03-07 | [\#35899](https://github.com/airbytehq/airbyte/pull/35899) | Adopt CDK 0.23.18; Null safety check in state parsing |
| 2.0.3 | 2024-03-01 | [\#35528](https://github.com/airbytehq/airbyte/pull/35528) | Adopt CDK 0.23.11; Use Migration framework |
| 2.0.2 | 2024-03-01 | [\#35760](https://github.com/airbytehq/airbyte/pull/35760) | Mark as certified, add PSQL exception to deinterpolator |