docs: _airbyte_meta Errors #29380

Merged Aug 15, 2023 (18 commits)
Changes from 9 commits
81 changes: 43 additions & 38 deletions docs/release_notes/upgrading_to_destinations_v2.md
@@ -7,12 +7,13 @@ import {SnowflakeMigrationGenerator, BigQueryMigrationGenerator} from './destina
## What is Destinations V2?

Starting today, Airbyte Destinations V2 provides you with:
- One-to-one table mapping: Data in one stream will always be mapped to one table in your data warehouse. No more sub-tables.
- Improved error handling with `_airbyte_meta`: Airbyte will now populate typing errors in the `_airbyte_meta` column instead of failing your sync. You can query these results to audit misformatted or unexpected data.
- Internal Airbyte tables in the `airbyte_internal` schema: Airbyte will now generate all raw tables in the `airbyte_internal` schema. We no longer clutter your destination schema with raw data tables.
- Incremental delivery for large syncs: Data will be incrementally delivered to your final tables. No more waiting hours to see the first rows in your destination table.

To see more details and examples on the contents of the Destinations V2 release, see this [guide](understanding-airbyte/typing-deduping.md). The remainder of this page will walk you through upgrading connectors from legacy normalization to Destinations V2.
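
As an illustration of auditing typing errors, here is a minimal Python sketch. The exact shape of `_airbyte_meta` is not spelled out on this page, so the sketch assumes a JSON object with an `errors` array; the sample rows and the helper name `rows_with_typing_errors` are hypothetical.

```python
import json

# Hypothetical rows as they might be read from a final table.
# ASSUMPTION: `_airbyte_meta` is a JSON object shaped like {"errors": [...]}.
rows = [
    {"id": 1, "age": 42, "_airbyte_meta": '{"errors": []}'},
    {"id": 2, "age": None, "_airbyte_meta": '{"errors": ["Problem with `age`"]}'},
]


def rows_with_typing_errors(rows):
    """Return (row, errors) pairs for rows whose `_airbyte_meta` lists errors."""
    flagged = []
    for row in rows:
        meta = row.get("_airbyte_meta") or {}
        if isinstance(meta, str):
            # Some drivers return JSON columns as text; decode them first.
            meta = json.loads(meta) or {}
        errors = meta.get("errors", [])
        if errors:
            flagged.append((row, errors))
    return flagged


for row, errors in rows_with_typing_errors(rows):
    print(row["id"], errors)
```

In practice you would likely run the equivalent filter as a query in your warehouse; the Python form only shows the logic.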

## Deprecating Legacy Normalization

@@ -26,15 +27,15 @@ As a Cloud user, existing connections using legacy normalization will be paused

The following table details the delivered data modified by Destinations V2:

| Current Normalization Setting | Source Type | Impacted Data (Breaking Changes) |
| ----------------------------- | ------------------------------------- | -------------------------------------------------------- |
| Raw JSON | All | `_airbyte` metadata columns, raw table location |
| Normalized tabular data | API Source | Unnested tables, `_airbyte` metadata columns, SCD tables |
| Normalized tabular data | Tabular Source (database, file, etc.) | `_airbyte` metadata columns, SCD tables |

![Airbyte Destinations V2 Column Changes](./assets/destinations-v2-column-changes.png)

Whenever possible, we've taken this opportunity to use the best data type for storing JSON for your querying convenience. For example, `destination-bigquery` now loads `JSON` blobs as type `JSON` in BigQuery (introduced last [year](https://cloud.google.com/blog/products/data-analytics/bigquery-now-natively-supports-semi-structured-data)), instead of type `string`.

## Quick Start to Upgrading

@@ -43,6 +44,7 @@ The quickest path to upgrading is to click upgrade on any out-of-date connection
![Upgrade Path](./assets/airbyte_destinations_v2_upgrade_prompt.png)

After upgrading the out-of-date destination to a [Destinations V2 compatible version](#destinations-v2-compatible-versions), the following will occur at the next sync **for each connection** sending data to the updated destination:

1. Existing raw tables replicated to this destination will be copied to a new `airbyte_internal` schema.
2. The new raw tables will be updated to the new Destinations V2 format.
3. The new raw tables will be updated with any new data since the last sync, like normal.
@@ -53,12 +55,13 @@ Pre-existing raw tables, SCD tables and "unnested" tables will always be left un
Each destination version is managed separately, so if you have multiple destinations, they all need to be upgraded one by one.

Versions are tied to the destination. When you update the destination, **all connections tied to that destination will be sending data in the Destinations V2 format**. For upgrade paths that will minimize disruption to existing dashboards, see:
- [Upgrading Connections One by One with Dual-Writing](#upgrading-connections-one-by-one-with-dual-writing)
- [Testing Destinations V2 on a Single Connection](#testing-destinations-v2-for-a-single-connection)
- [Upgrading Connections One by One Using CDC](#upgrade-paths-for-connections-using-cdc)
- [Upgrading as a User of Raw Tables](#upgrading-as-a-user-of-raw-tables)
- [Rolling back to Legacy Normalization](#oss-only-rolling-back-to-legacy-normalization)

## Advanced Upgrade Paths

### Upgrading Connections One by One with Dual-Writing
@@ -67,7 +70,7 @@ Dual writing is a method employed during upgrades where new incoming data is wri

#### Steps to Follow for All Sync Modes

1. **[Open Source]** Update the default destination version for your workspace to a [Destinations V2 compatible version](#destinations-v2-compatible-versions). This sets the default version for any newly created destination. All existing syncs will remain on their current version.

![Upgrade your default destination version](assets/airbyte_version_upgrade.png)

@@ -104,6 +107,7 @@ These steps allow you to dual-write for connections incrementally syncing data w
### Testing Destinations V2 for a Single Connection

You may want to verify the format of updated data for a single connection. To do this:

1. If all of the streams you are looking to test with are in **full refresh mode**, follow the [steps for upgrading connections one by one](#steps-to-follow-for-all-sync-modes). Ensure any connections you create have a `Manual` replication frequency.
2. For any streams in **incremental** sync modes, follow the [steps for upgrading incremental syncs](#additional-steps-for-incremental-sync-modes). For testing, you do not need to copy pre-existing raw data. By solely inheriting state from a pre-existing connection, enabling a sync will provide a sample of the most recent data in the updated format for testing.

@@ -112,35 +116,36 @@ When you are done testing, you can disable or delete this testing connection, an
### Upgrading as a User of Raw Tables

If you have written downstream transformations directly from the output of raw tables, or use the "Raw JSON" normalization setting, you should know that:

- Multiple column names are being updated (from `_airbyte_ab_id` to `_airbyte_raw_id`, and `_airbyte_emitted_at` to `_airbyte_extracted_at`).
- Raw tables will now default to the `airbyte_internal` schema in your destination.
- When you upgrade to a [Destinations V2 compatible version](#destinations-v2-compatible-versions) of your destination, we will never alter your existing raw data. Although existing downstream dashboards will go stale, they will never be broken.
- You can dual write by following the [steps above](#upgrading-connections-one-by-one-with-dual-writing) and copying your raw data to the schema of your newly created connection.

We may make further changes to raw tables in the future, as these tables are intended to be a staging ground for Airbyte to optimize the performance of your syncs. We cannot guarantee the same level of stability as for final tables in your destination schema.
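
If your downstream transformations reference the old raw-table column names, the renames listed above can be applied mechanically. The following is a minimal sketch, not an official migration tool: it assumes the column names carry a leading underscore (as `_airbyte_meta` does), and the helper name `rename_legacy_columns` is hypothetical.

```python
import re

# ASSUMPTION: legacy and new column names use a leading underscore.
COLUMN_RENAMES = {
    "_airbyte_ab_id": "_airbyte_raw_id",
    "_airbyte_emitted_at": "_airbyte_extracted_at",
}

# Match a legacy name only when it is a whole identifier, so that
# names such as `my_airbyte_ab_id` or `_airbyte_ab_id_hash` are left alone.
_LEGACY_NAME = re.compile(
    r"(?<![A-Za-z0-9_])("
    + "|".join(re.escape(name) for name in COLUMN_RENAMES)
    + r")(?![A-Za-z0-9_])"
)


def rename_legacy_columns(sql: str) -> str:
    """Rewrite legacy raw-table column names in a SQL string."""
    return _LEGACY_NAME.sub(lambda match: COLUMN_RENAMES[match.group(1)], sql)


print(rename_legacy_columns("SELECT _airbyte_ab_id, _airbyte_emitted_at FROM raw_users"))
```

A plain text substitution like this does not understand SQL, so review the rewritten queries before deploying them.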

### Upgrade Paths for Connections using CDC

For each [CDC-supported](https://docs.airbyte.com/understanding-airbyte/cdc) source connector, we recommend the following:

| CDC Source | Recommendation | Notes |
| ---------- | ------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Postgres | [Upgrade connection in place](#quick-start-to-upgrading) | You can optionally dual write, but this requires resyncing historical data from the source. You must create a new Postgres source with a different replication slot than your existing source to preserve the integrity of your existing connection. |
| MySQL | [All above upgrade paths supported](#advanced-upgrade-paths) | You can upgrade the connection in place, or dual write. When dual writing, Airbyte can leverage the state of an existing, active connection to ensure historical data is not re-replicated from MySQL. |
| SQL Server | [Upgrade connection in place](#quick-start-to-upgrading) | You can optionally dual write, but this requires resyncing historical data from the SQL Server source. |

## Destinations V2 Compatible Versions

For each destination connector, Destinations V2 is effective as of the following versions:

| Destination Connector | Safe Rollback Version | Destinations V2 Compatible |
| --------------------- | --------------------- | -------------------------- |
| BigQuery | 1.4.4 | 2.0.0+ |
| Snowflake | 0.4.1 | 2.0.0+ |
| Redshift | 0.4.8 | 2.0.0+ |
| MSSQL | 0.1.24 | 2.0.0+ |
| MySQL | 0.1.20 | 2.0.0+ |
| Oracle | 0.1.19 | 2.0.0+ |
| TiDB | 0.1.3 | 2.0.0+ |
| DuckDB | 0.1.0 | 2.0.0+ |
| Clickhouse | 0.2.3 | 2.0.0+ |
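
To check a connector version against the table above, a simple version comparison is enough. The sketch below copies the versions from the table; the helper names `parse_version` and `is_v2_compatible` are hypothetical, and it assumes plain `major.minor.patch` version strings.

```python
# Safe rollback versions, copied from the table above.
SAFE_ROLLBACK_VERSION = {
    "BigQuery": "1.4.4",
    "Snowflake": "0.4.1",
    "Redshift": "0.4.8",
    "MSSQL": "0.1.24",
    "MySQL": "0.1.20",
    "Oracle": "0.1.19",
    "TiDB": "0.1.3",
    "DuckDB": "0.1.0",
    "Clickhouse": "0.2.3",
}

# Every connector in the table becomes Destinations V2 compatible at 2.0.0.
V2_MINIMUM = (2, 0, 0)


def parse_version(version: str) -> tuple:
    """Turn 'major.minor.patch' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


def is_v2_compatible(version: str) -> bool:
    """Return True if the version is 2.0.0 or later."""
    return parse_version(version) >= V2_MINIMUM


print(is_v2_compatible("2.0.0"))
print(is_v2_compatible(SAFE_ROLLBACK_VERSION["BigQuery"]))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating `0.1.24` as older than `0.1.3`.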