From aaebc0f67d69c8d684464e50d8e0b5cd74b37099 Mon Sep 17 00:00:00 2001 From: evantahler Date: Fri, 11 Aug 2023 13:57:15 -0700 Subject: [PATCH 01/17] lint --- .../upgrading_to_destinations_v2.md | 87 +++++++++++-------- docs/understanding-airbyte/typing-deduping.md | 47 +++++----- 2 files changed, 74 insertions(+), 60 deletions(-) diff --git a/docs/release_notes/upgrading_to_destinations_v2.md b/docs/release_notes/upgrading_to_destinations_v2.md index 8cf9bcad60ec..3d641302429e 100644 --- a/docs/release_notes/upgrading_to_destinations_v2.md +++ b/docs/release_notes/upgrading_to_destinations_v2.md @@ -7,10 +7,11 @@ import {SnowflakeMigrationGenerator, BigQueryMigrationGenerator} from './destina ## What is Destinations V2? Starting today, Airbyte Destinations V2 provides you with: -* One-to-one table mapping: Data in one stream will always be mapped to one table in your data warehouse. No more sub-tables. -* Improved error handling with `_airbyte_meta`: Airbyte will now populate typing errors in the `_airbyte_meta` column instead of failing your sync. You can query these results to audit misformatted or unexpected data. -* Internal Airbyte tables in the `airbyte_internal` schema: Airbyte will now generate all raw tables in the `airbyte_internal` schema. We no longer clutter your destination schema with raw data tables. -* Incremental delivery for large syncs: Data will be incrementally delivered to your final tables. No more waiting hours to see the first rows in your destination table. + +- One-to-one table mapping: Data in one stream will always be mapped to one table in your data warehouse. No more sub-tables. +- Improved error handling with `_airbyte_meta`: Airbyte will now populate typing errors in the `_airbyte_meta` column instead of failing your sync. You can query these results to audit misformatted or unexpected data. +- Internal Airbyte tables in the `airbyte_internal` schema: Airbyte will now generate all raw tables in the `airbyte_internal` schema. We no longer clutter your destination schema with raw data tables. +- Incremental delivery for large syncs: Data will be incrementally delivered to your final tables. No more waiting hours to see the first rows in your destination table. To see more details and examples on the contents of the Destinations V2 release, see this [guide](../understanding-airbyte/typing-deduping.md). The remainder of this page will walk you through upgrading connectors from legacy normalization to Destinations V2. @@ -26,15 +27,15 @@ As a Cloud user, existing connections using legacy normalization will be paused The following table details the delivered data modified by Destinations V2: -| Current Normalization Setting | Source Type | Impacted Data (Breaking Changes) | -|-----------------------------------|--------------------------------------- |--------------------------------------------------------------------------------| -| Raw JSON | All | `_airbyte` metadata columns, raw table location | -| Normalized tabular data | API Source | Unnested tables, `_airbyte` metadata columns, SCD tables | -| Normalized tabular data | Tabular Source (database, file, etc.) 
| `_airbyte` metadata columns, SCD tables | +| Current Normalization Setting | Source Type | Impacted Data (Breaking Changes) | +| ----------------------------- | ------------------------------------- | -------------------------------------------------------- | +| Raw JSON | All | `_airbyte` metadata columns, raw table location | +| Normalized tabular data | API Source | Unnested tables, `_airbyte` metadata columns, SCD tables | +| Normalized tabular data | Tabular Source (database, file, etc.) | `_airbyte` metadata columns, SCD tables | ![Airbyte Destinations V2 Column Changes](./assets/destinations-v2-column-changes.png) -Whenever possible, we've taken this opportunity to use the best data type for storing JSON for your querying convenience. For example, `destination-bigquery` now loads `JSON` blobs as type `JSON` in BigQuery (introduced last [year](https://cloud.google.com/blog/products/data-analytics/bigquery-now-natively-supports-semi-structured-data)), instead of type `string`. +Whenever possible, we've taken this opportunity to use the best data type for storing JSON for your querying convenience. For example, `destination-bigquery` now loads `JSON` blobs as type `JSON` in BigQuery (introduced last [year](https://cloud.google.com/blog/products/data-analytics/bigquery-now-natively-supports-semi-structured-data)), instead of type `string`. ## Quick Start to Upgrading @@ -43,6 +44,7 @@ The quickest path to upgrading is to click upgrade on any out-of-date connection ![Upgrade Path](./assets/airbyte_destinations_v2_upgrade_prompt.png) After upgrading the out-of-date destination to a [Destinations V2 compatible version](#destinations-v2-effective-versions), the following will occur at the next sync **for each connection** sending data to the updated destination: + 1. Existing raw tables replicated to this destination will be copied to a new `airbyte` schema. 2. The new raw tables will be updated to the new Destinations V2 format. 3. The new raw tables will be updated with any new data since the last sync, like normal. @@ -53,12 +55,21 @@ Pre-existing raw tables, SCD tables and "unnested" tables will always be left un Each destination version is managed separately, so if you have multiple destinations, they all need to be upgraded one by one. Versions are tied to the destination. When you update the destination, **all connections tied to that destination will be sending data in the Destinations V2 format**. 
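As a concrete sanity check after the first upgraded sync, you can compare the copied raw tables against the pre-existing ones, which are left in place. A sketch (illustrative names only: `_airbyte_raw_users` follows the legacy raw-table naming, and `airbyte.raw_public_users` the new `airbyte.{namespace}_{stream}` convention described in the typing and deduping guide):

```sql
-- Pre-existing raw table, untouched in your original schema:
SELECT COUNT(*) FROM public._airbyte_raw_users;

-- Copied and upgraded Destinations V2 raw table:
SELECT COUNT(*) FROM airbyte.raw_public_users;
```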
For upgrade paths that will minimize disruption to existing dashboards, see: -* [Upgrading Connections One by One with Dual-Writing](#upgrading-connections-one-by-one-with-dual-writing) -* [Testing Destinations V2 on a Single Connection](#testing-destinations-v2-for-a-single-connection) -* [Upgrading Connections One by One Using CDC](#upgrade-paths-for-connections-using-cdc) -* [Upgrading as a User of Raw Tables](#upgrading-as-a-user-of-raw-tables) -* [Rolling back to Legacy Normalization](#oss-only-rolling-back-to-legacy-normalization) - + +- [Upgrading to Destinations V2](#upgrading-to-destinations-v2) + - [What is Destinations V2?](#what-is-destinations-v2) + - [Deprecating Legacy Normalization](#deprecating-legacy-normalization) + - [Breakdown of Breaking Changes](#breakdown-of-breaking-changes) + - [Quick Start to Upgrading](#quick-start-to-upgrading) + - [Advanced Upgrade Paths](#advanced-upgrade-paths) + - [Upgrading Connections One by One with Dual-Writing](#upgrading-connections-one-by-one-with-dual-writing) + - [Steps to Follow for All Sync Modes](#steps-to-follow-for-all-sync-modes) + - [Additional Steps for Incremental Sync Modes](#additional-steps-for-incremental-sync-modes) + - [Testing Destinations V2 for a Single Connection](#testing-destinations-v2-for-a-single-connection) + - [Upgrading as a User of Raw Tables](#upgrading-as-a-user-of-raw-tables) + - [Upgrade Paths for Connections using CDC](#upgrade-paths-for-connections-using-cdc) + - [Destinations V2 Compatible Versions](#destinations-v2-compatible-versions) + ## Advanced Upgrade Paths ### Upgrading Connections One by One with Dual-Writing @@ -67,7 +78,7 @@ Dual writing is a method employed during upgrades where new incoming data is wri #### Steps to Follow for All Sync Modes -1. **[Open Source]** Update the default destination version for your workspace to a [Destinations V2 compatible version](#destinations-v2-effective-versions). This sets the default version for any newly created destination. All existing syncs will remain on their current version. +1. **[Open Source]** Update the default destination version for your workspace to a [Destinations V2 compatible version](#destinations-v2-effective-versions). This sets the default version for any newly created destination. All existing syncs will remain on their current version. ![Upgrade your default destination version](assets/airbyte_version_upgrade.png) @@ -104,6 +115,7 @@ These steps allow you to dual-write for connections incrementally syncing data w ### Testing Destinations V2 for a Single Connection You may want to verify the format of updated data for a single connection. To do this: + 1. If all of the streams you are looking to test with are in **full refresh mode**, follow the [steps for upgrading connections one by one](#steps-to-follow-for-all-sync-modes). Ensure any connections you create have a `Manual` replication frequency. 2. For any streams in **incremental** sync modes, follow the [steps for upgrading incremental syncs](#additional-steps-for-incremental-sync-modes). For testing, you do not need to copy pre-existing raw data. By solely inheriting state from a pre-existing connection, enabling a sync will provide a sample of the most recent data in the updated format for testing. 
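Once the test sync finishes, a quick spot check against the tables written by your existing connection can confirm the updated format before you rely on it. A sketch (illustrative names: `v2_users` stands in for whatever prefixed table your test connection writes, and the legacy column name follows the rename described in this guide):

```sql
-- Most recent delivery from the test connection (Destinations V2):
SELECT MAX(_airbyte_extracted_at) FROM public.v2_users;

-- Most recent delivery from the existing connection (legacy normalization):
SELECT MAX(_airbyte_emitted_at) FROM public.users;
```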
@@ -112,10 +124,11 @@ When you are done testing, you can disable or delete this testing connection, an ### Upgrading as a User of Raw Tables If you have written downstream transformations directly from the output of raw tables, or use the "Raw JSON" normalization setting, you should know that: -* Multiple column names are being updated (from `airbyte_ab_id` to `airbyte_raw_id`, and `airbyte_emitted_at` to `airbyte_extracted_at`). -* The location of raw tables will from now on default to an `airbyte` schema in your destination. -* When you upgrade to a [Destinations V2 compatible version](#destinations-v2-effective-versions) of your destination, we will never alter your existing raw data. Although existing downstream dashboards will go stale, they will never be broken. -* You can dual write by following the [steps above](#upgrading-connections-one-by-one-with-dual-writing) and copying your raw data to the schema of your newly created connection. + +- Multiple column names are being updated (from `airbyte_ab_id` to `airbyte_raw_id`, and `airbyte_emitted_at` to `airbyte_extracted_at`). +- The location of raw tables will from now on default to an `airbyte` schema in your destination. +- When you upgrade to a [Destinations V2 compatible version](#destinations-v2-effective-versions) of your destination, we will never alter your existing raw data. Although existing downstream dashboards will go stale, they will never be broken. +- You can dual write by following the [steps above](#upgrading-connections-one-by-one-with-dual-writing) and copying your raw data to the schema of your newly created connection. We may make further changes to raw tables in the future, as these tables are intended to be a staging ground for Airbyte to optimize the performance of your syncs. We cannot guarantee the same level of stability as for final tables in your destination schema. @@ -123,24 +136,24 @@ We may make further changes to raw tables in the future, as these tables are int For each [CDC-supported](https://docs.airbyte.com/understanding-airbyte/cdc) source connector, we recommend the following: -| CDC Source | Recommendation | Notes | -|------------ |----------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| Postgres | [Upgrade connection in place](#quick-start-to-upgrading) | You can optionally dual write, but this requires resyncing historical data from the source. You must create a new Postgres source with a different replication slot than your existing source to preserve the integrity of your existing connection. | -| MySQL | [All above upgrade paths supported](#advanced-upgrade-paths) | You can upgrade the connection in place, or dual write. When dual writing, Airbyte can leverage the state of an existing, active connection to ensure historical data is not re-replicated from MySQL. | -| SQL Server | [Upgrade connection in place](#quick-start-to-upgrading) | You can optionally dual write, but this requires resyncing historical data from the SQL Server source. 
| +| CDC Source | Recommendation | Notes | +| ---------- | ------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Postgres | [Upgrade connection in place](#quick-start-to-upgrading) | You can optionally dual write, but this requires resyncing historical data from the source. You must create a new Postgres source with a different replication slot than your existing source to preserve the integrity of your existing connection. | +| MySQL | [All above upgrade paths supported](#advanced-upgrade-paths) | You can upgrade the connection in place, or dual write. When dual writing, Airbyte can leverage the state of an existing, active connection to ensure historical data is not re-replicated from MySQL. | +| SQL Server | [Upgrade connection in place](#quick-start-to-upgrading) | You can optionally dual write, but this requires resyncing historical data from the SQL Server source. | ## Destinations V2 Compatible Versions For each destination connector, Destinations V2 is effective as of the following versions: -| Destination Connector | Safe Rollback Version | Destinations V2 Compatible | -|----------------------- |----------------------- |------------------------------| -| BigQuery | 1.4.4 | 2.0.0+ | -| Snowflake | 0.4.1 | 2.0.0+ | -| Redshift | 0.4.8 | 2.0.0+ | -| MSSQL | 0.1.24 | 2.0.0+ | -| MySQL | 0.1.20 | 2.0.0+ | -| Oracle | 0.1.19 | 2.0.0+ | -| TiDB | 0.1.3 | 2.0.0+ | -| DuckDB | 0.1.0 | 2.0.0+ | -| Clickhouse | 0.2.3 | 2.0.0+ | +| Destination Connector | Safe Rollback Version | Destinations V2 Compatible | +| --------------------- | --------------------- | -------------------------- | +| BigQuery | 1.4.4 | 2.0.0+ | +| Snowflake | 0.4.1 | 2.0.0+ | +| Redshift | 0.4.8 | 2.0.0+ | +| MSSQL | 0.1.24 | 2.0.0+ | +| MySQL | 0.1.20 | 2.0.0+ | +| Oracle | 0.1.19 | 2.0.0+ | +| TiDB | 0.1.3 | 2.0.0+ | +| DuckDB | 0.1.0 | 2.0.0+ | +| Clickhouse | 0.2.3 | 2.0.0+ | diff --git a/docs/understanding-airbyte/typing-deduping.md b/docs/understanding-airbyte/typing-deduping.md index 4eab218724c6..005dab61e990 100644 --- a/docs/understanding-airbyte/typing-deduping.md +++ b/docs/understanding-airbyte/typing-deduping.md @@ -2,15 +2,16 @@ This page refers to new functionality currently available in **early access**. Typing and deduping will become the new default method of transforming datasets within data warehouse and database destinations after they've been replicated. This functionality is going live with [Destinations V2](https://github.com/airbytehq/airbyte/issues/26028), which is now in early access for BigQuery. -You will eventually be required to upgrade your connections to use the new destination versions. We are building tools for you to copy your connector’s configuration to a new version to make testing new destinations easier. These will be available in the next few weeks. +You will eventually be required to upgrade your connections to use the new destination versions. We are building tools for you to copy your connector’s configuration to a new version to make testing new destinations easier. These will be available in the next few weeks. ## What is Destinations V2? At launch, Airbyte Destinations V2 will provide: -* One-to-one table mapping: Data in one stream will always be mapped to one table in your data warehouse. No more sub-tables. 
-* Improved per-row error handling with `_airbyte_meta`: Airbyte will now populate typing errors in the `_airbyte_meta` column instead of failing your sync. You can query these results to audit misformatted or unexpected data. -* Internal Airbyte tables in the `airbyte_internal` schema: Airbyte will now generate all raw tables in the `airbyte_internal` schema. We no longer clutter your desired schema with raw data tables. -* Incremental delivery for large syncs: Data will be incrementally delivered to your final tables when possible. No more waiting hours to see the first rows in your destination table. + +- One-to-one table mapping: Data in one stream will always be mapped to one table in your data warehouse. No more sub-tables. +- Improved per-row error handling with `_airbyte_meta`: Airbyte will now populate typing errors in the `_airbyte_meta` column instead of failing your sync. You can query these results to audit misformatted or unexpected data. +- Internal Airbyte tables in the `airbyte_internal` schema: Airbyte will now generate all raw tables in the `airbyte_internal` schema. We no longer clutter your desired schema with raw data tables. +- Incremental delivery for large syncs: Data will be incrementally delivered to your final tables when possible. No more waiting hours to see the first rows in your destination table. ## Destinations V2 Example @@ -30,23 +31,23 @@ Consider the following [source schema](https://docs.airbyte.com/integrations/sou The data from one stream will now be mapped to one table in your schema as below: -#### Destination Table Name: *public.users* +#### Destination Table Name: _public.users_ -| *(note, not in actual table)* | _airbyte_raw_id | _airbyte_extracted_at | _airbyte_meta | id | first_name | age | address | -|----------------------------------------------- |----------------- |--------------------- |-------------------------------------------------------------------------- |---- |------------ |------ |--------------------------------------------- | -| Successful typing and de-duping ⟶ | xxx-xxx-xxx | 2022-01-01 12:00:00 | {} | 1 | sarah | 39 | { city: “San Francisco”, zip: “94131” } | -| Failed typing that didn’t break other rows ⟶ | yyy-yyy-yyy | 2022-01-01 12:00:00 | { errors: {[“fish” is not a valid integer for column “age”]} | 2 | evan | NULL | { city: “Menlo Park”, zip: “94002” } | -| Not-yet-typed ⟶ | | | | | | | | +| _(note, not in actual table)_ | \_airbyte_raw_id | \_airbyte_extracted_at | \_airbyte_meta | id | first_name | age | address | +| -------------------------------------------- | ---------------- | ---------------------- | ------------------------------------------------------------ | --- | ---------- | ---- | --------------------------------------- | +| Successful typing and de-duping ⟶ | xxx-xxx-xxx | 2022-01-01 12:00:00 | {} | 1 | sarah | 39 | { city: “San Francisco”, zip: “94131” } | +| Failed typing that didn’t break other rows ⟶ | yyy-yyy-yyy | 2022-01-01 12:00:00 | { errors: {[“fish” is not a valid integer for column “age”]} | 2 | evan | NULL | { city: “Menlo Park”, zip: “94002” } | +| Not-yet-typed ⟶ | | | | | | | | In legacy normalization, columns of [Airbyte type](https://docs.airbyte.com/understanding-airbyte/supported-data-types/#the-types) `Object` in the Destination were "unnested" into separate tables. In this example, with Destinations V2, the previously unnested `public.users_address` table with columns `city` and `zip` will no longer be generated. 
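For example, a lookup that previously joined the unnested sub-table can now read the object column directly. A sketch of the idea (BigQuery `JSON` syntax; the exact JSON functions vary by destination):

```sql
-- Legacy normalization: city and zip lived in the generated sub-table public.users_address.
-- Destinations V2: read the same fields straight from the JSON `address` column.
SELECT
  id,
  JSON_VALUE(address, '$.city') AS city,
  JSON_VALUE(address, '$.zip') AS zip
FROM public.users;
```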
-#### Destination Table Name: *airbyte.raw_public_users* (`airbyte.{namespace}_{stream}`) +#### Destination Table Name: _airbyte.raw_public_users_ (`airbyte.{namespace}_{stream}`) -| *(note, not in actual table)* | _airbyte_raw_id | _airbyte_data | _airbyte_loaded_at | _airbyte_extracted_at | -|----------------------------------------------- |----------------- |------------------------------------------------------------------------------------------------------------- |---------------------- |--------------------- | -| Successful typing and de-duping ⟶ | xxx-xxx-xxx | { id: 1, first_name: “sarah”, age: 39, address: { city: “San Francisco”, zip: “94131” } } | 2022-01-01 12:00:001 | 2022-01-01 12:00:00 | -| Failed typing that didn’t break other rows ⟶ | yyy-yyy-yyy | { id: 2, first_name: “evan”, age: “fish”, address: { city: “Menlo Park”, zip: “94002” } } | 2022-01-01 12:00:001 | 2022-01-01 12:00:00 | -| Not-yet-typed ⟶ | zzz-zzz-zzz | { id: 3, first_name: “edward”, age: 35, address: { city: “Sunnyvale”, zip: “94003” } } | NULL | 2022-01-01 13:00:00 | +| _(note, not in actual table)_ | \_airbyte_raw_id | \_airbyte_data | \_airbyte_loaded_at | \_airbyte_extracted_at | +| -------------------------------------------- | ---------------- | ----------------------------------------------------------------------------------------- | -------------------- | ---------------------- | +| Successful typing and de-duping ⟶ | xxx-xxx-xxx | { id: 1, first_name: “sarah”, age: 39, address: { city: “San Francisco”, zip: “94131” } } | 2022-01-01 12:00:001 | 2022-01-01 12:00:00 | +| Failed typing that didn’t break other rows ⟶ | yyy-yyy-yyy | { id: 2, first_name: “evan”, age: “fish”, address: { city: “Menlo Park”, zip: “94002” } } | 2022-01-01 12:00:001 | 2022-01-01 12:00:00 | +| Not-yet-typed ⟶ | zzz-zzz-zzz | { id: 3, first_name: “edward”, age: 35, address: { city: “Sunnyvale”, zip: “94003” } } | NULL | 2022-01-01 13:00:00 | You also now see the following changes in Airbyte-provided columns: @@ -54,15 +55,15 @@ You also now see the following changes in Airbyte-provided columns: ## Participating in Early Access -You can start using Destinations V2 for BigQuery in early access by following the below instructions: +You can start using Destinations V2 for BigQuery in early access by following the below instructions: 1. **Upgrade your BigQuery Destination**: If you are using Airbyte Open Source, update your BigQuery destination version to the latest version. If you are a Cloud customer, this step will already be completed on your behalf. -2. **Enabling Destinations V2**: Create a new BigQuery destination, and enable the Destinations V2 option under `Advanced` settings. You will need your BigQuery credentials for this step. For this early release, we ask that you enable Destinations V2 on a new BigQuery destination using new connections. When Destinations V2 is fully available, there will be additional migration paths for upgrading your destination without resetting any of your existing connections. - 1. If your previous BigQuery destination is using “GCS Staging”, you can reuse the same staging bucket. - 2. Do not enable Destinations V2 on your previous / existing BigQuery destination during early release. It will cause your existing connections to fail. +2. **Enabling Destinations V2**: Create a new BigQuery destination, and enable the Destinations V2 option under `Advanced` settings. You will need your BigQuery credentials for this step. 
For this early release, we ask that you enable Destinations V2 on a new BigQuery destination using new connections. When Destinations V2 is fully available, there will be additional migration paths for upgrading your destination without resetting any of your existing connections. + 1. If your previous BigQuery destination is using “GCS Staging”, you can reuse the same staging bucket. + 2. Do not enable Destinations V2 on your previous / existing BigQuery destination during early release. It will cause your existing connections to fail. 3. **Create a New Connection**: Create connections using the new BigQuery destination. These will automatically use Destinations V2. - 1. If your new destination has the same default namespace, you may want to add a stream prefix to avoid collisions in the final tables. - 2. Do not modify the ‘Transformation’ settings. These will be ignored. + 1. If your new destination has the same default namespace, you may want to add a stream prefix to avoid collisions in the final tables. + 2. Do not modify the ‘Transformation’ settings. These will be ignored. 4. **Monitor your Sync**: Wait at least 20 minutes, or until your sync is complete. Verify the data in your destination is correct. Congratulations, you have successfully upgraded your connection to Destinations V2! Once you’ve completed the setup for Destinations V2, we ask that you pay special attention to the data delivered in your destination. Let us know immediately if you see any unexpected data: table and column name changes, missing columns, or columns with incorrect types. From 9ce201bc73eeff34bbf6b079c9c6c9454fa70aa9 Mon Sep 17 00:00:00 2001 From: evantahler Date: Fri, 11 Aug 2023 14:37:58 -0700 Subject: [PATCH 02/17] docs: `_airbyte_meta` Errors --- docs/understanding-airbyte/typing-deduping.md | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/docs/understanding-airbyte/typing-deduping.md b/docs/understanding-airbyte/typing-deduping.md index 005dab61e990..5399696e7b75 100644 --- a/docs/understanding-airbyte/typing-deduping.md +++ b/docs/understanding-airbyte/typing-deduping.md @@ -1,18 +1,33 @@ # Typing and Deduping -This page refers to new functionality currently available in **early access**. Typing and deduping will become the new default method of transforming datasets within data warehouse and database destinations after they've been replicated. This functionality is going live with [Destinations V2](https://github.com/airbytehq/airbyte/issues/26028), which is now in early access for BigQuery. +This page refers to new functionality currently available in **early access**. Typing and deduping will become the new default method of transforming datasets within data warehouse and database destinations after they've been replicated. This functionality is going live with [Destinations V2](/release_notes/upgrading_to_destinations_v2/), which is now in early access for BigQuery. You will eventually be required to upgrade your connections to use the new destination versions. We are building tools for you to copy your connector’s configuration to a new version to make testing new destinations easier. These will be available in the next few weeks. ## What is Destinations V2? -At launch, Airbyte Destinations V2 will provide: +At launch, [Airbyte Destinations V2](/release_notes/upgrading_to_destinations_v2) will provide: - One-to-one table mapping: Data in one stream will always be mapped to one table in your data warehouse. No more sub-tables. 
- Improved per-row error handling with `_airbyte_meta`: Airbyte will now populate typing errors in the `_airbyte_meta` column instead of failing your sync. You can query these results to audit misformatted or unexpected data.
- Internal Airbyte tables in the `airbyte_internal` schema: Airbyte will now generate all raw tables in the `airbyte_internal` schema. We no longer clutter your desired schema with raw data tables.
- Incremental delivery for large syncs: Data will be incrementally delivered to your final tables when possible. No more waiting hours to see the first rows in your destination table.

+## `_airbyte_meta` Errors
+
+"Per-row error handling" is a new paradigm for Airbyte which provides greater flexibility for our users. as We have now separated `data-moving errors` from `data-quality errors`. Prior to Destinations V2, both types of errors were handled the same way: by failing the sync. Now, a failing sync meas that Airbyte could not _move_ all of your data, and you can query the `_airbyte_meta` column to see which rows failed for _content_ reasons, and why. This is a more flexible approach, as you can now decide how to handle rows with errors on a case-by-case basis.
+
+:::tip
+When using a V2 destination for most use cases, it is recommended that you include only rows which do not have an error, e.g `SELECT COUNT(*) FROM _table_ WHERE json_array_length(_airbyte_meta ->> errors) = 0` (postgres syntax).
+:::
+
+The types of errors which will be stored in `_airbyte_meta.errors` include:
+
+- **Typing errors**: the source declared that the type of the column `id` should be an integer, but a string value was returned
+- **Size errors**: the source returned content which cannot be stored within this row or column (e.g. [a Redshift Super column has a 16mb limit](https://docs.aws.amazon.com/redshift/latest/dg/limitations-super.html))
+
+That said, depending on your use-case, it may still be valuable to consider rows with errors, especially in aggregate. For example, you may have a table `user_reviews`, and you might ask how many new reviews you received today. Regardless of if your datawarehouse had trouble storing the full contents in the `message` column or not, `SELECT COUNT(*) from user_reviews WHERE DATE(created_at) = DATE(NOW())` is still valid.
+
 ## Destinations V2 Example

 Consider the following [source schema](https://docs.airbyte.com/integrations/sources/faker) for stream `users`:

From e0e90fe10bfe6eedf023a59436d6c8c536bef229 Mon Sep 17 00:00:00 2001
From: evantahler
Date: Fri, 11 Aug 2023 14:38:17 -0700
Subject: [PATCH 03/17] link

---
 docs/release_notes/upgrading_to_destinations_v2.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/release_notes/upgrading_to_destinations_v2.md b/docs/release_notes/upgrading_to_destinations_v2.md
index 3d641302429e..700389190a9a 100644
--- a/docs/release_notes/upgrading_to_destinations_v2.md
+++ b/docs/release_notes/upgrading_to_destinations_v2.md
@@ -13,7 +13,7 @@ Starting today, Airbyte Destinations V2 provides you with:
 - Internal Airbyte tables in the `airbyte_internal` schema: Airbyte will now generate all raw tables in the `airbyte_internal` schema. We no longer clutter your destination schema with raw data tables.
 - Incremental delivery for large syncs: Data will be incrementally delivered to your final tables. No more waiting hours to see the first rows in your destination table.
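For example, once a sync completes you can audit any rows that were delivered with problems by querying `_airbyte_meta`. A sketch (Postgres syntax; the final table name `public.users` and the quoted `'errors'` path are illustrative assumptions, not guaranteed syntax for every destination):

```sql
-- Count rows whose values could not be stored as their declared types.
SELECT COUNT(*)
FROM public.users
WHERE json_array_length(_airbyte_meta -> 'errors') > 0;
```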
-To see more details and examples on the contents of the Destinations V2 release, see this [guide](../understanding-airbyte/typing-deduping.md). The remainder of this page will walk you through upgrading connectors from legacy normalization to Destinations V2. +To see more details and examples on the contents of the Destinations V2 release, see this [guide](/understanding-airbyte/typing-deduping.md). The remainder of this page will walk you through upgrading connectors from legacy normalization to Destinations V2. ## Deprecating Legacy Normalization From d09e11a6e48bbb01e2c5ab706f0f5a2992272a1e Mon Sep 17 00:00:00 2001 From: evantahler Date: Fri, 11 Aug 2023 14:40:32 -0700 Subject: [PATCH 04/17] format --- docs/understanding-airbyte/typing-deduping.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/docs/understanding-airbyte/typing-deduping.md b/docs/understanding-airbyte/typing-deduping.md index 5399696e7b75..8122a4068877 100644 --- a/docs/understanding-airbyte/typing-deduping.md +++ b/docs/understanding-airbyte/typing-deduping.md @@ -18,7 +18,13 @@ At launch, [Airbyte Destinations V2](/release_notes/upgrading_to_destinations_v2 "Per-row error handling" is a new paradigm for Airbyte which provides greater flexibility for our users. as We have now separated `data-moving errors` from `data-quality errors`. Prior to Destinations V2, both types of errors were handled the same way: by failing the sync. Now, a failing sync meas that Airbyte could not _move_ all of your data, and you can query the `_airbyte_meta` column to see which rows failed for _content_ reasons, and why. This is a more flexible approach, as you can now decide how to handle rows with errors on a case-by-case basis. :::tip -When using a V2 destination for most use cases, it is recommended that you include only rows which do not have an error, e.g `SELECT COUNT(*) FROM _table_ WHERE json_array_length(_airbyte_meta ->> errors) = 0` (postgres syntax). +When using a V2 destination for most use cases, it is recommended that you include only rows which do not have an error, e.g: + +```sql +-- postgres syntax +SELECT COUNT(\*) FROM _table_ WHERE json_array_length(\_airbyte_meta ->> errors) = 0 +``` + ::: The types of errors which will be stored in `_airbyte_meta.errors` include: From a1695de2e3c80ec49ed170a664f59062bcd2bbc8 Mon Sep 17 00:00:00 2001 From: evantahler Date: Fri, 11 Aug 2023 14:43:10 -0700 Subject: [PATCH 05/17] paste from master --- docs/release_notes/upgrading_to_destinations_v2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/release_notes/upgrading_to_destinations_v2.md b/docs/release_notes/upgrading_to_destinations_v2.md index 700389190a9a..c82ffd2aff13 100644 --- a/docs/release_notes/upgrading_to_destinations_v2.md +++ b/docs/release_notes/upgrading_to_destinations_v2.md @@ -13,7 +13,7 @@ Starting today, Airbyte Destinations V2 provides you with: - Internal Airbyte tables in the `airbyte_internal` schema: Airbyte will now generate all raw tables in the `airbyte_internal` schema. We no longer clutter your destination schema with raw data tables. - Incremental delivery for large syncs: Data will be incrementally delivered to your final tables. No more waiting hours to see the first rows in your destination table. -To see more details and examples on the contents of the Destinations V2 release, see this [guide](/understanding-airbyte/typing-deduping.md). The remainder of this page will walk you through upgrading connectors from legacy normalization to Destinations V2. 
+To see more details and examples on the contents of the Destinations V2 release, see this [guide](understanding-airbyte/typing-deduping.md). The remainder of this page will walk you through upgrading connectors from legacy normalization to Destinations V2. ## Deprecating Legacy Normalization From c892672e33e3e8fdddd980b0e55ec730d4528565 Mon Sep 17 00:00:00 2001 From: evantahler Date: Fri, 11 Aug 2023 14:46:55 -0700 Subject: [PATCH 06/17] fix toc madness --- .../upgrading_to_destinations_v2.md | 18 +++++------------- 1 file changed, 5 insertions(+), 13 deletions(-) diff --git a/docs/release_notes/upgrading_to_destinations_v2.md b/docs/release_notes/upgrading_to_destinations_v2.md index c82ffd2aff13..ca6733778838 100644 --- a/docs/release_notes/upgrading_to_destinations_v2.md +++ b/docs/release_notes/upgrading_to_destinations_v2.md @@ -56,19 +56,11 @@ Each destination version is managed separately, so if you have multiple destinat Versions are tied to the destination. When you update the destination, **all connections tied to that destination will be sending data in the Destinations V2 format**. For upgrade paths that will minimize disruption to existing dashboards, see: -- [Upgrading to Destinations V2](#upgrading-to-destinations-v2) - - [What is Destinations V2?](#what-is-destinations-v2) - - [Deprecating Legacy Normalization](#deprecating-legacy-normalization) - - [Breakdown of Breaking Changes](#breakdown-of-breaking-changes) - - [Quick Start to Upgrading](#quick-start-to-upgrading) - - [Advanced Upgrade Paths](#advanced-upgrade-paths) - - [Upgrading Connections One by One with Dual-Writing](#upgrading-connections-one-by-one-with-dual-writing) - - [Steps to Follow for All Sync Modes](#steps-to-follow-for-all-sync-modes) - - [Additional Steps for Incremental Sync Modes](#additional-steps-for-incremental-sync-modes) - - [Testing Destinations V2 for a Single Connection](#testing-destinations-v2-for-a-single-connection) - - [Upgrading as a User of Raw Tables](#upgrading-as-a-user-of-raw-tables) - - [Upgrade Paths for Connections using CDC](#upgrade-paths-for-connections-using-cdc) - - [Destinations V2 Compatible Versions](#destinations-v2-compatible-versions) +- [Upgrading Connections One by One with Dual-Writing](#upgrading-connections-one-by-one-with-dual-writing) +- [Testing Destinations V2 on a Single Connection](#testing-destinations-v2-for-a-single-connection) +- [Upgrading Connections One by One Using CDC](#upgrade-paths-for-connections-using-cdc) +- [Upgrading as a User of Raw Tables](#upgrading-as-a-user-of-raw-tables) +- [Rolling back to Legacy Normalization](#oss-only-rolling-back-to-legacy-normalization) ## Advanced Upgrade Paths From 6c3aa88851d1ab511270110da2b1ab9242f49ff3 Mon Sep 17 00:00:00 2001 From: evantahler Date: Fri, 11 Aug 2023 14:51:29 -0700 Subject: [PATCH 07/17] fix sql --- docs/understanding-airbyte/typing-deduping.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/understanding-airbyte/typing-deduping.md b/docs/understanding-airbyte/typing-deduping.md index 8122a4068877..1dd264505505 100644 --- a/docs/understanding-airbyte/typing-deduping.md +++ b/docs/understanding-airbyte/typing-deduping.md @@ -22,7 +22,7 @@ When using a V2 destination for most use cases, it is recommended that you inclu ```sql -- postgres syntax -SELECT COUNT(\*) FROM _table_ WHERE json_array_length(\_airbyte_meta ->> errors) = 0 +SELECT COUNT(*) FROM _table_ WHERE json_array_length(\_airbyte_meta ->> errors) = 0 ``` ::: From 8fc5b3490d1171e49dba27c3db792880557302ed Mon 
Sep 17 00:00:00 2001 From: evantahler Date: Fri, 11 Aug 2023 14:51:47 -0700 Subject: [PATCH 08/17] fix sql --- docs/understanding-airbyte/typing-deduping.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/understanding-airbyte/typing-deduping.md b/docs/understanding-airbyte/typing-deduping.md index 1dd264505505..f69cb8350c7d 100644 --- a/docs/understanding-airbyte/typing-deduping.md +++ b/docs/understanding-airbyte/typing-deduping.md @@ -22,7 +22,7 @@ When using a V2 destination for most use cases, it is recommended that you inclu ```sql -- postgres syntax -SELECT COUNT(*) FROM _table_ WHERE json_array_length(\_airbyte_meta ->> errors) = 0 +SELECT COUNT(*) FROM _table_ WHERE json_array_length(_airbyte_meta ->> errors) = 0 ``` ::: From 04eaa0c4dc91b31782e347dd347cb7a070c82a4d Mon Sep 17 00:00:00 2001 From: Evan Tahler Date: Fri, 11 Aug 2023 17:15:58 -0700 Subject: [PATCH 09/17] Update docs/understanding-airbyte/typing-deduping.md Co-authored-by: Alexandre Cuoci --- docs/understanding-airbyte/typing-deduping.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/understanding-airbyte/typing-deduping.md b/docs/understanding-airbyte/typing-deduping.md index f69cb8350c7d..3e2405aa284a 100644 --- a/docs/understanding-airbyte/typing-deduping.md +++ b/docs/understanding-airbyte/typing-deduping.md @@ -81,7 +81,7 @@ You can start using Destinations V2 for BigQuery in early access by following th 1. **Upgrade your BigQuery Destination**: If you are using Airbyte Open Source, update your BigQuery destination version to the latest version. If you are a Cloud customer, this step will already be completed on your behalf. 2. **Enabling Destinations V2**: Create a new BigQuery destination, and enable the Destinations V2 option under `Advanced` settings. You will need your BigQuery credentials for this step. For this early release, we ask that you enable Destinations V2 on a new BigQuery destination using new connections. When Destinations V2 is fully available, there will be additional migration paths for upgrading your destination without resetting any of your existing connections. 1. If your previous BigQuery destination is using “GCS Staging”, you can reuse the same staging bucket. - 2. Do not enable Destinations V2 on your previous / existing BigQuery destination during early release. It will cause your existing connections to fail. + 2. Do not enable Destinations V2 on your previous / existing destinations during early release. It will cause your existing connections to fail. 3. **Create a New Connection**: Create connections using the new BigQuery destination. These will automatically use Destinations V2. 1. If your new destination has the same default namespace, you may want to add a stream prefix to avoid collisions in the final tables. 2. Do not modify the ‘Transformation’ settings. These will be ignored. 
From 46e31bd7c35e19f25c6a2a07ab9497f259c7b176 Mon Sep 17 00:00:00 2001 From: Evan Tahler Date: Fri, 11 Aug 2023 17:16:10 -0700 Subject: [PATCH 10/17] Update docs/understanding-airbyte/typing-deduping.md Co-authored-by: Alexandre Cuoci --- docs/understanding-airbyte/typing-deduping.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/understanding-airbyte/typing-deduping.md b/docs/understanding-airbyte/typing-deduping.md index 3e2405aa284a..711120898501 100644 --- a/docs/understanding-airbyte/typing-deduping.md +++ b/docs/understanding-airbyte/typing-deduping.md @@ -79,7 +79,7 @@ You also now see the following changes in Airbyte-provided columns: You can start using Destinations V2 for BigQuery in early access by following the below instructions: 1. **Upgrade your BigQuery Destination**: If you are using Airbyte Open Source, update your BigQuery destination version to the latest version. If you are a Cloud customer, this step will already be completed on your behalf. -2. **Enabling Destinations V2**: Create a new BigQuery destination, and enable the Destinations V2 option under `Advanced` settings. You will need your BigQuery credentials for this step. For this early release, we ask that you enable Destinations V2 on a new BigQuery destination using new connections. When Destinations V2 is fully available, there will be additional migration paths for upgrading your destination without resetting any of your existing connections. +2. **Enabling Destinations V2**: Create a new destination, and enable the Destinations V2 option under `Advanced` settings. You will need your data warehouse credentials for this step. For this early release, we ask that you enable Destinations V2 on a new destination using new connections. When Destinations V2 is fully available, there will be additional migration paths for upgrading your destination without resetting any of your existing connections. 1. If your previous BigQuery destination is using “GCS Staging”, you can reuse the same staging bucket. 2. Do not enable Destinations V2 on your previous / existing destinations during early release. It will cause your existing connections to fail. 3. **Create a New Connection**: Create connections using the new BigQuery destination. These will automatically use Destinations V2. From acc9ebc4546da3b6835571e7cb99863c15def449 Mon Sep 17 00:00:00 2001 From: Evan Tahler Date: Fri, 11 Aug 2023 17:16:19 -0700 Subject: [PATCH 11/17] Update docs/understanding-airbyte/typing-deduping.md Co-authored-by: Alexandre Cuoci --- docs/understanding-airbyte/typing-deduping.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/understanding-airbyte/typing-deduping.md b/docs/understanding-airbyte/typing-deduping.md index 711120898501..297c976871be 100644 --- a/docs/understanding-airbyte/typing-deduping.md +++ b/docs/understanding-airbyte/typing-deduping.md @@ -78,7 +78,7 @@ You also now see the following changes in Airbyte-provided columns: You can start using Destinations V2 for BigQuery in early access by following the below instructions: -1. **Upgrade your BigQuery Destination**: If you are using Airbyte Open Source, update your BigQuery destination version to the latest version. If you are a Cloud customer, this step will already be completed on your behalf. +1. **Upgrade your Destination**: If you are using Airbyte Open Source, update your destination version to the latest version. If you are a Cloud customer, this step will already be completed on your behalf. 2. 
**Enabling Destinations V2**: Create a new destination, and enable the Destinations V2 option under `Advanced` settings. You will need your data warehouse credentials for this step. For this early release, we ask that you enable Destinations V2 on a new destination using new connections. When Destinations V2 is fully available, there will be additional migration paths for upgrading your destination without resetting any of your existing connections.
    1. If your previous BigQuery destination is using “GCS Staging”, you can reuse the same staging bucket.
    2. Do not enable Destinations V2 on your previous / existing destinations during early release. It will cause your existing connections to fail.
 3. **Create a New Connection**: Create connections using the new BigQuery destination. These will automatically use Destinations V2.

From acc9ebc4546da3b6835571e7cb99863c15def449 Mon Sep 17 00:00:00 2001
From: Evan Tahler
Date: Fri, 11 Aug 2023 17:16:19 -0700
Subject: [PATCH 11/17] Update docs/understanding-airbyte/typing-deduping.md

Co-authored-by: Alexandre Cuoci
---
 docs/understanding-airbyte/typing-deduping.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/understanding-airbyte/typing-deduping.md b/docs/understanding-airbyte/typing-deduping.md
index 711120898501..297c976871be 100644
--- a/docs/understanding-airbyte/typing-deduping.md
+++ b/docs/understanding-airbyte/typing-deduping.md
@@ -78,7 +78,7 @@ You also now see the following changes in Airbyte-provided columns:

 You can start using Destinations V2 for BigQuery in early access by following the below instructions:

-1. **Upgrade your BigQuery Destination**: If you are using Airbyte Open Source, update your BigQuery destination version to the latest version. If you are a Cloud customer, this step will already be completed on your behalf.
+1. **Upgrade your Destination**: If you are using Airbyte Open Source, update your destination version to the latest version. If you are a Cloud customer, this step will already be completed on your behalf.
 2. **Enabling Destinations V2**: Create a new destination, and enable the Destinations V2 option under `Advanced` settings. You will need your data warehouse credentials for this step. For this early release, we ask that you enable Destinations V2 on a new destination using new connections. When Destinations V2 is fully available, there will be additional migration paths for upgrading your destination without resetting any of your existing connections.
    1. If your previous BigQuery destination is using “GCS Staging”, you can reuse the same staging bucket.

From e77f815e1cf3459c2cc1f459d5539996b8822f4c Mon Sep 17 00:00:00 2001
From: Evan Tahler
Date: Fri, 11 Aug 2023 17:16:27 -0700
Subject: [PATCH 12/17] Update docs/understanding-airbyte/typing-deduping.md

Co-authored-by: Alexandre Cuoci
---
 docs/understanding-airbyte/typing-deduping.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/understanding-airbyte/typing-deduping.md b/docs/understanding-airbyte/typing-deduping.md
index 297c976871be..e0bd79c5db55 100644
--- a/docs/understanding-airbyte/typing-deduping.md
+++ b/docs/understanding-airbyte/typing-deduping.md
@@ -76,7 +76,7 @@ You also now see the following changes in Airbyte-provided columns:

 ## Participating in Early Access

-You can start using Destinations V2 for BigQuery in early access by following the below instructions:
+You can start using Destinations V2 for BigQuery or Snowflake in early access by following the below instructions:

 1. **Upgrade your Destination**: If you are using Airbyte Open Source, update your destination version to the latest version. If you are a Cloud customer, this step will already be completed on your behalf.

From abcb2081d178af85fa3a08ef16b925b462c36265 Mon Sep 17 00:00:00 2001
From: Evan Tahler
Date: Fri, 11 Aug 2023 17:17:08 -0700
Subject: [PATCH 13/17] Update docs/understanding-airbyte/typing-deduping.md

Co-authored-by: Alexandre Cuoci
---
 docs/understanding-airbyte/typing-deduping.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/understanding-airbyte/typing-deduping.md b/docs/understanding-airbyte/typing-deduping.md
index e0bd79c5db55..9ea54ed4c6fe 100644
--- a/docs/understanding-airbyte/typing-deduping.md
+++ b/docs/understanding-airbyte/typing-deduping.md
@@ -30,7 +30,7 @@ SELECT COUNT(*) FROM _table_ WHERE json_array_length(_airbyte_meta ->> errors) =
 The types of errors which will be stored in `_airbyte_meta.errors` include:

 - **Typing errors**: the source declared that the type of the column `id` should be an integer, but a string value was returned
-- **Size errors**: the source returned content which cannot be stored within this row or column (e.g. [a Redshift Super column has a 16mb limit](https://docs.aws.amazon.com/redshift/latest/dg/limitations-super.html))
+- **Size errors**: the source returned content which cannot be stored within this row or column (e.g.
[a Redshift Super column has a 16mb limit](https://docs.aws.amazon.com/redshift/latest/dg/limitations-super.html)).

 That said, depending on your use-case, it may still be valuable to consider rows with errors, especially in aggregate. For example, you may have a table `user_reviews`, and you might ask how many new reviews you received today. Regardless of if your datawarehouse had trouble storing the full contents in the `message` column or not, `SELECT COUNT(*) from user_reviews WHERE DATE(created_at) = DATE(NOW())` is still valid.

From 73b10ad14e208b6049132bf4a42e4afe569bd840 Mon Sep 17 00:00:00 2001
From: Evan Tahler
Date: Fri, 11 Aug 2023 17:17:15 -0700
Subject: [PATCH 14/17] Update docs/understanding-airbyte/typing-deduping.md

Co-authored-by: Alexandre Cuoci
---
 docs/understanding-airbyte/typing-deduping.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/understanding-airbyte/typing-deduping.md b/docs/understanding-airbyte/typing-deduping.md
index 9ea54ed4c6fe..2b6fb677ded3 100644
--- a/docs/understanding-airbyte/typing-deduping.md
+++ b/docs/understanding-airbyte/typing-deduping.md
@@ -29,7 +29,7 @@ SELECT COUNT(*) FROM _table_ WHERE json_array_length(_airbyte_meta ->> errors) =

 The types of errors which will be stored in `_airbyte_meta.errors` include:

-- **Typing errors**: the source declared that the type of the column `id` should be an integer, but a string value was returned
+- **Typing errors**: the source declared that the type of the column `id` should be an integer, but a string value was returned.
 - **Size errors**: the source returned content which cannot be stored within this row or column (e.g. [a Redshift Super column has a 16mb limit](https://docs.aws.amazon.com/redshift/latest/dg/limitations-super.html)).

From 71392a71f00ba153757c59584ced4be4074dd31f Mon Sep 17 00:00:00 2001
From: Evan Tahler
Date: Fri, 11 Aug 2023 17:17:35 -0700
Subject: [PATCH 15/17] Update docs/understanding-airbyte/typing-deduping.md

Co-authored-by: Alexandre Cuoci
---
 docs/understanding-airbyte/typing-deduping.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/understanding-airbyte/typing-deduping.md b/docs/understanding-airbyte/typing-deduping.md
index 2b6fb677ded3..e348f351a980 100644
--- a/docs/understanding-airbyte/typing-deduping.md
+++ b/docs/understanding-airbyte/typing-deduping.md
@@ -18,7 +18,7 @@ At launch, [Airbyte Destinations V2](/release_notes/upgrading_to_destinations_v2

 "Per-row error handling" is a new paradigm for Airbyte which provides greater flexibility for our users. as We have now separated `data-moving errors` from `data-quality errors`. Prior to Destinations V2, both types of errors were handled the same way: by failing the sync. Now, a failing sync meas that Airbyte could not _move_ all of your data, and you can query the `_airbyte_meta` column to see which rows failed for _content_ reasons, and why. This is a more flexible approach, as you can now decide how to handle rows with errors on a case-by-case basis.
:::tip
-When using a V2 destination for most use cases, it is recommended that you include only rows which do not have an error, e.g:
+When using data downstream from Airbyte, we generally recommend you only include rows which do not have an error, e.g:

 ```sql
 -- postgres syntax
 SELECT COUNT(*) FROM _table_ WHERE json_array_length(_airbyte_meta ->> errors) = 0
 ```

From 9267b63f181e44c9da81d9157d377a3db9c9b585 Mon Sep 17 00:00:00 2001
From: Evan Tahler
Date: Sat, 12 Aug 2023 00:24:20 +0000
Subject: [PATCH 16/17] Alex's nits

---
 docs/understanding-airbyte/typing-deduping.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/understanding-airbyte/typing-deduping.md b/docs/understanding-airbyte/typing-deduping.md
index e348f351a980..7360adbf650b 100644
--- a/docs/understanding-airbyte/typing-deduping.md
+++ b/docs/understanding-airbyte/typing-deduping.md
@@ -15,7 +15,7 @@ At launch, [Airbyte Destinations V2](/release_notes/upgrading_to_destinations_v2

 ## `_airbyte_meta` Errors

-"Per-row error handling" is a new paradigm for Airbyte which provides greater flexibility for our users. as We have now separated `data-moving errors` from `data-quality errors`. Prior to Destinations V2, both types of errors were handled the same way: by failing the sync. Now, a failing sync meas that Airbyte could not _move_ all of your data, and you can query the `_airbyte_meta` column to see which rows failed for _content_ reasons, and why. This is a more flexible approach, as you can now decide how to handle rows with errors on a case-by-case basis.
+"Per-row error handling" is a new paradigm for Airbyte which provides greater flexibility for our users. Airbyte now separates `data-moving problems` from `data-content problems`. Prior to Destinations V2, both types of errors were handled the same way: by failing the sync. Now, a failing sync meas that Airbyte could not _move_ all of your data, and you can query the `_airbyte_meta` column to see which rows failed for _content_ reasons, and why. This is a more flexible approach, as you can now decide how to handle rows with errors on a case-by-case basis.
@@ -32,7 +32,7 @@ SELECT COUNT(*) FROM _table_ WHERE json_array_length(_airbyte_meta ->> errors) =

 - **Typing errors**: the source declared that the type of the column `id` should be an integer, but a string value was returned.
 - **Size errors**: the source returned content which cannot be stored within this row or column (e.g. [a Redshift Super column has a 16mb limit](https://docs.aws.amazon.com/redshift/latest/dg/limitations-super.html)).

-That said, depending on your use-case, it may still be valuable to consider rows with errors, especially in aggregate. For example, you may have a table `user_reviews`, and you might ask how many new reviews you received today. Regardless of if your datawarehouse had trouble storing the full contents in the `message` column or not, `SELECT COUNT(*) from user_reviews WHERE DATE(created_at) = DATE(NOW())` is still valid.
+Depending on your use-case, it may still be valuable to consider rows with errors, especially for aggregations. For example, you may have a table `user_reviews`, and you would like to know the count of new reviews received today. You can choose to include reviews regardless of whether your data warehouse had difficulty storing the full contents of the `message` column. For this use case, `SELECT COUNT(*) from user_reviews WHERE DATE(created_at) = DATE(NOW())` is still valid.
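When a count like this looks off, you can also pull the stored error details for inspection. A sketch (Postgres syntax, mirroring the examples above; the quoted `'errors'` path and the `->` operator are assumptions you may need to adapt to your warehouse):

```sql
-- List today's rows that carry data-quality errors, with their stored messages.
SELECT _airbyte_raw_id, _airbyte_meta -> 'errors' AS errors
FROM user_reviews
WHERE DATE(created_at) = DATE(NOW())
  AND json_array_length(_airbyte_meta -> 'errors') > 0;
```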
## Destinations V2 Example From 77a68e64b57a92b045c8c38ba18b55e408dcd207 Mon Sep 17 00:00:00 2001 From: Alexandre Cuoci Date: Mon, 14 Aug 2023 20:59:43 -0400 Subject: [PATCH 17/17] Update docs/understanding-airbyte/typing-deduping.md --- docs/understanding-airbyte/typing-deduping.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/understanding-airbyte/typing-deduping.md b/docs/understanding-airbyte/typing-deduping.md index 7360adbf650b..257bca566884 100644 --- a/docs/understanding-airbyte/typing-deduping.md +++ b/docs/understanding-airbyte/typing-deduping.md @@ -15,7 +15,7 @@ At launch, [Airbyte Destinations V2](/release_notes/upgrading_to_destinations_v2 ## `_airbyte_meta` Errors -"Per-row error handling" is a new paradigm for Airbyte which provides greater flexibility for our users. Airbyte now separates `data-moving problems` from `data-content problems`. Prior to Destinations V2, both types of errors were handled the same way: by failing the sync. Now, a failing sync meas that Airbyte could not _move_ all of your data, and you can query the `_airbyte_meta` column to see which rows failed for _content_ reasons, and why. This is a more flexible approach, as you can now decide how to handle rows with errors on a case-by-case basis. +"Per-row error handling" is a new paradigm for Airbyte which provides greater flexibility for our users. Airbyte now separates `data-moving problems` from `data-content problems`. Prior to Destinations V2, both types of errors were handled the same way: by failing the sync. Now, a failing sync means that Airbyte could not _move_ all of your data. You can query the `_airbyte_meta` column to see which rows failed for _content_ reasons, and why. This is a more flexible approach, as you can now decide how to handle rows with errors on a case-by-case basis. :::tip When using data downstream from Airbyte, we generally recommend you only include rows which do not have an error, e.g: