
[Roadmap] Airbyte Destinations V2: Improved Data Syncing & Error Handling #26028

Closed
Hesperide opened this issue May 12, 2023 · 33 comments
Labels: community, roadmap, team/destinations (Destinations team's backlog)

Comments

@Hesperide (Contributor) commented May 12, 2023

Introducing Airbyte Destinations V2: Improved Data Syncing & Error Handling

To participate in our beta program and share your thoughts on your desired migration path, please fill out this form.

We're excited to announce an upcoming change to Airbyte that will significantly improve the way data is synced and handled in destination tables (currently known as normalization). We are introducing Airbyte Destinations V2, and we will need your feedback and help to beta test this new feature before it becomes widely available.

The main changes in the first release of Airbyte Destinations V2 will be:

  • One-to-one mapping: Data from one stream (endpoint or table) will now create one table in the destination, making it simpler and more efficient.
  • Improved error handling: Typing errors will no longer fail your sync, ensuring smoother data integration processes.
  • Auditable typing errors: Typing errors will now be easily visible in a new _airbyte_meta column, allowing for better tracking of inconsistencies and resolution of issues.

Please note that these improvements will mean breaking changes to your destination tables. This will be most notable to users actively syncing data from API sources such as Facebook Marketing, HubSpot, Stripe, Amazon Ads, TikTok Marketing and more.


Destinations V2 Example

Consider the following source schema for stream users:

{
  "id": "number",
  "first_name": "string",
  "age": "number",
  "address": {
    "city": "string",
    "zip": "string"
  }
}

The data from one stream will now be mapped to one table in your schema, as shown below. Highlights:

  • Improved error handling with _airbyte_meta: Airbyte will record typing errors in the _airbyte_meta column instead of failing your sync. You can query these results to audit misformatted or unexpected data (see the sample query after the tables below).
  • Internal Airbyte tables in the airbyte_internal schema: Airbyte will now generate all raw tables in the airbyte_internal schema. You can use these tables to investigate raw data, but please note that their format may change at any time (see the raw-table query after the tables below).

Destination Table Name: public.users

| (row description, not an actual column) | _airbyte_raw_id | _airbyte_extracted_at | _airbyte_meta | id | first_name | age | address |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Successful typing and de-duping ⟶ | xxx-xxx-xxx | 2022-01-01 12:00:00 | {} | 1 | sarah | 39 | { city: "San Francisco", zip: "94131" } |
| Failed typing that didn't break other rows ⟶ | yyy-yyy-yyy | 2022-01-01 12:00:00 | { errors: { age: "fish" is not a valid integer for column "age" } } | 2 | evan | NULL | { city: "Menlo Park", zip: "94002" } |
| Not-yet-typed ⟶ | (row not yet present) | | | | | | |
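
To make the auditing workflow concrete, here is a minimal sketch of a query you could run against the final table to surface rows with typing errors. It assumes a Snowflake-style destination where _airbyte_meta is a JSON (VARIANT) column; the column names come from the example above, but the JSON accessor syntax is an assumption and differs by warehouse.

```sql
-- Sketch: audit rows whose _airbyte_meta records typing errors
-- (Snowflake-style JSON path syntax; adjust for your warehouse).
SELECT
    _airbyte_raw_id,
    _airbyte_extracted_at,
    _airbyte_meta
FROM public.users
WHERE _airbyte_meta:"errors" IS NOT NULL;   -- matches the { errors: ... } shape shown above
```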

Destination Table Name: airbyte_internal.public_raw__stream_users (pattern: airbyte_internal.{namespace}_raw__stream_{stream})

| (row description, not an actual column) | _airbyte_raw_id | _airbyte_data | _airbyte_loaded_at | _airbyte_extracted_at |
| --- | --- | --- | --- | --- |
| Successful typing and de-duping ⟶ | xxx-xxx-xxx | { id: 1, first_name: "sarah", age: 39, address: { city: "San Francisco", zip: "94131" } } | 2022-01-01 12:00:00 | 2022-01-01 12:00:00 |
| Failed typing that didn't break other rows ⟶ | yyy-yyy-yyy | { id: 2, first_name: "evan", age: "fish", address: { city: "Menlo Park", zip: "94002" } } | 2022-01-01 12:00:00 | 2022-01-01 12:00:00 |
| Not-yet-typed ⟶ | zzz-zzz-zzz | { id: 3, first_name: "edward", age: 35, address: { city: "Sunnyvale", zip: "94003" } } | NULL | 2022-01-01 13:00:00 |
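
Along the same lines, here is a sketch of how you might inspect the raw table for records that have not yet been typed into the final table. It assumes, based on the example above, that _airbyte_loaded_at stays NULL until a record has been typed and loaded; names and syntax are illustrative (Snowflake-style).

```sql
-- Sketch: list raw records not yet propagated to the typed table.
SELECT
    _airbyte_raw_id,
    _airbyte_extracted_at,
    _airbyte_data
FROM airbyte_internal.public_raw__stream_users
WHERE _airbyte_loaded_at IS NULL
ORDER BY _airbyte_extracted_at;
```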

As there will be breaking changes with Airbyte Destinations V2, we are looking for beta customers to help us try out these new features and provide valuable feedback. We are also seeking input on your preferences for a migration plan to ensure a smooth transition.

Your input is crucial in helping us make Airbyte better for everyone. Thank you for your continued support, and we look forward to hearing your feedback on Airbyte Destinations V2!

For additional information and context on this announcement, see the previous issue on this topic: #25194

@alfii-joey

Will Destination V2 remove the SCD feature?

@evantahler (Contributor)

> Will Destination V2 remove the SCD feature?

Yes, we are removing SCD tables, as described in #25194.

@evantahler (Contributor)

Thank you for all these reports, @haithem-souala! We'll check in with you in each issue.

@haithem-souala (Contributor)

Hey @evantahler, could you check this one: #29172

@honggyu-rr commented Aug 15, 2023

How can we track updates on a release for this? It seems like some work has been done, but I don't see any links to PRs here.

@evantahler (Contributor)

@honggyu-rr - you are in the right place! We will announce here and in our community Slack when the next versions of the destinations are ready. You can try out Destinations V2 for Snowflake and BigQuery today!

@pranasziaukas

> Support for the data warehouse you requested in the form will be available in the coming few weeks, and we'll be sure to reach out to you before then.

Hey @Hesperide. A month has passed; are there any updates on Redshift support?

We'd like to test and consider company-wide adoption but IMO at this point it only makes sense to do that using the next major version.

@evantahler (Contributor)

@pranasziaukas destination-redshift will get the Destinations V2 treatment in September.

@honggyu-rr

> @pranasziaukas destination-redshift will get the Destinations V2 treatment in September.

Will it also go into GA with v2?

@rtol5 commented Sep 4, 2023

I just followed the UI prompt to upgrade destination-snowflake to V2, and I'm now finding that all table and schema names are case-sensitive. This isn't mentioned anywhere in the upgrade guide (https://docs.airbyte.com/release_notes/upgrading_to_destinations_v2/), and Airbyte's UI appears to only let you rename schemas, not tables.

Is there a way for destination-snowflake to retain case insensitivity? Otherwise, this would break every existing query, and we'd love to avoid that.
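
For anyone hitting the same thing, here is a small hypothetical illustration of why quoted lower-case identifiers break existing Snowflake queries (the table name is assumed, not taken from the connector):

```sql
-- In Snowflake, unquoted identifiers resolve to upper case, while quoted
-- lower-case identifiers are case-sensitive and must be quoted exactly.
CREATE TABLE "users" (id INTEGER);   -- quoted, lower-case: case-sensitive identifier

SELECT * FROM users;                 -- resolves to USERS -> "does not exist or not authorized"
SELECT * FROM "users";               -- works: matches the quoted identifier exactly
```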

@evantahler (Contributor) commented Sep 5, 2023

destination-snowflake v3.1.0 is back to upper-case (case-insensitive) identifiers only, via #30056. Sorry about that!

@nachosan2

Is there any update on Postgres for this? Thank you!

@evantahler (Contributor)

Look for Destination V2 for Postgres and MySQL later this year!

@Jordonkopp commented Nov 14, 2023

Not sure if this is the correct place - I can open a separate issue if needed.

There seems to be a bug in Snowflake Destination 3.x.x+ connector versions with V2: it creates the raw stream tables in the airbyte_internal schema, but the dbt code fails because it's still looking for the V1-named _raw tables.

Any brand new connection created with > 3.x.x fails due to this.

Build info:

  • Deployed on Kubernetes
  • Version 0.50.1
  • Snowflake Destination 2.1.7 is the highest we can go without this breaking change

Example Error:

SQL compilation error: Object '<Destination_DB>.<DESTINATION_SCHEMA>._AIRBYTE_RAW_<table_name>' does not exist or not authorized.

@evantahler (Contributor)

Please open a separate issue for this, @Jordonkopp - dbt should not be running at all anymore! Destinations V2 no longer relies on dbt to build the final tables.
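
To make the mismatch concrete: the legacy dbt models reference the V1 raw-table location, while Destinations V2 writes raw data under airbyte_internal. A sketch using the example stream from earlier in this issue (illustrative only, not Airbyte-generated SQL):

```sql
-- V1 location the old dbt code still expects (now missing):
--   <Destination_DB>.<DESTINATION_SCHEMA>._AIRBYTE_RAW_<table_name>
--
-- V2 location (pattern: airbyte_internal.{namespace}_raw__stream_{stream}):
SELECT _airbyte_raw_id, _airbyte_extracted_at, _airbyte_loaded_at
FROM airbyte_internal.public_raw__stream_users
LIMIT 10;
```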

@niranjanbala

Does it still create those tables in the destination DB? Our general concern is the additional storage cost this puts on the owner of the destination DB.

Is it possible to keep these tables out of the destination DB altogether and flush them once the sync completes?

@evantahler (Contributor)

Nope, @niranjanbala, that's not how Airbyte works. We rely on your data warehouse for typecasting, error handling, and deduplication, because that's the most performant and secure way to accomplish those tasks. That means we need some storage in the warehouse for both the raw data and the normalized data.
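
As a rough illustration of what "the warehouse does the typecasting" means in practice, here is a sketch of a typing step over the example raw table from this issue. This is not Airbyte's actual generated SQL; the Snowflake-style TRY_CAST is an assumption, used to show how a bad value can be cast to NULL instead of failing the whole sync (the real pipeline additionally records the error in _airbyte_meta, as shown earlier).

```sql
-- Illustrative only: push typing down to the warehouse using the raw table shape above.
SELECT
    _airbyte_raw_id,
    _airbyte_extracted_at,
    _airbyte_data:"id"::NUMBER                        AS id,
    _airbyte_data:"first_name"::VARCHAR               AS first_name,
    TRY_CAST(_airbyte_data:"age"::VARCHAR AS NUMBER)  AS age,   -- "fish" -> NULL, sync keeps going
    _airbyte_data:"address"                           AS address
FROM airbyte_internal.public_raw__stream_users
WHERE _airbyte_loaded_at IS NULL;
```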

@evantahler (Contributor)

I'm going to close this issue. Destinations V2 is released for Snowflake and BigQuery today, and Redshift will be released shortly (with an open beta program). Postgres and MySQL will be released within Q1 2024.
