
Destination Bigquery: rewrite connector to use Bulk upload instead of current one #5296

Closed
etsybaev opened this issue Aug 10, 2021 · 2 comments · Fixed by #5614

Comments

@etsybaev
Contributor

etsybaev commented Aug 10, 2021

Current Behavior

Currently, we create all clients at the very beginning of the connector's lifecycle. However, the Google SDK BigQuery client may fail (with a 404 response code) if we create it and then wait for a long time (e.g. 12 hours) while the source connector is still reading the data to migrate.
Bulk loading (https://cloud.google.com/bigquery/docs/batch-loading-data) would allow us to stage the data on GCS in its entirety before loading it into BigQuery. This is already implemented for Snowflake, so we need something similar for BigQuery.
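For reference, a minimal sketch of what the batch-loading step could look like with the google-cloud-bigquery Java client. The dataset, table, and bucket names are hypothetical placeholders, not the connector's actual config:

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.TableId;

public class BigQueryBulkLoadSketch {

  public static void main(String[] args) throws InterruptedException {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Hypothetical identifiers; the real connector would derive these from its config.
    TableId tableId = TableId.of("my_dataset", "my_table");
    String gcsUri = "gs://my-staging-bucket/staged/part-0.csv";

    // A load job reads the staged file directly from GCS, so no long-lived
    // client connection has to stay open while the source is still reading.
    LoadJobConfiguration loadConfig =
        LoadJobConfiguration.newBuilder(tableId, gcsUri)
            .setFormatOptions(FormatOptions.csv())
            .build();

    Job job = bigquery.create(JobInfo.of(loadConfig));
    job = job.waitFor();
    if (job == null || job.getStatus().getError() != null) {
      throw new RuntimeException("Load job failed: "
          + (job == null ? "job no longer exists" : job.getStatus().getError()));
    }
  }
}
```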

Expected Behavior

The connector should use the bulk upload mechanism.

Logs

See the attached archive (HelpFiles.zip).

ToDo as part of this ticket

1. Update the BigQuery connector to use bulk loading: collect the data on GCS, then on close move it into BigQuery with a bulk load job (see the staging sketch after this list).

Nice to have:

2. Use the "destination bigquery creds" entry from LastPass to get a secret for testing.
3. Run the performance test from the attached archive to make sure the application no longer returns 404 (create a new destination connector -> wait for 12+ hours -> try to write some messages and stop the container -> check that the messages appear in the cloud). For more details, see the last comments on "Destination Bigquery returns 404 when uploading data + resumable" #3549.
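A minimal sketch of the staging upload in step 1, assuming the records have already been buffered to a local file while the source was reading. It uses the google-cloud-storage client; the bucket and object names are hypothetical:

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class GcsStagingSketch {

  public static void main(String[] args) throws IOException {
    Storage storage = StorageOptions.getDefaultInstance().getService();

    // Hypothetical paths; the connector would buffer records locally while the
    // source reads, then push the file to GCS on close.
    Path localBuffer = Path.of("/tmp/records-part-0.csv");
    BlobId blobId = BlobId.of("my-staging-bucket", "staged/part-0.csv");
    BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType("text/csv").build();

    // Upload the staged bytes; the BigQuery load job picks the object up afterwards.
    storage.create(blobInfo, Files.readAllBytes(localBuffer));
  }
}
```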

Notes:
Some scoping is required:
We already have a destination-gcs connector; how can it be re-used here?
The Snowflake destination's implementation may also be worth checking before starting work on this one.

@etsybaev added the type/bug ("Something isn't working") label Aug 10, 2021
@etsybaev changed the title from "Destination Bigquery: rewrite connector to use Bulk loading upload instead of current one" to "Destination Bigquery: rewrite connector to use Bulk upload instead of current one" Aug 10, 2021
@etsybaev
Contributor Author

Hi @sherifnada. I've created this follow-up ticket (from #3549) as you proposed.
Could you please provide a bit more information about your expectations and acceptance criteria?
Just to make sure I understood you correctly: it would use GCS (as in Snowflake), so the customer would also create some storage and provide us with credentials for it, right?

I've also found this ticket (#4745) for Snowflake and GCS that hasn't been resolved for a while. Wouldn't we run into the same issue here?
Or maybe I'm just missing something. Many thanks in advance!

@sherifnada
Contributor

@etsybaev this should work pretty much the same way as it does in Snowflake. The idea is to stage the data on GCS first, changing whatever is needed about the connector's input to make that happen.

However, contrary to what the title of this ticket might imply, we should not rewrite the connector entirely: users should still be able to use the current INSERT mechanism (it's easier for PoCs) where possible.
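One way to read that requirement, sketched below with a hypothetical "loading_method" config field. The field name and uploader classes are illustrative stand-ins, not the connector's actual spec:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class UploaderFactorySketch {

  // Minimal stand-ins for the two write paths; real implementations would wrap
  // the streaming INSERT client and the GCS-staging + load-job flow respectively.
  interface Uploader {
    void write(String record);
    void close();
  }

  static final class StandardInsertUploader implements Uploader {
    public void write(String record) { /* existing INSERT-based path */ }
    public void close() { }
  }

  static final class GcsStagingUploader implements Uploader {
    public void write(String record) { /* buffer locally, push to GCS */ }
    public void close() { /* kick off the BigQuery bulk load job */ }
  }

  // "loading_method" is an assumed config field for illustration only.
  static Uploader create(JsonNode config) {
    String method = config.has("loading_method")
        ? config.get("loading_method").asText()
        : "Standard";
    return "GCS Staging".equals(method)
        ? new GcsStagingUploader()
        : new StandardInsertUploader();
  }

  public static void main(String[] args) throws Exception {
    JsonNode config = new ObjectMapper().readTree("{\"loading_method\":\"GCS Staging\"}");
    Uploader uploader = create(config);
    uploader.write("{\"id\": 1}");
    uploader.close();
  }
}
```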
