Current Behavior
Currently, we create all connectors and initialize all clients at the very beginning of connector creation. However, the Google SDK BigQuery client may fail (with a 404 response code) if we create it and then wait a long time (e.g. 12 hours) while the source connector is still reading the data to migrate.
Bulk loading (https://cloud.google.com/bigquery/docs/batch-loading-data) would allow us to stage the data on GCS entirely before loading it into BigQuery. This is already implemented for Snowflake, so we'd need something similar implemented for BigQuery.
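For illustration only, a minimal sketch (not the connector's actual code) of the staging step with the google-cloud-storage Java client; the bucket and object names below are made up:

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.nio.charset.StandardCharsets;

public class GcsStagingSketch {
  public static void main(String[] args) {
    // Hypothetical bucket/object names used only for illustration.
    String bucket = "my-staging-bucket";
    String objectName = "airbyte-staging/users/part-0001.jsonl";

    Storage storage = StorageOptions.getDefaultInstance().getService();

    // Records are accumulated as newline-delimited JSON and written to GCS,
    // so nothing is sent to BigQuery while the source is still reading data.
    String ndjson = "{\"id\":1,\"name\":\"alice\"}\n{\"id\":2,\"name\":\"bob\"}\n";
    BlobInfo blobInfo = BlobInfo.newBuilder(BlobId.of(bucket, objectName))
        .setContentType("application/x-ndjson")
        .build();
    storage.create(blobInfo, ndjson.getBytes(StandardCharsets.UTF_8));
  }
}
```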
Expected Behavior
HelpFiles.zip
The connector should use the bulk loading upload type.
Logs
See the attached archive (HelpFiles.zip).
ToDo as part of this ticket
Update the BigQuery connector to use bulk loading: collect the data on GCS and then, on close, move it to BigQuery with a bulk load job (see the sketch after this list).
Also good to do:
Use the "destination bigquery creds" from lastpass to get a secret for testing
Run the performance test from the attached archive to make sure the application no longer returns 404 (create a new destination connector -> wait 12+ hours -> try to write some messages and stop the container -> check that the messages appear in the cloud). For more details you may also check the last comments on "Destination Bigquery returns 404 when uploading data + resumable" #3549.
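For reference, a minimal sketch of what the bulk load step could look like with the BigQuery Java client, assuming the records have already been staged on GCS as newline-delimited JSON; the dataset, table, and URI below are hypothetical:

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.TableId;

public class BigQueryBulkLoadSketch {
  public static void main(String[] args) throws InterruptedException {
    // Hypothetical identifiers used only for illustration.
    TableId tableId = TableId.of("my_dataset", "users");
    String sourceUri = "gs://my-staging-bucket/airbyte-staging/users/part-*.jsonl";

    // The client is created right before the load, so there is no long-lived
    // client that can go stale (and 404) while the source is still reading.
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    LoadJobConfiguration loadConfig = LoadJobConfiguration.newBuilder(tableId, sourceUri)
        .setFormatOptions(FormatOptions.json())
        .setWriteDisposition(JobInfo.WriteDisposition.WRITE_APPEND)
        .build();

    Job job = bigquery.create(JobInfo.of(loadConfig)).waitFor();
    if (job == null || job.getStatus().getError() != null) {
      throw new RuntimeException("BigQuery load job failed: "
          + (job == null ? "job no longer exists" : job.getStatus().getError()));
    }
  }
}
```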
Notes:
Some scoping required:
We already have a destination-gcs connector - how can it be re-used?
The Snowflake destination's implementation may also be worth checking before starting work on this one.
etsybaev changed the title from "Destination Bigquery: rewrite connector to use Bulk loading upload instead of current one" to "Destination Bigquery: rewrite connector to use Bulk upload instead of current one" on Aug 10, 2021
Hi @sherifnada. I've created this follow-up ticket (from #3549) as you proposed.
Could you please provide a bit more information about your expectations and acceptance criteria?
Just to make sure I understood you correctly: it would use GCS (like in Snowflake), so the customer would also create some storage and provide us with credentials for it, right?
I've also found this ticket (#4745) for Snowflake and GCS that hasn't been resolved for a while. Wouldn't we run into the same issue here?
Or maybe I'm just missing something. Many thanks in advance!
@etsybaev this should work pretty much the same way as it does in Snowflake. The idea is to stage the data on GCS first, changing whatever is needed about the input to the connector to make that happen.
However, contrary to what the title of this ticket might imply, we should not rewrite the connector -- users should still be able to use the current INSERT mechanism (it's easier for PoCs) where possible.
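To illustrate that last point, a hypothetical sketch of how the destination could keep both upload mechanisms selectable; all class, enum, and method names here are assumptions for illustration, not the connector's actual API:

```java
import java.util.ArrayList;
import java.util.List;

public class UploaderSelectionSketch {

  // Hypothetical names; the real connector configuration would drive this choice.
  enum UploadMethod { STANDARD_INSERTS, GCS_STAGING }

  interface RecordUploader {
    void accept(String jsonRecord);
    void close();
  }

  // Placeholder for the existing streaming-insert path (kept around for PoCs).
  static class StandardInsertsUploader implements RecordUploader {
    public void accept(String jsonRecord) { /* insert the record batch directly */ }
    public void close() { }
  }

  // Placeholder for the new path: buffer locally, stage on GCS, bulk load on close.
  static class GcsStagingUploader implements RecordUploader {
    private final List<String> buffer = new ArrayList<>();
    public void accept(String jsonRecord) { buffer.add(jsonRecord); }
    public void close() { /* upload the buffer to GCS, then run a bulk load job */ }
  }

  static RecordUploader create(UploadMethod method) {
    return method == UploadMethod.GCS_STAGING
        ? new GcsStagingUploader()
        : new StandardInsertsUploader();
  }
}
```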