Destination Bigquery returns 404 when uploading data + resumable #3549

Closed
marcosmarxm opened this issue May 22, 2021 · 26 comments · Fixed by #5029, #5049 or #5614
Comments

@marcosmarxm
Member

marcosmarxm commented May 22, 2021

Expected Behavior

Sync between Shopify and BQ works

Current Behavior

It's not entirely clear why this happens: BigQuery returns 404 when uploading data after the sync has been running for a few hours.

From the Google Cloud API page:

Handle 404 Not Found and 410 Gone errors when doing resumable uploads by starting the entire upload over from the beginning.

Logs

2021-05-21 03:10:39 ERROR (/tmp/workspace/6/0) LineGobbler(voidCall):69 - Exception in thread "main" com.google.cloud.bigquery.BigQueryException: 404 Not Found
2021-05-21 03:10:39 ERROR (/tmp/workspace/6/0) LineGobbler(voidCall):69 - PUT https://www.googleapis.com/upload/bigquery/v2/projects/earthbones-shopify/jobs?uploadType=resumable&upload_id=ABg5-UyhG8f29jrwEfJJ2mu4JTAIjUF_B3F5JvymoctXdtBQjQ29LsavMFtadgBiXThGjQcTtZAffIZJTf0940HdexnlUOtGew
2021-05-21 03:10:39 ERROR (/tmp/workspace/6/0) LineGobbler(voidCall):69 - 	at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:115)
2021-05-21 03:10:39 ERROR (/tmp/workspace/6/0) LineGobbler(voidCall):69 - 	at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.write(HttpBigQueryRpc.java:705)
2021-05-21 03:10:39 ERROR (/tmp/workspace/6/0) LineGobbler(voidCall):69 - 	at com.google.cloud.bigquery.TableDataWriteChannel$1.call(TableDataWriteChannel.java:59)
2021-05-21 03:10:39 ERROR (/tmp/workspace/6/0) LineGobbler(voidCall):69 - 	at com.google.cloud.bigquery.TableDataWriteChannel$1.call(TableDataWriteChannel.java:54)
2021-05-21 03:10:39 ERROR (/tmp/workspace/6/0) LineGobbler(voidCall):69 - 	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
2021-05-21 03:10:39 ERROR (/tmp/workspace/6/0) LineGobbler(voidCall):69 - 	at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
2021-05-21 03:10:39 ERROR (/tmp/workspace/6/0) LineGobbler(voidCall):69 - 	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
2021-05-21 03:10:39 ERROR (/tmp/workspace/6/0) LineGobbler(voidCall):69 - 	at com.google.cloud.bigquery.TableDataWriteChannel.flushBuffer(TableDataWriteChannel.java:53)
2021-05-21 03:10:39 ERROR (/tmp/workspace/6/0) LineGobbler(voidCall):69 - 	at com.google.cloud.BaseWriteChannel.flush(BaseWriteChannel.java:112)
2021-05-21 03:10:39 ERROR (/tmp/workspace/6/0) LineGobbler(voidCall):69 - 	at com.google.cloud.BaseWriteChannel.write(BaseWriteChannel.java:139)
2021-05-21 03:10:39 ERROR (/tmp/workspace/6/0) LineGobbler(voidCall):69 - 	at io.airbyte.integrations.destination.bigquery.BigQueryDestination$RecordConsumer.acceptTracked(BigQueryDestination.java:358)
2021-05-21 03:10:39 ERROR (/tmp/workspace/6/0) LineGobbler(voidCall):69 - 	at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.accept(FailureTrackingAirbyteMessageConsumer.java:66)
2021-05-21 03:10:39 ERROR (/tmp/workspace/6/0) LineGobbler(voidCall):69 - 	at io.airbyte.integrations.base.IntegrationRunner.consumeWriteStream(IntegrationRunner.java:132)
2021-05-21 03:10:39 ERROR (/tmp/workspace/6/0) LineGobbler(voidCall):69 - 	at io.airbyte.integrations.base.IntegrationRunner.run(IntegrationRunner.java:113)
2021-05-21 03:10:39 ERROR (/tmp/workspace/6/0) LineGobbler(voidCall):69 - 	at io.airbyte.integrations.destination.bigquery.BigQueryDestination.main(BigQueryDestination.java:421)

Steps to Reproduce

It's not clear how to reproduce. Maybe this happens when trying to sync large Shopify datasets to BQ.

Severity of the bug for you

Very low / Low / Medium / High / Critical

Airbyte Version

0.23.0-alpha

Connector Version (if applicable)

0.2.2 Shopify

Additional context

Slack convo


@marcosmarxm marcosmarxm added the type/bug and area/connectors labels May 22, 2021
@marcosmarxm
Member Author

I'm syncing 3 years of real data here, but it still baffles me that this can take so long.
It took 12 hours for something to happen, and it crashed because of a 404 error.
It is still marked as running.
Yes, maybe the dataset is too big? I reduced the Stripe dataset and removed events and similar streams, and after that every sync took around an hour, even for syncs covering 3 years of data.

User logs syncing Stripe to BigQuery:
2021-05-31 14:42:37 INFO (/tmp/workspace/4/0) WorkerRun(call):62 - Executing worker wrapper. Airbyte version: 0.24.4-alpha
2021-05-31 14:42:37 INFO (/tmp/workspace/4/0) TemporalAttemptExecution(get):111 - Executing worker wrapper. Airbyte version: 0.24.4-alpha
2021-05-31 14:42:37 INFO (/tmp/workspace/4/0) DefaultReplicationWorker(run):97 - start sync worker. job id: 4 attempt id: 0
2021-05-31 14:42:37 INFO (/tmp/workspace/4/0) DefaultReplicationWorker(run):106 - configured sync modes: {null.charges=incremental - append, null.customer_balance_transactions=full_refresh - overwrite, null.coupons=incremental - append, null.products=incremental - append, null.balance_transactions=incremental - append, null.refunds=incremental - append, null.invoice_line_items=full_refresh - overwrite, null.transfers=incremental - append, null.payouts=incremental - append, null.customers=incremental - append, null.plans=incremental - append, null.disputes=incremental - append, null.subscriptions=incremental - append, null.invoice_items=incremental - append, null.events=incremental - append, null.bank_accounts=full_refresh - overwrite, null.subscription_items=full_refresh - overwrite, null.invoices=incremental - append}
2021-05-31 14:42:37 INFO (/tmp/workspace/4/0) DefaultAirbyteDestination(start):81 - Running destination...
2021-05-31 14:42:37 INFO (/tmp/workspace/4/0) LineGobbler(voidCall):69 - Checking if airbyte/destination-bigquery:0.3.3 exists...
2021-05-31 14:42:37 INFO (/tmp/workspace/4/0) LineGobbler(voidCall):69 - airbyte/destination-bigquery:0.3.3 was found locally.
2021-05-31 14:42:37 INFO (/tmp/workspace/4/0) DockerProcessFactory(create):111 - Preparing command: docker run --rm --init -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/4/0 --network host airbyte/destination-bigquery:0.3.3 write --config destination_config.json --catalog destination_catalog.json
2021-05-31 14:42:37 INFO (/tmp/workspace/4/0) LineGobbler(voidCall):69 - Checking if airbyte/source-stripe:0.1.11 exists...
2021-05-31 14:42:37 INFO (/tmp/workspace/4/0) LineGobbler(voidCall):69 - airbyte/source-stripe:0.1.11 was found locally.
2021-05-31 14:42:37 INFO (/tmp/workspace/4/0) DockerProcessFactory(create):111 - Preparing command: docker run --rm --init -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/4/0 --network host airbyte/source-stripe:0.1.11 read --config source_config.json --catalog source_catalog.json
2021-05-31 14:42:37 INFO (/tmp/workspace/4/0) DefaultReplicationWorker(run):132 - Waiting for source thread to join.
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - Exception in thread "main" com.google.cloud.bigquery.BigQueryException: 404 Not Found
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - PUT https://www.googleapis.com/upload/bigquery/v2/projects/craftnote-live/jobs?uploadType=resumable&upload_id=ABg5-Uyn1T2bRFLs9Ww15UeF2uZYz4c5JKYhSFwjkOAtJxG8ZX7TIXb79sP5EEC3SQJaTrxmvwiXAjedtS4IHYH7t2uxn2Ebkg
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:115)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.write(HttpBigQueryRpc.java:705)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	at com.google.cloud.bigquery.TableDataWriteChannel$1.call(TableDataWriteChannel.java:59)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	at com.google.cloud.bigquery.TableDataWriteChannel$1.call(TableDataWriteChannel.java:54)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	at com.google.cloud.bigquery.TableDataWriteChannel.flushBuffer(TableDataWriteChannel.java:53)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	at com.google.cloud.BaseWriteChannel.flush(BaseWriteChannel.java:112)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	at com.google.cloud.BaseWriteChannel.write(BaseWriteChannel.java:139)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.acceptTracked(BigQueryRecordConsumer.java:110)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.accept(FailureTrackingAirbyteMessageConsumer.java:66)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	at io.airbyte.integrations.base.IntegrationRunner.consumeWriteStream(IntegrationRunner.java:132)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	at io.airbyte.integrations.base.IntegrationRunner.run(IntegrationRunner.java:113)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	at io.airbyte.integrations.destination.bigquery.BigQueryDestination.main(BigQueryDestination.java:242)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	Suppressed: com.google.cloud.bigquery.BigQueryException: 404 Not Found
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - PUT https://www.googleapis.com/upload/bigquery/v2/projects/craftnote-live/jobs?uploadType=resumable&upload_id=ABg5-UwBpKMxykE_ozdsa2iTwzM_dtdMj9SNZ94qlcOyhhmgDSLpYvbqyWTBBynjFrUpL-ja36wn_2a7fwtRqWyqCy2NkEPKdg
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:115)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.write(HttpBigQueryRpc.java:705)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at com.google.cloud.bigquery.TableDataWriteChannel$1.call(TableDataWriteChannel.java:59)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at com.google.cloud.bigquery.TableDataWriteChannel$1.call(TableDataWriteChannel.java:54)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at com.google.cloud.bigquery.TableDataWriteChannel.flushBuffer(TableDataWriteChannel.java:53)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at com.google.cloud.BaseWriteChannel.close(BaseWriteChannel.java:151)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.lambda$close$0(BigQueryRecordConsumer.java:122)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at io.airbyte.commons.lang.Exceptions.castCheckedToRuntime(Exceptions.java:72)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at io.airbyte.commons.lang.Exceptions.toRuntime(Exceptions.java:59)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.lambda$close$1(BigQueryRecordConsumer.java:122)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1766)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.helpCC(ForkJoinPool.java:1116)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at java.base/java.util.concurrent.ForkJoinPool.externalHelpComplete(ForkJoinPool.java:1966)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at java.base/java.util.concurrent.ForkJoinTask.tryExternalHelp(ForkJoinTask.java:378)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at java.base/java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:323)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:412)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.close(BigQueryRecordConsumer.java:122)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.close(FailureTrackingAirbyteMessageConsumer.java:78)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at io.airbyte.integrations.base.IntegrationRunner.consumeWriteStream(IntegrationRunner.java:126)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		... 2 more
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	Caused by: com.google.api.client.http.HttpResponseException: 404 Not Found
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - PUT https://www.googleapis.com/upload/bigquery/v2/projects/craftnote-live/jobs?uploadType=resumable&upload_id=ABg5-UwBpKMxykE_ozdsa2iTwzM_dtdMj9SNZ94qlcOyhhmgDSLpYvbqyWTBBynjFrUpL-ja36wn_2a7fwtRqWyqCy2NkEPKdg
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1113)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.write(HttpBigQueryRpc.java:688)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 		... 33 more
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - Caused by: com.google.api.client.http.HttpResponseException: 404 Not Found
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - PUT https://www.googleapis.com/upload/bigquery/v2/projects/craftnote-live/jobs?uploadType=resumable&upload_id=ABg5-Uyn1T2bRFLs9Ww15UeF2uZYz4c5JKYhSFwjkOAtJxG8ZX7TIXb79sP5EEC3SQJaTrxmvwiXAjedtS4IHYH7t2uxn2Ebkg
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1113)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.write(HttpBigQueryRpc.java:688)
2021-06-01 06:23:16 ERROR (/tmp/workspace/4/0) LineGobbler(voidCall):69 - 	... 13 more

@marcosmarxm
Member Author

marcosmarxm commented Jun 1, 2021

@sherifnada tagging you because this is the second case where this error has happened. Not sure if checkpointing can handle this because it fails intermittently...

@marcosmarxm marcosmarxm changed the title from "BQ returns 404 when upload + resumable" to "Destination Bigquery returns 404 when uploading data + resumable" Jun 1, 2021
@andresbravog
Contributor

Hi, this is happening to me on a Postgres => BQ first sync of a huge database. The sync process gets stuck at that point.

2021-06-09 02:06:07 INFO (/tmp/workspace/56/0) DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-06-09 02:06:07 �[32mINFO�[m i.a.i.s.j.AbstractJdbcSource(lambda$createReadIterator$8):308 - {} - Reading stream assessment_responses. Records read: 20000
2021-06-09 02:06:08 INFO (/tmp/workspace/56/0) DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-06-09 02:06:08 �[32mINFO�[m i.a.i.s.j.AbstractJdbcSource(lambda$createReadIterator$8):308 - {} - Reading stream assessment_responses. Records read: 30000
2021-06-09 02:06:08 INFO (/tmp/workspace/56/0) DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-06-09 02:06:08 �[32mINFO�[m i.a.i.b.FailureTrackingAirbyteMessageConsumer(close):77 - {} - hasFailed: true.
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - Exception in thread "main" com.google.cloud.bigquery.BigQueryException: 404 Not Found
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - PUT https://www.googleapis.com/upload/bigquery/v2/projects/useful-proposal-454/jobs?uploadType=resumable&upload_id=ABg5-UwHdi3-fSOUNSt1tTQjj7-DE3Gxm2N8x8sExPIVS7G-J6-mJ22EEezApb40TtA6xhvKN9K8NCnap2OoP672QqZYjmr0EA
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:115)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.write(HttpBigQueryRpc.java:705)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.bigquery.TableDataWriteChannel$1.call(TableDataWriteChannel.java:59)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.bigquery.TableDataWriteChannel$1.call(TableDataWriteChannel.java:54)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.bigquery.TableDataWriteChannel.flushBuffer(TableDataWriteChannel.java:53)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.BaseWriteChannel.flush(BaseWriteChannel.java:112)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.BaseWriteChannel.write(BaseWriteChannel.java:139)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.acceptTracked(BigQueryRecordConsumer.java:110)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.accept(FailureTrackingAirbyteMessageConsumer.java:66)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at io.airbyte.integrations.base.IntegrationRunner.consumeWriteStream(IntegrationRunner.java:132)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at io.airbyte.integrations.base.IntegrationRunner.run(IntegrationRunner.java:113)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at io.airbyte.integrations.destination.bigquery.BigQueryDestination.main(BigQueryDestination.java:242)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - Suppressed: com.google.cloud.bigquery.BigQueryException: 404 Not Found
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - PUT https://www.googleapis.com/upload/bigquery/v2/projects/useful-proposal-454/jobs?uploadType=resumable&upload_id=ABg5-Uy6YIL7DnEd9wNiRd27Ms39-N6TcXMELYbqNaQcmktK-3PKH8JAd-1h54mKdKFGKoUnVvQVOrWbi5zoyg3xzD5JtnBClQ
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:115)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.write(HttpBigQueryRpc.java:705)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.bigquery.TableDataWriteChannel$1.call(TableDataWriteChannel.java:59)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.bigquery.TableDataWriteChannel$1.call(TableDataWriteChannel.java:54)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.bigquery.TableDataWriteChannel.flushBuffer(TableDataWriteChannel.java:53)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.BaseWriteChannel.close(BaseWriteChannel.java:151)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.lambda$close$0(BigQueryRecordConsumer.java:122)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at io.airbyte.commons.lang.Exceptions.castCheckedToRuntime(Exceptions.java:72)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at io.airbyte.commons.lang.Exceptions.toRuntime(Exceptions.java:59)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.lambda$close$1(BigQueryRecordConsumer.java:122)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1766)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.close(BigQueryRecordConsumer.java:122)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.close(FailureTrackingAirbyteMessageConsumer.java:78)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at io.airbyte.integrations.base.IntegrationRunner.consumeWriteStream(IntegrationRunner.java:126)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - ... 2 more
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - Caused by: com.google.api.client.http.HttpResponseException: 404 Not Found
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - PUT https://www.googleapis.com/upload/bigquery/v2/projects/useful-proposal-454/jobs?uploadType=resumable&upload_id=ABg5-Uy6YIL7DnEd9wNiRd27Ms39-N6TcXMELYbqNaQcmktK-3PKH8JAd-1h54mKdKFGKoUnVvQVOrWbi5zoyg3xzD5JtnBClQ
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1113)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.write(HttpBigQueryRpc.java:688)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - ... 29 more
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - Caused by: com.google.api.client.http.HttpResponseException: 404 Not Found
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - PUT https://www.googleapis.com/upload/bigquery/v2/projects/useful-proposal-454/jobs?uploadType=resumable&upload_id=ABg5-UwHdi3-fSOUNSt1tTQjj7-DE3Gxm2N8x8sExPIVS7G-J6-mJ22EEezApb40TtA6xhvKN9K8NCnap2OoP672QqZYjmr0EA
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1113)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.write(HttpBigQueryRpc.java:688)
2021-06-09 02:06:33 ERROR (/tmp/workspace/56/0) LineGobbler(voidCall):85 - ... 13 more

@marcosmarxm marcosmarxm added the priority/high label Jun 10, 2021
@marcosmarxm
Member Author

User could sync after selecting only a subset of tables.

I was able to finish the sync by selecting only 10 tables each time but that was a lot of iterations.

@marcosmarxm marcosmarxm added this to the Connectors June 25, 2021 milestone Jun 15, 2021
@sherifnada sherifnada removed this from the Connectors June 25, 2021 milestone Jun 22, 2021
@sherifnada
Contributor

my guess is that the right way to deal with this is bulk loading data into BQ via staging on GCS. @andresbravog would that be an acceptable solution for you?

@andresbravog
Contributor

@sherifnada we already have a workaround: a script that waits for each job to finish and adds one more table to the connection each time (so we can control when the process gets stuck).

My take here is that the process has two main bugs:

  • The process gets frozen and does not fail after the error, meaning there is no second attempt and no release of resources until the user manually clicks cancel.
  • The process fails due to a single 404 with no retry (this forces long-running syncs to restart from scratch).

Ideally, we would like to make the BQ destination resilient to these errors.

@hendrikjd

Running into a similar issue here with Stripe to BigQuery.

2021-07-13 02:58:24 ERROR () LineGobbler(voidCall):85 - Exception in thread "main" com.google.cloud.bigquery.BigQueryException: 404 Not Found followed by more errors. Status in Airbyte continues to say "running" for 24+ hours.

@flagbug

flagbug commented Jul 18, 2021

Seeing the same problem when trying to load a few tables, one of them containing nearly a billion rows, from MSSQL to BigQuery. The exception happens pretty late in the sync, after it has already run for a few hours.

2021-07-17 18:01:29 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-07-17 18:01:29 �[33mWARN�[m i.a.i.b.FailureTrackingAirbyteMessageConsumer(close):78 - {} - Airbyte message consumer: failed.
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - Exception in thread "main" com.google.cloud.bigquery.BigQueryException: 404 Not Found
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - PUT https://www.googleapis.com/upload/bigquery/v2/projects/<redacted>/jobs?uploadType=resumable&upload_id=<redacted>
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - 	at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:115)
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - 	at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.write(HttpBigQueryRpc.java:705)
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - 	at com.google.cloud.bigquery.TableDataWriteChannel$1.call(TableDataWriteChannel.java:59)
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - 	at com.google.cloud.bigquery.TableDataWriteChannel$1.call(TableDataWriteChannel.java:54)
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - 	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - 	at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - 	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - 	at com.google.cloud.bigquery.TableDataWriteChannel.flushBuffer(TableDataWriteChannel.java:53)
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - 	at com.google.cloud.BaseWriteChannel.flush(BaseWriteChannel.java:112)
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - 	at com.google.cloud.BaseWriteChannel.write(BaseWriteChannel.java:139)
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - 	at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.acceptTracked(BigQueryRecordConsumer.java:101)
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - 	at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.accept(FailureTrackingAirbyteMessageConsumer.java:66)
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - 	at io.airbyte.integrations.base.IntegrationRunner.consumeWriteStream(IntegrationRunner.java:132)
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - 	at io.airbyte.integrations.base.IntegrationRunner.run(IntegrationRunner.java:113)
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - 	at io.airbyte.integrations.destination.bigquery.BigQueryDestination.main(BigQueryDestination.java:279)
2021-07-17 18:01:32 ERROR () LineGobbler(voidCall):85 - 	Suppressed: com.google.cloud.bigquery.BigQueryException: 404 Not Found

@sherifnada
Contributor

hey everyone, we're working on this in the current sprint. It's not 100% clear yet what the fix will be, so our first step is to identify that, then we'll have a more concrete ETA. But it's something we're currently working on.

@andresbravog
Contributor

andresbravog commented Jul 21, 2021

Hi @sherifnada,

I think this issue is documented by Google BigQuery as a quota limit that can be hit at the BigQuery level (>1 TB of load data) or at the job level:

https://cloud.google.com/bigquery/quotas#load_jobs

I think it is very possible that we hit the jobs-per-table quota.

The suggestion I've seen is to simply implement a retry strategy (see the sketch below).
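For illustration, a minimal sketch of that retry idea (not the connector's actual code; the Upload interface and method names are hypothetical), restarting the whole upload when BigQuery answers 404 or 410, as the Google docs quoted above suggest:

```java
import com.google.cloud.bigquery.BigQueryException;

public class ResumableUploadRetry {

  /** Hypothetical abstraction over "run the whole upload from the beginning". */
  interface Upload {
    void run();
  }

  static void uploadWithRestart(Upload upload, int maxAttempts) {
    for (int attempt = 1; ; attempt++) {
      try {
        upload.run();
        return;
      } catch (BigQueryException e) {
        // A 404/410 on a resumable upload means the session is gone; the only
        // recovery is to start the whole upload over with a fresh session.
        boolean sessionGone = e.getCode() == 404 || e.getCode() == 410;
        if (!sessionGone || attempt >= maxAttempts) {
          throw e;
        }
      }
    }
  }
}
```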

@etsybaev etsybaev self-assigned this Jul 21, 2021
@etsybaev
Contributor

etsybaev commented Jul 22, 2021

Still working on reproducing the issue and reviewing the code, but I have an idea why this could happen.
In the current implementation we write data to temporary tables and, in the close() method from AutoCloseable (if all went well), we move the data to the final/permanent table. In the write() method we use a ByteBuffer to collect messages (write(ByteBuffer.wrap(...))) and flush them once they reach some limit, or on close() of the stream, which is probably around the same time that we close the stream and remove the temporary tables. The default buffer size in BaseWriteChannel seems to be pretty big:
private static final int MIN_CHUNK_SIZE = 256 * 1024; // 256 KiB
private static final int DEFAULT_CHUNK_SIZE = 60 * MIN_CHUNK_SIZE; // 15MiB

Each stream seems to have its own buffer.
I'm currently running some tests. I have already sent about 100,000,000 small records, but the temp table is still empty, which means all those records are still in the buffer. If we have a lot of streams at a time, that may mean we remove all the temporary tables before all the buffers finish flushing into them; in that case the "404 Not Found" error makes sense.
For example, with 20 streams * 15 MiB = 300 MiB to upload, even a fast connection would need some time. If that really turns out to be the root cause, the fix may be to decrease the buffer size or even make it configurable from the UI (see the sketch below).
Will keep working on this ticket and checking the assumption above.
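For illustration, a minimal sketch of what shrinking that buffer could look like with the BigQuery Java client (dataset and table names are hypothetical; this is not the connector's actual code):

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.TableDataWriteChannel;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.WriteChannelConfiguration;

public class SmallerChunkSizeSketch {
  public static void main(String[] args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    WriteChannelConfiguration config = WriteChannelConfiguration
        .newBuilder(TableId.of("my_dataset", "my_tmp_table")) // hypothetical table
        .setFormat(FormatOptions.json())
        .build();

    TableDataWriteChannel writer = bigquery.writer(config);
    // BaseWriteChannel buffers 15 MiB per stream by default; a smaller chunk size
    // lowers per-stream heap usage at the cost of more (smaller) upload requests.
    writer.setChunkSize(1024 * 1024); // 1 MiB (MIN_CHUNK_SIZE above is 256 KiB)
  }
}
```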

@etsybaev
Contributor

etsybaev commented Jul 23, 2021

Another finding/assumption: the actual reason for the failure seems to be an out-of-memory error.
I keep trying to run it from integration tests locally. If we create 25 tables and process them at the same time, even adding millions of records to each works fine. But whenever I try to process 30 tables/streams at the same time, it fails pretty fast with an OutOfMemoryError. With 29 streams it works for a while, but then it fails with out of memory anyway.

The assumption to check:

  1. The consumer fails with an OutOfMemoryError caused by the BigQuery SDK client.
  2. The acceptTracked method receives a new record to write but doesn't check whether the stream is still alive or the tables still exist.
  3. The attempt to write another record fails with 404 because the tmp tables no longer exist (or have changed) or the stream is down.

org.gradle.api.internal.tasks.testing.TestSuiteExecutionException: Could not complete execution for Gradle Test Executor 3. at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.stop(SuiteTestClassProcessor.java:63) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33) at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94) at com.sun.proxy.$Proxy5.stop(Unknown Source) at org.gradle.api.internal.tasks.testing.worker.TestWorker.stop(TestWorker.java:133) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182) at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164) at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:414) at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64) at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56) at java.base/java.lang.Thread.run(Thread.java:832) Caused by: java.lang.OutOfMemoryError: Java heap space at java.base/java.util.Arrays.copyOf(Arrays.java:3536) at com.google.cloud.BaseWriteChannel.write(BaseWriteChannel.java:135) at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.acceptTracked(BigQueryRecordConsumer.java:105) at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.accept(FailureTrackingAirbyteMessageConsumer.java:66) at io.airbyte.integrations.destination.bigquery.BigQueryDestinationPerformanceTest.testWriteSuccess(BigQueryDestinationPerformanceTest.java:199) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688) at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor$$Lambda$173/0x0000000800c1f040.apply(Unknown Source) at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall$$Lambda$174/0x0000000800c1f440.apply(Unknown Source) at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) at org.junit.jupiter.engine.execution.ExecutableInvoker$$Lambda$312/0x0000000800c55440.apply(Unknown Source) at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:210) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor$$Lambda$467/0x0000000800ed6c40.execute(Unknown Source) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:206) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:131) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:65)

To check if it helps:

BigQuery

UPD: This is how memory consumption for 300 streams looks with different chunk sizes (screenshots attached):

a. With the default BigQuery native client chunk size (15 MB): [screenshot Selection_263]

b. With 7 MB: [screenshot Selection_265]

c. With 1 MB: [screenshot Selection_264]

Obviously, the smaller the chunk size, the less heap memory is consumed, but that also makes uploading slower.

@etsybaev
Contributor

etsybaev commented Jul 23, 2021

Hi @andresbravog @flagbug @hendrikjd.
Could you please clarify what you see in the BigQuery schema when Airbyte prints errors in the logs but the job is still running? Are there any tmp tables with something like "..._tmp_jqs..." in their names, or do all the names look like "...airbyte_raw...."? Especially for the failed streams/tables. Thanks!

@andresbravog
Contributor

In our case tmp tables are there but not raw ones.

@etsybaev
Contributor

etsybaev commented Jul 26, 2021

Hi there. Still experiencing some issues with reproducing this bug and understanding the root cause. Need some help from those who can reproduce it. Thanks in advance!

I even tried to upload 600,000,000 records to a few BigQuery tables, but it didn't fail.
[screenshot Selection_272]

Kindly asking those who can reproduce it to attach:
1. Full logs (from the very beginning to the very end).
2. Full info about the environment where the app is running.
3. Overall RAM allocated for the whole environment.
4. Memory and other resources allocated to each container ("docker stats").
5. How much memory is allocated to the JVM (-Xmx in particular) inside each Docker container (especially the one created after running the "docker run --rm -i write --config ..." command). It would be great if you could upload a memory dump from the failed container.
6. How many tables work for you at a time, and how many tables are selected when the job gets stuck or fails.
7. When does it usually fail (closer to the beginning, in the middle, at the end)? Does it fail sooner or later if you select many more tables/streams to sync?
8. Is the container that was created by the "docker run --rm -i write --config ..." command still alive ("docker ps" or "docker stats")?
9. Could you please monitor heap memory consumption during a failing run, or at least container memory via "docker stats"?
10. Are you sure you didn't have any non-stopped containers from the previous run before starting a new sync?
11. What did the migration config look like?

For now, the only option I see is that the native BigQuery SDK client fails with an out-of-heap-memory error at the moment it is called via writer.write(...) from the acceptTracked method. I can reproduce such an OutOfMemoryError (heap) using local unit/integration tests running directly on a PC without Docker containers, but the main web app doesn't receive any info that the container crashed. In this case we get all the tmp tables created and empty (as the consumer had not been closed correctly), which more or less matches what @andresbravog also observed. But then it's not clear where the 404 comes from; maybe something tries to send messages to the dead destination connector in the middle...
Currently, we use the default BigQuery SDK array buffer of 15 MiB, so for more than 200 tables we would need more than 3-4 GB of heap memory just inside the single container that is created by the "docker run --rm -i write --config ..." command.
In addition, during a long run I've found that the client consumes a little more memory over time (see the captures below), so we would be starting a migration with consumption already close to the limit. Usually the JVM allocates 25% of all available memory to the heap, so even if you have 16 GB RAM for everything, only 4 GB would be allocated to the heap.
[screenshots: Selection_267, Selection_268, Selection_269]

@sherifnada @tuliren @subodh1810 do you have any ideas, as engineers with a deep understanding of this project?
Many thanks in advance!

@tuliren
Contributor

tuliren commented Jul 27, 2021

whenever I try to process 30 tables\streams at the same time, it fails pretty fast with the outOfMemory.

This feels like the OOME is caused by too many BigQuery writers. The S3 destination had a similar problem before.

We use a separate writer for each stream (the writeConfigs in BigQueryRecordConsumer), and each writer takes up a fixed amount of memory. So naturally, when there are too many streams, they need more memory than is allocated to them. It's worth checking the memory usage of the TableDataWriteChannel class and adjusting its settings at construction time, if any such setting is available.

@tuliren
Contributor

tuliren commented Jul 27, 2021

Sorry, I replied too soon. What I mentioned above is already included in your investigation. #5029 looks like a good solution.

Alternatively, we could automatically decrease the chunk size when there are many streams (a rough sketch of that idea follows below).
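For illustration, a small sketch of that scaling idea (not actual connector code; the heap-budget parameter is hypothetical):

```java
// Scale the per-stream chunk size down as the number of streams grows,
// staying within the 256 KiB minimum and 15 MiB default from BaseWriteChannel.
static int chunkSizeForStreams(int streamCount, long heapBudgetBytes) {
  final int minChunk = 256 * 1024;        // MIN_CHUNK_SIZE
  final int defaultChunk = 60 * minChunk; // 15 MiB default
  long perStream = heapBudgetBytes / Math.max(1, streamCount);
  return (int) Math.max(minChunk, Math.min(defaultChunk, perStream));
}
```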

@hendrikjd

Hi @andresbravog @flagbug @hendrikjd.
Could you please clarify what you see in the BigQuery schema when Airbyte prints errors in the logs but the job is still running? Are there any tmp tables with something like "..._tmp_jqs..." in their names, or do all the names look like "...airbyte_raw...."? Especially for the failed streams/tables. Thanks!

For me it's only "...tmp..." tables. All of them are empty.
[screenshot]

@sherifnada
Contributor

sherifnada commented Jul 30, 2021

@hendrikjd and everyone else: as @etsybaev mentioned above, we were really stumped trying to reproduce this issue, so we released a new version of the BigQuery destination with a lot of debugging logs that may help us understand what's going on. Could you upgrade to the latest BigQuery destination connector, see if the issue persists, and if so share the logs?

@etsybaev etsybaev reopened this Aug 3, 2021
@etsybaev
Contributor

etsybaev commented Aug 3, 2021

Hi there.
Just merged changes to the connector, so we now have one more optional arg for the client. I kindly ask those who observe the issue during migration to try decreasing the chunk size (https://github.com/airbytehq/airbyte/pull/5029/files#diff-72754d3b28f568c7c6cea065797446c3844c7edfffa0072f6f3c7f91e0e2388dR14). The default value is 15 MiB (fewer requests to the Google server, but it consumes quite a lot of memory for big DBs). So if you have issues, try decreasing this value (the allowed range is 1 MB to 15 MB) and at the same time give the system more RAM.
Putting this issue on hold and waiting for feedback on whether decreasing the chunk size helped. If not, please provide the extended logs and info; the list of requested info can be found in one of my comments above. Many thanks for the collaboration and happy migration :)

@hendrikjd

I've tried twice more with the updated connector: once with the default chunk size and once with it set to 5. Neither was successful. Here are the logs for both.
5 chunk size.txt
default chunk size.txt

I'm only getting the error with Stripe -> BigQuery, so I tried to find an alternative for that source. Looking through the Fivetran docs, I noticed that they don't support BigQuery sandbox accounts, which is what I'm using right now. Not sure if that's related to the issue here, but it felt worth pointing out. The limitations of the sandbox are listed here.

@etsybaev
Contributor

etsybaev commented Aug 8, 2021

It seems like I've found how to reproduce it. The key point is not how big the dataset is or how long it takes to upload, but how much time it takes to read the records from the source DB. As far as I can see, we create the destination connector's container at the very beginning, along with the source connector's container. The Google BigQuery clients and tmp dirs are created at the time the destination container is created. Then we wait for the source to read data while the destination sits in an idle state.
I was able to reproduce this issue locally from tests, even with only one table/stream and only one record (see the sketch after these steps):

  1. Create a destination client and initialize the Google BigQuery stream in it (this happens automatically in the getConsumer() phase).
  2. Thread.sleep the test thread for 12 hours, keeping the destination container in an idle state.
  3. Write a record to the destination. At the moment the native Google BigQuery client tries to flush its buffer, it fails. It flushes either when the buffer is full or on close, whichever happens first.
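For illustration, a minimal sketch of that reproduction using the BigQuery Java client directly (dataset and table names are hypothetical; this is not the actual test code):

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.TableDataWriteChannel;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.WriteChannelConfiguration;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class IdleWriterRepro {
  public static void main(String[] args) throws Exception {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    WriteChannelConfiguration config = WriteChannelConfiguration
        .newBuilder(TableId.of("my_dataset", "my_tmp_table")) // hypothetical table
        .setFormat(FormatOptions.json())
        .build();

    // Step 1: the resumable upload session is opened when the writer is created.
    TableDataWriteChannel writer = bigquery.writer(config);

    // Step 2: simulate a slow source read; the destination sits idle for hours.
    Thread.sleep(12L * 60 * 60 * 1000);

    // Step 3: the buffered record is only flushed when the buffer fills or the
    // channel closes; by then the upload session may have expired, surfacing as 404.
    writer.write(ByteBuffer.wrap("{\"id\": 1}\n".getBytes(StandardCharsets.UTF_8)));
    writer.close();
  }
}
```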

Next steps:

  1. Identify whether there is any TTL argument in the BigQuery clients or any kind of "keep alive" request.
  2. If not, think about how we can rewrite the connector to create a table at the first call to write something (the acceptTracked() method). We still need to consider how to flush the buffer when we start getting messages for the next migrated stream; the Google SDK client doesn't provide a way to force-flush the buffer other than closing the connection.

By the way, I checked how much memory it takes to write 18 streams with a chunk buffer of 5 MB. Here is the result:
[screenshot Selection_300]

UPD: it seems the Google SDK client doesn't have any kind of "keep alive" request. It also doesn't have an explicit option to force-flush its buffer other than closing the connection. And we do not have an internal message from the source saying that the read is complete and no more messages are expected (as far as I can see, but maybe I'm missing something).
UPD2: As far as I was told, we do not have any source connectors that process several streams in parallel, so we may try to use a change of stream name as an indicator that we need to close the old stream and open a new one.

@sherifnada
Contributor

Amazing work @etsybaev, it seems like you're getting very close here. So the main hypothesis is: if too much time passes between two interactions with a BQ stream ("interaction" meaning open, write, or close), we receive a 404?

Like you noted, I'm not sure there is an "easy" way around this. Ideally we would create the job as late as possible for each stream, so we could potentially do lazy creation. That doesn't solve the "close" problem, though, where a stream receives all of its data at the beginning of the sync but the sync takes long enough that a lot of time has passed by the time we close the stream.

I think the most reliable way around this is bulk loading, as it would allow us to stage the data on GCS entirely before loading it into BigQuery (a rough sketch of what that could look like is below). This is already implemented for Snowflake, so we'd need something similar for BigQuery. WDYT?

If you agree, @etsybaev, could you create an issue for this and move it into prioritized for scoping? We should work on that soon.
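For illustration, a rough sketch of the GCS staging approach being proposed (bucket, dataset, and table names are hypothetical; this is not the eventual implementation): records would first be written as newline-delimited JSON to a GCS bucket, then loaded into BigQuery with a single load job, so no long-lived resumable upload session is needed.

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.TableId;

public class GcsStagingLoadSketch {
  public static void main(String[] args) throws InterruptedException {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Assume the records were already staged as newline-delimited JSON on GCS.
    String stagedFile = "gs://my-staging-bucket/airbyte/stream.jsonl"; // hypothetical URI
    LoadJobConfiguration loadConfig = LoadJobConfiguration
        .newBuilder(TableId.of("my_dataset", "my_table"), stagedFile)
        .setFormatOptions(FormatOptions.json())
        .build();

    // One load job per stream replaces the long-lived streaming write channel.
    Job job = bigquery.create(JobInfo.of(loadConfig)).waitFor();
    if (job.getStatus().getError() != null) {
      throw new RuntimeException("Load failed: " + job.getStatus().getError());
    }
  }
}
```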

@etsybaev
Contributor

etsybaev commented Aug 10, 2021

Hi @sherifnada.
I've created a follow-up ticket.
I can't be 100% sure that this is the same issue I was able to reproduce, but yes, in my case it also fails with a 404 like this, which looks pretty similar (full logs can be found in the attached file PerftestLogs.txt):
com.google.cloud.bigquery.BigQueryException: 404 Not Found PUT https://www.googleapis.com/upload/bigquery/v2/projects/dataline-integration-testing/jobs?uploadType=resumable&upload_id=ADPycdvsL3_l8k9SNv1ukO5LmDPkcpoZJJf4hpCRgF9baBKPhf_UrgwmGqfTUfX_5MSaJ0QiXhgaq_JYi5biynNDfgg java.lang.RuntimeException: com.google.cloud.bigquery.BigQueryException: 404 Not Found PUT https://www.googleapis.com/upload/bigquery/v2/projects/dataline-integration-testing/jobs?uploadType=resumable&upload_id=ADPycdvsL3_l8k9SNv1ukO5LmDPkcpoZJJf4hpCRgF9baBKPhf_UrgwmGqfTUfX_5MSaJ0QiXhgaq_JYi5biynNDfgg at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.lambda$close$0(BigQueryRecordConsumer.java:149) at io.airbyte.commons.lang.Exceptions.castCheckedToRuntime(Exceptions.java:72) at io.airbyte.commons.lang.Exceptions.toRuntime(Exceptions.java:59) at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.lambda$close$1(BigQueryRecordConsumer.java:141) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1766) at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290) at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746) at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) at java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408) at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736) at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173) at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661) at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.close(BigQueryRecordConsumer.java:141) at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.close(FailureTrackingAirbyteMessageConsumer.java:82) at io.airbyte.integrations.destination.bigquery.BigQueryDestinationPerformanceTest.testWriteSuccess(BigQueryDestinationPerformanceTest.java:212) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688) at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84) at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) at 
org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:210) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:206) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:131) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:65) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129) at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126) at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84) at java.base/java.util.ArrayList.forEach(ArrayList.java:1511) at

@etsybaev
Contributor

etsybaev commented Sep 8, 2021

Hi there. I've added a new option for destination-bigquery: uploading through GCS. Hopefully this solves the issue.
#5614

FYI, there is another issue that has been reported for GCS, but hopefully it will be solved soon. #5720

Closing the issue; please re-open if you still see it for some reason. Thanks.

@anatolec

Hi,
The initial sync of my Stripe account with BigQuery was failing because of this issue, so, as recommended, I switched to the GCS staging method. Thanks to this, the initial sync worked correctly. But now that the big initial load has passed, I would like to switch back to the standard sync method for the refreshes, which are much smaller, without going through GCS. However, when I launch a sync it seems to restart from the beginning, as if Airbyte did not recognize the sync done through GCS. Is this expected, and is there any solution to this problem?
