
Staging destinations (bigquery, etc): Handle S3 auth error as a ConfigException #20092

Closed · edgao opened this issue Dec 5, 2022 · 7 comments · Fixed by #21087
Labels: team/destinations (Destinations team's backlog)

Comments

edgao commented Dec 5, 2022

e.g. https://sentry.io/organizations/airbytehq/issues/3804018246/?project=6527718&referrer=slack

This code probably lives in base-java-s3, so we can roll it out for all of our staging destinations (bigquery, snowflake, redshift, ...). The error should be wrapped in a ConfigException so that it doesn't trigger alerting.

I'm pretty sure we already detect this in CHECK, but it would be good to confirm that as well.
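
For illustration, a minimal sketch of the wrapping, assuming Airbyte's existing io.airbyte.commons.exceptions.ConfigErrorException and a catch around the doesObjectExist call in S3StorageOperations (the method, field names, and exact placement here are assumptions, not the final implementation):

```java
import com.amazonaws.services.s3.model.AmazonS3Exception;
import io.airbyte.commons.exceptions.ConfigErrorException;

// Sketch only: assumes this lives inside S3StorageOperations, where s3Client
// and bucketName are existing fields.
private boolean objectExists(final String objectKey) {
  try {
    return s3Client.doesObjectExist(bucketName, objectKey);
  } catch (final AmazonS3Exception e) {
    // A 403 means the supplied credentials lack permission. Rethrow as a
    // config error so it reads as a user-fixable problem instead of a
    // system error that triggers alerting.
    if (e.getStatusCode() == 403) {
      throw new ConfigErrorException(
          "Permission denied accessing bucket " + bucketName
              + "; check the provided S3 credentials.", e);
    }
    throw e;
  }
}
```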

Example stacktrace:

com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: null; S3 Extended Request ID: null; Proxy: null), S3 Extended Request ID: null
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5437)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5384)
	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1367)
	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1341)
	at com.amazonaws.services.s3.AmazonS3Client.doesObjectExist(AmazonS3Client.java:1422)
	at io.airbyte.integrations.destination.s3.S3StorageOperations.createBucketObjectIfNotExists(S3StorageOperations.java:110)
	at io.airbyte.integrations.destination.bigquery.BigQueryGcsOperations.createStageIfNotExists(BigQueryGcsOperations.java:102)
	at io.airbyte.integrations.destination.bigquery.BigQueryStagingConsumerFactory.lambda$onStartFunction$3(BigQueryStagingConsumerFactory.java:105)
	at io.airbyte.commons.concurrency.VoidCallable.call(VoidCallable.java:15)
	at io.airbyte.integrations.destination.buffered_stream_consumer.BufferedStreamConsumer.startTracked(BufferedStreamConsumer.java:118)
	at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.start(FailureTrackingAirbyteMessageConsumer.java:34)
	at io.airbyte.integrations.base.IntegrationRunner.consumeWriteStream(IntegrationRunner.java:233)
	at io.airbyte.integrations.base.IntegrationRunner.lambda$runConsumer$2(IntegrationRunner.java:241)
	at io.airbyte.integrations.base.IntegrationRunner.watchForOrphanThreads(IntegrationRunner.java:270)
	at io.airbyte.integrations.base.IntegrationRunner.runConsumer(IntegrationRunner.java:240)
	at io.airbyte.integrations.base.IntegrationRunner.runInternal(IntegrationRunner.java:150)
	at io.airbyte.integrations.base.IntegrationRunner.run(IntegrationRunner.java:100)
	at io.airbyte.integrations.destination.bigquery.BigQueryDestination.main(BigQueryDestination.java:327)
edgao added the team/destinations label on Dec 5, 2022
grishick commented:

For some reason, this error is also not caught in the CHECK method. See the attached log.
greg_default_logs_958350_txt.txt

grishick commented:

I have been trying to reproduce this, and so far what I am seeing is that if the GCS Bucket Path setting ends with a forward slash (e.g. testing19997/), the sync runs into this error. CHECK succeeds, but the sync fails (see the attached log).
greg_default_logs_958569_txt.txt

grishick commented Dec 15, 2022

Notes from grooming:

- S3 allows multiple sequential slashes; we need to figure out whether we can easily support this or whether we need to restrict paths to directory-like syntax (see the sketch after these notes).
- When validating the fix, we will also need to test the BigQuery (staging), GCS, and Snowflake (S3 and GCS staging) destinations.
- When the fix is done, publish destination-gcs, destination-s3, destination-bigquery, destination-bigquery-denormalized, destination-snowflake.
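
If we end up restricting paths to directory-like syntax, a hypothetical normalization helper might look like the following (the name normalizeBucketPath and its placement are illustrative, not from the codebase):

```java
// Hypothetical helper: collapse repeated slashes and strip a leading or
// trailing slash, so "testing19997/" and "a//b/" both become directory-like.
static String normalizeBucketPath(final String rawPath) {
  return rawPath
      .replaceAll("/+", "/")    // collapse sequential slashes
      .replaceAll("^/|/$", ""); // drop leading/trailing slashes
}

// normalizeBucketPath("testing19997/") -> "testing19997"
// normalizeBucketPath("a//b/")         -> "a/b"
```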

jbfbell commented Jan 5, 2023

I was looking into this and was able to reproduce it by creating a service account with the required permissions, setting up a connector, and then removing the permissions. On a failed sync, Airbyte Cloud makes two attempts, and the CHECK method is only run on the second attempt. When the CHECK method is run, it produces a reasonable error:

Failure Origin: destination, Message: Checking destination connection failed - please review this connection's configuration to prevent future syncs from failing

However, when the CHECK method is not run, the error is more opaque:

Failure Origin: destination, Message: Checking destination connection failed - please review this connection's configuration to prevent future syncs from failing

I can put a PR together today with my proposed changes.

grishick commented Jan 5, 2023

@jbfbell the main ask of this issue is that when destinations run into an S3 auth error, they should emit an AirbyteErrorTraceMessage with failure_type set to config_error. @ryankfu, @etsybaev, and @akashkulk have recently made a bunch of changes to emit config errors instead of system errors in several connectors, so they would be a good resource for direction.
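
For orientation, here is a hedged sketch of emitting that trace message from the staging write path; AirbyteTraceMessageUtility.emitConfigErrorTrace exists in the Java connector base of that era, but the call site and surrounding names are assumptions:

```java
import com.amazonaws.services.s3.model.AmazonS3Exception;
import io.airbyte.integrations.base.AirbyteTraceMessageUtility;

// Sketch: wrap the stage-creation step that currently surfaces the raw 403.
try {
  stagingOperations.createStageIfNotExists(database, stageName); // assumed call site
} catch (final AmazonS3Exception e) {
  if (e.getStatusCode() == 403) {
    // Emits an AirbyteErrorTraceMessage with failure_type=config_error, so the
    // platform attributes the failure to user configuration, not the system.
    AirbyteTraceMessageUtility.emitConfigErrorTrace(
        e, "S3 auth error: verify the configured credentials and bucket permissions.");
  }
  throw e;
}
```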

grishick commented:

@jbfbell could you please link the PR to the issue?

jbfbell linked a pull request on Jan 17, 2023 that will close this issue.
jbfbell commented Jan 19, 2023

This is currently blocked by an issue with the GitHub publish action, but it is otherwise ready to be deployed.
