Allowing for steps to retry #1630

Closed
zahink opened this issue Jul 24, 2019 · 6 comments
Labels
bigquery enhancement New feature or request

Comments

@zahink

zahink commented Jul 24, 2019


Issue

BigQuery can hit transient errors, so it's important to be able to configure a number of retries for a step in a DAG.

Issue description

During a production run, we hit a transient BigQuery error that failed that particular query, along with every downstream query that depended on its output.

Results

What I expected was that dbt would have some configuration that would allow us to set a number of allowable retries.

System information

This was a run on dbt Cloud.

Steps to reproduce

The error was transient and on BigQuery's side, so it is hard to reproduce.

Feature

Feature description

Allow configuring retry logic for an individual step or for the whole DAG.
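A minimal sketch of how a DAG-wide default with per-step overrides could behave, written in Python. `RetryConfig`, `run_step`, and `TransientError` are hypothetical names for illustration only; they are not part of dbt's API, and the sketch assumes the adapter can tell transient errors apart from real failures.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for warehouse errors that are safe to retry."""


@dataclass
class RetryConfig:
    # Hypothetical settings: a DAG-wide default plus per-step overrides.
    default_retries: int = 0
    per_step: Dict[str, int] = field(default_factory=dict)

    def retries_for(self, step: str) -> int:
        return self.per_step.get(step, self.default_retries)


def run_step(name: str, fn: Callable[[], T], config: RetryConfig) -> T:
    """Run one step, retrying it on TransientError up to its configured limit."""
    retries = config.retries_for(name)
    for attempt in range(retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == retries:
                raise  # retries exhausted: surface the last error
        # Any other exception type propagates immediately without a retry.
    raise ValueError(f"negative retry count for step {name!r}")
```

With `RetryConfig(default_retries=1, per_step={"my_model": 3})`, a step named `my_model` gets up to three extra attempts while every other step gets one; errors that are not marked transient still fail the step on the first attempt.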

Who will this benefit?

Anyone who relies on dbt in production and can't have transient errors killing the whole DAG.

@drewbanin drewbanin added bigquery enhancement New feature or request labels Jul 24, 2019
@advincze

advincze commented Oct 9, 2019

We have similar issues with Redshift (especially Redshift Spectrum), where retries would be very beneficial.

@advincze

advincze commented Oct 9, 2019

It seems #1579 is also related.

@hui-zheng

hui-zheng commented Jan 3, 2020

We also ran into similar challenges in our BigQuery dbt run.

For example, in production we sometimes have a historical backfill and a scheduled incremental run executing at the same time, both updating the same table.

We then got errors like the following, which could be mitigated with a retry:

  domain: "cloud.helix.ErrorDomain"
  code: "QUERY_ERROR"
  argument: "Could not serialize access to table projectA:dataset1.table_A1 due to concurrent update"
  debug_info: "[CONCURRENT_UPDATE] Table modified by concurrent UPDATE/DELETE/MERGE DML or truncation at 1578010970837. Storage set job_uuid: 3ca02cc1-8d32-4c3c-afdc-76f429c1add1_00008, instance_id: InsertedData, Reason: code=CONCURRENT_UPDATE message=Could not serialize access to table projectA:dataset1.table_A1 due to concurrent update debug=Table modified by concurrent UPDATE/DELETE/MERGE DML or truncation at 1578010970837. Storage set job_uuid: 3ca02cc1-8d32-4c3c-afdc-76f429c1add1_00008, instance_id: InsertedData
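For conflicts like this, an immediate retry may collide with the other writer again, so a common approach is capped exponential backoff with jitter. A minimal Python sketch follows. It assumes errors can be classified by the `CONCURRENT_UPDATE` text seen above (a real client library may expose a structured reason code instead), and `retry_with_backoff` and `is_concurrent_update` are hypothetical helpers, not dbt or BigQuery functions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_concurrent_update(exc: Exception) -> bool:
    # Assumption: classify by the message text from the error above.
    msg = str(exc)
    return "CONCURRENT_UPDATE" in msg or "due to concurrent update" in msg


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry concurrent-update failures with capped exponential backoff and full jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt, or at once for errors that are not conflicts.
            if attempt == max_attempts or not is_concurrent_update(exc):
                raise
            # Sleep a random time in [0, min(max_delay, base_delay * 2^(attempt - 1))].
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise ValueError("max_attempts must be at least 1")
```

The random jitter spreads out the retries of two jobs that conflicted at the same moment (here, the backfill and the incremental run), so they are less likely to collide again on the next attempt.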

@drewbanin
Contributor

FYI #1963 adds retries to BigQuery when queries fail with a 500 status code (internal server error).

I'm going to close out this issue, as BigQuery is really the only place where 1) we see transient errors like this and 2) we receive a status code indicating that retrying can solve the problem. Happy to re-open if anyone has any further thoughts on this topic.
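The point about status codes can be illustrated with a small Python sketch that retries only when the failure carries a status code known to be transient, defaulting to the 500 case mentioned above. `WarehouseHTTPError` and `call_with_status_retries` are hypothetical names; this is not the implementation in #1963.

```python
from typing import Callable, FrozenSet, TypeVar

T = TypeVar("T")


class WarehouseHTTPError(Exception):
    """Hypothetical error type carrying the HTTP status code of a failed API call."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def call_with_status_retries(
    fn: Callable[[], T],
    retries: int = 3,
    retryable: FrozenSet[int] = frozenset({500}),
) -> T:
    """Retry fn only when it fails with a status code listed in `retryable`."""
    attempt = 0
    while True:
        try:
            return fn()
        except WarehouseHTTPError as exc:
            # Any status not in `retryable` (e.g. a 400 for invalid SQL) is raised
            # immediately: retrying an invalid request cannot make it succeed.
            if exc.status not in retryable or attempt >= retries:
                raise
            attempt += 1
```

Without such a signal, a retry wrapper has to guess which failures are transient, which is why warehouses that do not return a clear status code are harder to support.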

@friendofasquid

In Spark, Presto and Athena, retries would be super helpful.

Particularly on Presto running on EMR with Spot Instances: when instances are reclaimed and queries fail, a retry would be extremely helpful. I believe newer versions of Presto support this internally, but the newest version available on EMR does not.

In some cases, we are forced to rerun the entire dbt job. A state:failed+ selector would be another interesting way to handle this.
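The idea behind a `failed+` selection is a graph traversal: start from the nodes that failed and collect everything downstream of them, since those nodes were skipped or built on stale inputs. A minimal Python sketch, assuming the DAG is given as a hypothetical parent-to-children mapping; it illustrates the concept only and is not how dbt implements node selection.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def failed_plus_descendants(children: Dict[str, List[str]], failed: Iterable[str]) -> Set[str]:
    """Return the failed nodes plus every node downstream of them (breadth-first)."""
    selected: Set[str] = set()
    queue = deque(failed)
    while queue:
        node = queue.popleft()
        if node in selected:
            continue  # already visited via another path
        selected.add(node)
        queue.extend(children.get(node, []))
    return selected
```

For a hypothetical DAG where `int_a` feeds `fct_a` and `fct_b`, a failure in `int_a` selects `{"int_a", "fct_a", "fct_b"}` for the rerun, leaving successful upstream models such as `stg_a` untouched.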

@jtcohen6
Contributor

@friendofasquid That's helpful context re: Presto. I totally agree about invocation-level retries, too. There's a more recent issue discussing this in #3303.


6 participants