dbt seed should retry on bigquery when it tells us to #1579

Closed
beckjake opened this issue Jun 27, 2019 · 3 comments · Fixed by #2694

@beckjake
Contributor

Issue

Sometimes dbt seed runs fail on BigQuery with an error telling us to retry, but dbt doesn't retry.

Issue description

This is especially annoying in integration tests. Here's an example:
https://dev.azure.com/fishtown-analytics/dbt/_build/results?buildId=302

The failure text is:

Runtime Error in seed source (data\source.csv)
  Error encountered during execution. Retrying may solve the problem.
  You are loading data without specifying data format, data will be treated as CSV format by default. If this is not what you mean, please specify data format by --source_format.

And, in fact, retrying does solve the problem. This is some weird transient failure inside BigQuery. I'm pretty much positive that the second message is a consequence of the first (BQ gets an empty CSV due to the previous error?) and should be ignored. dbt should catch this error and retry the seed.

I'm sure this also happens in the "real world" once in a while.

Results

dbt seed failed with an error. I expected it to succeed.

System information

dbt version 0.14.0-ish

The operating system you're running on: Any OS

The python version you're using (probably the output of python --version): Any python

Steps to reproduce

Run dbt seed a few thousand times because you run a lot of tests.
Experience a couple of failures, and feel frustrated about having to re-run the entire suite each time.

@kconvey
Contributor

kconvey commented Nov 15, 2019

This is something I have a need for, and plan on implementing in a fork, but would love to contribute back to dbt.

My rough plan is to wrap calls to the client (I use BigQuery) in a helper function that retries in accordance with the timeout and a new configurable 'retries' parameter set in profiles.yml. It would be nice to delegate the retry logic to a library that already implements niceties like polite exponential backoff (the BigQuery adapter already depends on google.api_core, which has a 'retry' module that does exactly this).
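
A minimal sketch of the helper described above, using only the standard library so it stays self-contained. The names here (`call_with_retries`, `TransientError`, and the `retries`, `timeout`, `initial_delay`, and `multiplier` parameters) are hypothetical and are not dbt's or google.api_core's API; a real implementation would likely delegate the backoff to an existing retry library instead.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for an error the client marks as retryable."""


def call_with_retries(fn, retries=3, timeout=300.0, initial_delay=1.0,
                      multiplier=2.0, sleep=time.sleep, clock=time.monotonic):
    """Call fn(), retrying on TransientError with exponential backoff.

    Gives up after `retries` extra attempts, or once the next wait would
    run past `timeout` seconds overall, whichever comes first.
    """
    deadline = clock() + timeout
    delay = initial_delay
    attempt = 0
    while True:
        try:
            return fn()
        except TransientError:
            attempt += 1
            if attempt > retries or clock() + delay > deadline:
                raise  # out of attempts or out of time: surface the error
            sleep(delay)
            delay *= multiplier  # wait longer before each new attempt
```

The `sleep` and `clock` arguments are injectable only so the helper can be exercised without real waiting; in normal use the defaults would be left alone.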

Before I get too far on something that doesn't align well with what dbt wants, it would be helpful to know if it makes sense to implement this per adapter. The adapter feels to me like the most logical place to implement this, as the first point where exceptions from the client can be caught and retried, but this is a feature that all adapters could benefit from. A compelling reason to implement this per adapter might be that exceptions will vary per adapter, so logic that determines whether an exception is retryable will depend on the adapter.
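
To make the per-adapter question concrete, here is a rough sketch in which the retry loop is shared and only the "is this exception retryable?" decision is left to each adapter. The classes `BaseAdapter` and `BigQueryLikeAdapter` and the methods `is_retryable` and `execute_with_retry` are hypothetical illustrations, not dbt's actual adapter interface, and delays between attempts are left out for brevity.

```python
class BaseAdapter:
    """Hypothetical adapter base class; not dbt's real adapter API."""

    def is_retryable(self, exc):
        # Default: nothing is retried unless an adapter opts in.
        return False

    def execute_with_retry(self, fn, retries=1):
        # Shared loop: try once, then up to `retries` more times.
        for attempt in range(retries + 1):
            try:
                return fn()
            except Exception as exc:
                if attempt == retries or not self.is_retryable(exc):
                    raise


class BigQueryLikeAdapter(BaseAdapter):
    # Message text taken from the failure quoted in this issue.
    RETRY_HINT = "Retrying may solve the problem"

    def is_retryable(self, exc):
        return self.RETRY_HINT in str(exc)
```

With this shape, an adapter that never overrides `is_retryable` keeps today's behavior of failing on the first error.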

Curious to hear any thoughts on this approach!

@drewbanin
Contributor

Hey @kconvey - I think this feature definitely has merit on BigQuery. I don't really perceive a need/desire for it on other databases though. BQ returns random 500 errors with some regularity and retrying really does make the query succeed. That just... doesn't usually happen on any of Snowflake/Redshift/Postgres.

My thinking is that when BigQuery returns a 500 error code (or an error message that says "Retrying may solve the problem"), dbt can retry the query. I think there could be merit to retrying with some sort of backoff, but really, I'd be equivalently comfortable retrying a single time after something like 10 seconds. I'm happy for the number of retries and the timeout interval to be configurable though.
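
As a sketch of that rule: retry when the error carries a 500 code or the "Retrying may solve the problem" text, wait a fixed interval, and keep both the retry count and the interval configurable. `QueryError`, `should_retry`, and `run_query` are hypothetical names for illustration, and the `code` attribute is an assumption about how the status would be exposed, not the BigQuery client's actual exception type.

```python
import time

RETRY_HINT = "Retrying may solve the problem"


class QueryError(Exception):
    """Hypothetical error type carrying an HTTP-style status code."""

    def __init__(self, message, code=None):
        super().__init__(message)
        self.code = code


def should_retry(exc):
    # Retry on a 500 status code or on BigQuery's own retry hint.
    return getattr(exc, "code", None) == 500 or RETRY_HINT in str(exc)


def run_query(fn, retries=1, interval=10.0, sleep=time.sleep):
    """Run fn(), retrying up to `retries` times, `interval` seconds apart."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == retries or not should_retry(exc):
                raise
            sleep(interval)  # fixed wait, no backoff
```

The defaults match the simplest option mentioned above: a single retry after about 10 seconds.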

Overall, I very much agree with your thinking here! I'd just say that we can make this BQ specific initially, but we should implement it in a way that could be extended to other plugins in the future.

@drewbanin
Contributor

Closed by #1963 - this is going out in dbt v0.15.1
