Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(#2195) Added timeout to registry download call #2228

Merged
merged 4 commits into from
Mar 24, 2020

Conversation

Raalsky
Copy link
Contributor

@Raalsky Raalsky commented Mar 23, 2020

resolves #2195 (potentially)

Description

Added a 10 seconds timeout to every download call during dbt deps. I think the timeout is only for waiting until the response will be started to emmiting (the download process may be longer than timeout). See requests - Timeouts for more info.

You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter. Nearly all production code should use this parameter in nearly all requests. Failure to do so can cause your program to hang indefinitely:

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Mar 23, 2020
@Raalsky
Copy link
Contributor Author

Raalsky commented Mar 23, 2020

I have run this code in development and it appears to resolve the stated issue

The testing part looks awful I think (mock the requests internals or create /etc/hosts with desired low connection service). And as it looks to be a general good practice to add a timeout to requests and we actually don't know the 1:1 reason why the connection hangs maybe we don't need to add unit/integration test to these parts? 🤞

@drewbanin drewbanin requested a review from beckjake March 23, 2020 13:39
Copy link
Contributor

@beckjake beckjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! I think there's not much value in testing this relative to the difficulty of making a test where the network hangs forever.

In the future, we may want to make this timeout configurable, even if it's just via an environment variable. While 10s is a totally reasonable maximum value for most places, it might not be great if you are both far away and have a bad connection. We'll see how this shakes out for now.

I'll kick off the remaining tests, this is approved assuming they pass.

@Raalsky
Copy link
Contributor Author

Raalsky commented Mar 23, 2020

@beckjake good point! I was thinking about using a global config or something which should be used here but (maybe I'm wrong) there is nothing available in registry.py throw self or something and I need to add a lot of refactors. Do we have some policy here (for ex. static CONNECTION_TIMEOUT at beginning of file etc.)?

@beckjake
Copy link
Contributor

Do we have some policy here (for ex. static CONNECTION_TIMEOUT at beginning of file etc.)?

Global constants are always nice, though I felt this was unambiguous enough to just leave it. I agree it'd be quite tough to get it through from a config file given the current state of registry.py, and it's why I thought of an environment variable.

@drewbanin
Copy link
Contributor

I'd be comfortable with adding a DBT_HTTP_TIMEOUT env var which we use whenever we download data with requests. I just want to make sure it's clear that this controls http web requests, not database connections, as those timeouts are going to be handled separately.

For the purposes of this PR, it would be great if we could apply this env var override in the scope of downloading data from the package hub

@Raalsky
Copy link
Contributor Author

Raalsky commented Mar 23, 2020

In the future, we may want to make this timeout configurable, even if it's just via an environment variable. While 10s is a totally reasonable maximum value for most places, it might not be great if you are both far away and have a bad connection. We'll see how this shakes out for now.

I'd be comfortable with adding a DBT_HTTP_TIMEOUT env var which we use whenever we download data with requests. I just want to make sure it's clear that this controls http web requests, not database connections, as those timeouts are going to be handled separately.

Done

@Raalsky
Copy link
Contributor Author

Raalsky commented Mar 23, 2020

I'll move the environ variable usage inside system.py insted of registry.py.

@beckjake beckjake merged commit 367bf68 into dbt-labs:dev/octavius-catto Mar 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add timeout + error handling to dbt deps requests to hub site
3 participants