Dbt downloads directory same across different users #895

Closed
brian-bk opened this issue Aug 1, 2018 · 7 comments

brian-bk commented Aug 1, 2018

Issue

The dbt downloads temporary directory cannot be shared between users.

Issue description

The dbt deps command creates a folder /tmp/dbt-downloads if it needs to download anything. See deps.py.

Results

If more than one user may run dbt, the folder persists and is owned by whoever created it first. Later, if another user runs dbt and dbt needs to download something, a permissions error occurs and the second user cannot use dbt this way.

System information

The output of dbt --version:

installed version: 0.10.1
   latest version: 0.10.1
Up to date!

Steps to reproduce

If you have a packages.yml with:

packages:
  - git: "https://github.com/fishtown-analytics/dbt-utils"
    revision: master

Run dbt deps as one user and then as a different user.

I recommend using something along the lines of tempfile.mkdtemp instead, but of course there are many ways this could be done.
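A minimal Python sketch of the mkdtemp suggestion, assuming nothing about dbt's internals: the function name `make_private_downloads_dir` and the `dbt-downloads-` prefix are illustrative choices, not dbt code. `tempfile.mkdtemp` creates a new directory with a random name, accessible only by the creating user, and leaves deletion to the caller.

```python
import os
import stat
import tempfile

# The pattern reported in this issue: one fixed path shared by every user.
# Whoever creates it first owns it, so later users may be unable to write to it.
# (Shown for contrast only; this sketch never creates it.)
SHARED_DOWNLOADS_DIR = "/tmp/dbt-downloads"


def make_private_downloads_dir() -> str:
    """Create a fresh, uniquely named directory for a single run.

    mkdtemp creates the directory with a random suffix and permissions
    restricted to the creating user (0o700 on POSIX), so different users
    never share it. The caller must delete it when done.
    """
    return tempfile.mkdtemp(prefix="dbt-downloads-")


if __name__ == "__main__":
    path = make_private_downloads_dir()
    mode = stat.S_IMODE(os.stat(path).st_mode)
    print(path, oct(mode))  # on POSIX the mode is typically 0o700
    os.rmdir(path)  # mkdtemp never removes the directory on its own
```

Because every call returns a new path, a second user running the same command gets their own directory instead of tripping over the first user's.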

@drewbanin (Contributor)

Thanks for the report @brian-bk!

I think we're going to make it possible to override the temp dir in the deps command. Your report is news to me, but we had a different issue for Windows users that you can check out over here.

I think that rather than trying to get really clever with temp dir creation, dbt should just let the user decide where the temporary dir should be, either with a profile config or a command line argument.
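One way a user-chosen location could look, sketched with `argparse`. The `--downloads-dir` flag, the `profile_value` parameter, and both function names are hypothetical and are not real dbt options; falling back to `tempfile.mkdtemp` when nothing is configured is an assumption borrowed from the mkdtemp suggestion in the original report.

```python
import argparse
import os
import tempfile
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI: "--downloads-dir" is illustrative, not a real dbt flag.
    parser = argparse.ArgumentParser(prog="deps-sketch")
    parser.add_argument(
        "--downloads-dir",
        default=None,
        help="where to put temporary package downloads",
    )
    return parser


def resolve_downloads_dir(
    cli_value: Optional[str], profile_value: Optional[str] = None
) -> str:
    """Return the downloads directory, preferring the CLI flag, then a
    (hypothetical) profile setting, then a fresh private temp directory."""
    chosen = cli_value or profile_value
    if chosen:
        # The user picked the location, so just make sure it exists.
        os.makedirs(chosen, exist_ok=True)
        return chosen
    return tempfile.mkdtemp(prefix="dbt-downloads-")
```

For example, `resolve_downloads_dir(build_parser().parse_args(["--downloads-dir", "/some/dir"]).downloads_dir)` returns `/some/dir`. The precedence here (command line over profile over fallback) is one reasonable choice; when users pick the directory themselves, keeping it private becomes their responsibility.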

I'm super happy to discuss, and keen to hear if you think this is an appropriate fix for your issue.

brian-bk (Author) commented Aug 1, 2018

At least for a git package, the project is cloned to /tmp/dbt-downloads/{git-hash} but is then immediately moved to the project's modules directory. After that, /tmp/dbt-downloads/ is left behind empty, with no purpose (and, on Linux, owned by whoever first ran dbt deps).

Unless there's a plan to use the temp directory as a cache, it makes the most sense to me to create a temp directory that is removed after use. That's why I recommend mkdtemp: it creates something like /tmp/dbt_downloads_{random_hash} rather than reusing the same location, and that directory can be removed when dbt deps is done.
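The clone-then-move flow described here, with the temp directory removed after use, can be sketched with `tempfile.TemporaryDirectory`. To keep the sketch runnable offline, `fake_clone` is a hypothetical stand-in for `git clone`, and `install_package` and the modules-directory argument are illustrative, not dbt's actual code.

```python
import os
import shutil
import tempfile
from typing import Tuple


def fake_clone(dest: str) -> None:
    """Hypothetical stand-in for `git clone`: creates a tiny package so the
    sketch runs without git or network access."""
    os.makedirs(dest)
    with open(os.path.join(dest, "dbt_project.yml"), "w") as fh:
        fh.write("name: example_package\n")


def install_package(name: str, modules_dir: str) -> Tuple[str, str]:
    """Clone into a per-run temp dir, move the checkout into modules_dir,
    and let the context manager delete the temp dir on exit.

    Returns (installed_path, temp_dir_that_was_used).
    """
    os.makedirs(modules_dir, exist_ok=True)
    # Assumes `target` does not exist yet; otherwise shutil.move would
    # nest the checkout inside it.
    target = os.path.join(modules_dir, name)
    with tempfile.TemporaryDirectory(prefix="dbt-downloads-") as tmp:
        checkout = os.path.join(tmp, "checkout")
        fake_clone(checkout)
        shutil.move(checkout, target)
    # Leaving the `with` block deleted `tmp`, so nothing is left behind.
    return target, tmp
```

`shutil.move` renames when source and destination are on the same filesystem and falls back to copying and then deleting otherwise, so the temp directory does not have to live on the same drive as the project.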

atharvai commented Oct 19, 2018

On Windows, in Git Bash, /tmp refers to the user's TMP directory, which is unique per user. However, there are other issues on Windows: if the TMP directory is on a different drive from the OS, there are failures. Here are some scenarios for investigation: https://gist.github.com/atharvai/822ef904c29bbd304bac07bb37233dab
Related issue: #778

jthandy added this to the Stephen Girard milestone Oct 25, 2018
paulgraff commented Oct 30, 2018

Wanted to share my own experience with this issue as well: my team runs dbt from Jenkins and typically has multiple jobs running dbt deps at the same time on a single worker. We switched from local dependencies to git dependencies yesterday and saw a handful of build failures overnight, with errors like:

2018-10-30T09:26:35.013+0000 + dbt deps --profiles-dir ../../.dbt --profile fulla
2018-10-30T09:26:37.820+0000 Installing git@github.com:fishtown-analytics/dbt-utils.git@0.1.17
2018-10-30T09:26:38.121+0000 Encountered an error:
2018-10-30T09:26:38.121+0000 Error checking out spec='0.1.17' for repo git@github.com:fishtown-analytics/dbt-utils.git
2018-10-30T09:26:38.121+0000 fatal: ambiguous argument 'tags/0.1.17': unknown revision or path not in the working tree.
2018-10-30T09:26:38.122+0000 Use '--' to separate paths from revisions, like this:
2018-10-30T09:26:38.122+0000 'git <command> [<revision>...] -- [<file>...]'

and

2018-10-30T01:16:10.373+0000 + dbt deps --profiles-dir ../../.dbt --profile fulla
2018-10-30T01:16:13.081+0000 Installing git@github.com:fishtown-analytics/dbt-utils.git@0.1.17
2018-10-30T01:16:13.182+0000 Encountered an error:
2018-10-30T01:16:13.182+0000 [Errno 2] No such file or directory: '/tmp/dbt-downloads/0a6a5f508a311be0d4ba05de66b74124'

Both of these are (I think) related to the same underlying issue: git running into permissions/concurrency problems.
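The concurrency side can be illustrated with a sketch in which threads stand in for parallel Jenkins jobs (an assumption; real jobs are separate processes, and `tempfile.mkdtemp` gives the same no-collision guarantee across processes because it creates the directory atomically and retries on a name clash). Every job that computes a fixed path lands on the same directory, while `mkdtemp` hands each job its own. All function names are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def fixed_path_for_job(job_id: int) -> str:
    # job_id is ignored, which is exactly the problem: every job gets the
    # same shared location.
    return os.path.join(tempfile.gettempdir(), "dbt-downloads")


def private_path_for_job(job_id: int) -> str:
    # Each job gets its own newly created, uniquely named directory.
    return tempfile.mkdtemp(prefix=f"dbt-downloads-job{job_id}-")


def run_jobs(path_fn: Callable[[int], str], n_jobs: int = 8) -> List[str]:
    # Run n_jobs "jobs" at once and collect the directory each one would use.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(path_fn, range(n_jobs)))


if __name__ == "__main__":
    shared = run_jobs(fixed_path_for_job)
    private = run_jobs(private_path_for_job)
    print("distinct fixed paths:", len(set(shared)))  # 1
    print("distinct private paths:", len(set(private)))  # 8
    for path in private:
        os.rmdir(path)
```

With a shared path, one job can delete or overwrite a checkout another job is still using, which would be consistent with the "No such file or directory" error in the log above.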

@drewbanin (Contributor)

Thanks for the info @paulgraff! This is in our Stephen Girard milestone, which is in part focused on cleaning up the rough edges around dbt deps. Might ping you back here to help test once we make some progress on this front :)

paulgraff commented Oct 30, 2018

Awesome, thanks @drewbanin! Would be happy to help test this out, so definitely ping me 😄 Also, let me know if there are any other logs I can pull or tests I can run on my end to help with the investigation.

@beckjake (Contributor)

As of #1110, dbt uses either the user's TEMP directory (via Python's tempfile.mkdtemp) or the user's DBT_DOWNLOADS_DIR environment variable. That should resolve these issues.
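A minimal sketch of the lookup described in this comment, assuming the simplest precedence (the environment variable wins if set, otherwise `tempfile.mkdtemp`). It is not the code from #1110, and `get_downloads_dir` is a hypothetical name. `tempfile.mkdtemp` places the directory under `tempfile.gettempdir()`, which checks the `TMPDIR`, `TEMP`, and `TMP` environment variables before falling back to platform defaults, so it naturally resolves to a per-user location on systems where those are set per user.

```python
import os
import tempfile


def get_downloads_dir() -> str:
    """Return the directory to download packages into.

    Assumed precedence: DBT_DOWNLOADS_DIR if set to a non-empty value
    (the user then owns that choice); otherwise a private, uniquely named
    directory under the user's temp location via tempfile.mkdtemp.
    """
    override = os.environ.get("DBT_DOWNLOADS_DIR")
    if override:  # an empty string is treated the same as "not set"
        os.makedirs(override, exist_ok=True)
        return override
    return tempfile.mkdtemp(prefix="dbt-downloads-")
```

For a CI worker like the Jenkins setup above, setting `DBT_DOWNLOADS_DIR` to a per-job workspace path would be one way to keep concurrent jobs apart, under the precedence assumed here.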
