dbt compile runs in 7+ minutes #243
Comments
Hi @dot2dotseurat, thanks for the report. I tried the project locally and it took about 4 minutes, even without any tables created. I'd guess it will be even worse once tables exist. IIUC, this is not an adapter issue but a consequence of how dbt-core collects metadata, which requires running queries for each schema.
Hi @ueshin, thank you so much for the quick reply. As I understand it, … So I think you are right that it's something about querying metadata, but it's odd that … Anyway, I'm happy to open an issue with dbt-core if @jtcohen6 agrees that's the better place for this inquiry. Thanks again for the quick response.
Thanks for tagging me in, @ueshin, and sorry for the delay. @dot2dotseurat, this isn't quite right:
While dbt doesn't require a database connection to parse your project or list its resources, … At the same time, I understand that cache population is a time-consuming step, especially on Spark/Databricks today. The good news is that #231 should (hopefully) speed up cache population significantly; I'm guessing that change will land in the next release of dbt-databricks (v1.4). I also just opened a …
@jtcohen6 Thanks for the reply.
Unfortunately, #231 won't help in this case. When creating the cache, …
This is true: the overall mechanism remains the same. The important difference is speed. In our testing several months ago, in the issues and draft PRs that eventually led to #231, we found that …
Correct, it's the same. On other adapters, the biggest difference is that the metadata queries powering cache population are quite fast. On … Even then, we do run into performance issues at serious scale (e.g. dbt-labs/dbt-snowflake#83, dbt-labs/dbt-bigquery#205). Hence the interest in allowing users to either do partial caching (supported as an experimental config) or skip cache population entirely (the new issue/proposal I linked above).
@dot2dotseurat, is it possible to upgrade dbt to 1.3?
@ueshin I gave it a shot and didn't see an improvement. :/
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue. |
Describe the bug
Running dbt compile takes upwards of 7 minutes, and sometimes more than 10 minutes. The project size, as reported in the dbt logs, is listed below.
Steps To Reproduce
The project is open source here. The issue can be reproduced by cloning the repo, installing the Pipenv environment included in the repo, and running dbt compile.
Expected behavior
I would expect compiling to take close to 30 seconds, based on what I've seen for other dbt projects.
System information
The output of dbt --version:
The operating system you're using:
Mac Big Sur version 11.5.2, M1 chip
The output of python --version: Python 3.9.15