Implement the relation cache for dbt compile and dbt ls #1705
Comments
Kicking this out of the LMA milestone. The better version of this will:
|
Are contributions welcome on this? We're experiencing punishing compile times on Snowflake that would be alleviated by fixing this. |
hey @mcannamela - sure, we'd love a PR for this one! I think the big gain here is going to be lazily populating the relation cache. We want to make these introspective queries run when they're needed, not at startup. Right now, dbt is going to scan every schema referenced in your project, even if you're just doing something like:
In this case, we'd want to:
The locking piece comes in if you have more than one model running concurrently that's hitting the same information schema:
If both of these models are materialized in the same information schema namespace[1], dbt should only run the information schema query once, then return the results to both models.

@beckjake there was some good reason why we didn't implement the original feature request (populating the relation cache) -- do you remember why that was?

[1] This varies by warehouse. On postgres/redshift/snowflake, the information schema is accessed at the database level. On BigQuery, the information schema is accessed at the dataset level. |
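To make the "query once, share the result" idea concrete, here is a minimal sketch of a lazily populated cache with one lock per information schema namespace. It is not dbt's actual cache implementation: `LazyRelationCache`, `relations_in`, and `fake_information_schema_query` are hypothetical names made up for illustration, and the namespace is just a string key (a database on postgres/redshift/snowflake, a dataset on BigQuery).

```python
import threading
from collections import defaultdict
from typing import Callable, Dict, List


class LazyRelationCache:
    """Hypothetical sketch of a lazy, per-namespace relation cache."""

    def __init__(self, fetch: Callable[[str], List[str]]) -> None:
        # fetch stands in for the introspective information schema query.
        self._fetch = fetch
        self._relations: Dict[str, List[str]] = {}
        # _guard protects creation of the per-namespace locks.
        self._guard = threading.Lock()
        self._locks: Dict[str, threading.Lock] = defaultdict(threading.Lock)

    def relations_in(self, namespace: str) -> List[str]:
        with self._guard:
            lock = self._locks[namespace]
        # Only one thread per namespace runs the query; the others wait
        # on the lock and then read the cached result.
        with lock:
            if namespace not in self._relations:
                self._relations[namespace] = self._fetch(namespace)
            return self._relations[namespace]


# Usage: two "models" hit the same namespace concurrently.
calls: List[str] = []


def fake_information_schema_query(namespace: str) -> List[str]:
    # Records each call so we can see how often the query actually ran.
    calls.append(namespace)
    return [f"{namespace}.model_a", f"{namespace}.model_b"]


cache = LazyRelationCache(fake_information_schema_query)
threads = [
    threading.Thread(target=cache.relations_in, args=("analytics",))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Nothing is queried at startup in this sketch; a namespace is only scanned the first time something asks for it, and a namespace that is never referenced is never scanned.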
I think it's because populating the cache is expensive, and many |
Could you clarify which compile invocations would not require it? From my reading, it seems like it's going to be hit any time I came to this issue via #1737 and that seemed like a win to me, since you could just pass the Is there some reason why populating the cache up front for all schemas is beneficial for the Not trying to be contrarian here, I just want to snuff out my own ignorance before I or my team undertake this. |
The trick is that not everything has to call
I think ideally they'd have the same lazy behavior. |
Got it, thanks. |
Adding this to the Octavius Catto milestone (for now, might not make it into that release). Folks can disable this if they're experiencing negative performance characteristics with |
…elation-caches dbt compile and dbt ls relation caches (#1705)
Describe the feature
In the dbt run command, dbt builds a relation cache. dbt should also build this cache when the dbt compile and dbt ls commands are invoked.

Bonus: It would be even better if dbt lazily populated this cache. TBD if that takes place in this issue, or if we should make a separate issue for that.
Edited: Removed references to the rpc server