Solving for large stage depths #370
Comments
This could be changed using https://trino.io/docs/current/admin/properties-query-management.html#query-max-stage-count. What is a sample stage count you are getting?
How do you build something like that, and how do you use it?
We are getting around 250 for stage depth, prior to materialization.
@hovaesco here's an example of a dev tool we've made internally: https://gist.github.com/wseaton/0e0cfecb7421bbba6792222e8fec7cf6. It makes some assumptions about the plan output that might not hold true in practice (the bit about "virtual" stages, for one), but it's been helpful for us in directionally identifying nodes in the dbt graph that contribute to sudden explosions in complexity. If we could get the direct number of stages from the query planner, and/or more insight into how the plan is generated, that would be helpful. I'm also trying to think of ways to show this to users more easily; I'm not sure what makes the most sense. For now, a rendered graph seems to work. We are thinking about embedding this in CI tooling or potentially running it periodically as a diagnostic tool.
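For anyone who wants a rough starting point without opening the gist, here is a minimal sketch of the general idea. It is not the tool linked above. It assumes you have already captured the text output of `EXPLAIN (TYPE DISTRIBUTED)` for a model's compiled SQL, and that each `Fragment N [...]` header in that output corresponds to one stage. The function names and the default limit of 150 are illustrative choices, not anything provided by Trino or dbt.

```python
import re

# Assumption: each "Fragment N [TYPE]" header in the distributed plan text
# marks one stage. This is a heuristic, not an official planner metric.
FRAGMENT_RE = re.compile(r"^\s*Fragment\s+(\d+)\s+\[", re.MULTILINE)


def count_fragments(plan_text: str) -> int:
    """Return the number of distinct fragment ids found in the plan text."""
    return len({int(fragment_id) for fragment_id in FRAGMENT_RE.findall(plan_text)})


def exceeds_stage_limit(plan_text: str, limit: int = 150) -> bool:
    """Flag plans whose fragment count is above the configured stage limit.

    The default of 150 mirrors the limit mentioned in this issue; pass your
    cluster's actual setting if it differs.
    """
    return count_fragments(plan_text) > limit


if __name__ == "__main__":
    # Hypothetical, heavily trimmed plan text for illustration only.
    sample_plan = """
Fragment 0 [SINGLE]
    Output layout: [count]
Fragment 1 [HASH]
    Output layout: [count_0]
Fragment 2 [SOURCE]
    Output layout: [orderkey]
"""
    print(count_fragments(sample_plan))
    print(exceeds_stage_limit(sample_plan))
```

Running something like this per model in CI would only give a directional signal, for the same reasons noted above about assumptions on the plan output.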
Describe the feature
Certain dbt models that combine views which have not themselves been materialized put a lot of strain on the query planner and result in many stages being created. The stage depth limit in Starburst is currently set to 150, which means a developer or user could quickly run into this problem when joining or selecting from a composite data product.
Some type of mechanism to identify nodes that create large explosions in the query plan would be helpful for developers as a heuristic on when to think about materialization.
How this could be implemented:
The reason I am adding this here is that the issue is specific to using the features of dbt with Starburst. I don't know if it can be solved in this package, however.
Describe alternatives you've considered
There is a simple way to estimate model stage complexity, but it is a very bad estimate because it assumes all models are equally dense.
Here is an excerpt from my justfile.
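As a rough illustration of this kind of naive estimate (a hypothetical sketch, not the justfile recipe referred to above), one could count how many unmaterialized ancestors a model pulls in, using the `nodes`, `depends_on.nodes`, and `config.materialized` fields of dbt's `manifest.json`. The function name is made up for this example, and treating every view as equally expensive is exactly the weakness described above.

```python
def count_unmaterialized_ancestors(manifest: dict, node_id: str) -> int:
    """Count distinct upstream models that are views or ephemeral.

    The walk stops at materialized models (tables, incrementals, seeds),
    since the planner reads those as plain tables. Every view counts as 1,
    which assumes all models are equally dense.
    """
    nodes = manifest.get("nodes", {})
    seen = set()
    stack = [node_id]
    while stack:
        current = stack.pop()
        parents = nodes.get(current, {}).get("depends_on", {}).get("nodes", [])
        for parent in parents:
            parent_node = nodes.get(parent)
            if parent_node is None or parent in seen:
                # Sources are not listed under "nodes"; skip them and
                # anything already counted.
                continue
            materialized = parent_node.get("config", {}).get("materialized")
            if materialized in ("view", "ephemeral"):
                seen.add(parent)
                stack.append(parent)
    return len(seen)


if __name__ == "__main__":
    # Hypothetical miniature manifest for illustration only.
    manifest = {
        "nodes": {
            "model.demo.a": {"config": {"materialized": "view"},
                             "depends_on": {"nodes": ["model.demo.b", "model.demo.c"]}},
            "model.demo.b": {"config": {"materialized": "view"},
                             "depends_on": {"nodes": ["model.demo.d", "source.demo.raw"]}},
            "model.demo.c": {"config": {"materialized": "table"},
                             "depends_on": {"nodes": ["model.demo.e"]}},
            "model.demo.d": {"config": {"materialized": "view"},
                             "depends_on": {"nodes": []}},
            "model.demo.e": {"config": {"materialized": "view"},
                             "depends_on": {"nodes": []}},
        }
    }
    print(count_unmaterialized_ancestors(manifest, "model.demo.a"))
```

Models with a high count are candidates to inspect first, but the count says nothing about how many stages each view actually contributes.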
Who will benefit?
Anyone trying to leverage the modular nature of dbt with Starburst views who is trying to figure out the best views to materialize.
Are you willing to submit PR?