Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node selection is quadratic in running time #1611

Closed
beckjake opened this issue Jul 16, 2019 · 0 comments · Fixed by #1615
Closed

Node selection is quadratic in running time #1611

beckjake opened this issue Jul 16, 2019 · 0 comments · Fixed by #1615

Comments

@beckjake
Copy link
Contributor

Issue

The node selection algorithm is pretty obviously quadratic in running time and probably shouldn't be.

Issue description

In particular, we make N^2+N calls to is_selected_node, where N is the number of nodes in the graph. At smallish numbers of nodes (~100-200) that's no big deal (~3-5% of runtime), but it begins to take up a disproportionate amount of running time - at 4k nodes it goes to 25% of runtime.

I don't have a big problem with parsing and execution taking a long time for this number of nodes, but node selection? That's just silly!

Results

I expected linear-time node selection. Or better, maybe? I don't want to think too hard about this, I just want to get from "pathologically bad" to "not noticeable".

System information

This is on the current head of dev/louisa-may-alcott (72afd76)

OS/python isn't relevant

Steps to reproduce

  1. Make a project with 1k models and 1k schema.yml files, each with 3 tests
  2. Run the project dbt --single-threaded -r output.profile run
  3. Go out for a coffee, it'll be a bit!
  4. Check out your profile in snakeviz and see how enormous the "select_nodes" box is
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant