Node selection is quadratic in running time #1611

beckjake · 2019-07-16T18:10:09Z

Issue

The node selection algorithm is pretty obviously quadratic in running time and probably shouldn't be.

Issue description

In particular, we make N^2+N calls to is_selected_node, where N is the number of nodes in the graph. With a smallish number of nodes (~100-200) that's no big deal (~3-5% of runtime), but as the graph grows it takes up a disproportionate share of running time: at 4k nodes it reaches 25% of runtime.

I don't have a big problem with parsing and execution taking a long time for this number of nodes, but node selection? That's just silly!
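To make the call count concrete, here is a minimal sketch of how a selection loop can end up making N^2+N predicate calls, next to a single-pass version that makes N. This is not dbt's actual code: select_quadratic, select_linear and CountingPredicate are hypothetical names, and the quadratic shape is deliberately contrived to reproduce the count described above.

```python
from typing import Callable, List


class CountingPredicate:
    """Wraps a predicate and counts how many times it is called."""

    def __init__(self, func: Callable[[str], bool]) -> None:
        self.func = func
        self.calls = 0

    def __call__(self, node: str) -> bool:
        self.calls += 1
        return self.func(node)


def select_quadratic(nodes: List[str], is_selected_node: Callable[[str], bool]) -> List[str]:
    # Hypothetical slow shape (an assumption, not dbt's implementation).
    # One pass over the graph up front: N calls.
    candidates = {n for n in nodes if is_selected_node(n)}
    result = []
    for node in nodes:
        # Contrived mistake: re-scan the whole graph for every node,
        # N calls per iteration, N * N in total.
        rescanned = {n for n in nodes if is_selected_node(n)}
        if node in candidates and node in rescanned:
            result.append(node)
    return result


def select_linear(nodes: List[str], is_selected_node: Callable[[str], bool]) -> List[str]:
    # Evaluate the predicate exactly once per node: N calls.
    return [n for n in nodes if is_selected_node(n)]


def wanted(name: str) -> bool:
    # Arbitrary example rule: select even-numbered models.
    return int(name.split("_")[1]) % 2 == 0


nodes = [f"model_{i}" for i in range(200)]
slow = CountingPredicate(wanted)
fast = CountingPredicate(wanted)

# Both versions select the same nodes; only the number of calls differs.
assert select_quadratic(nodes, slow) == select_linear(nodes, fast)
print(slow.calls, fast.calls)  # prints "40200 200"
```

At N = 200 the slow version makes 200^2 + 200 = 40200 calls against 200 for the single pass, and the gap widens quadratically from there.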

Results

I expected linear-time node selection. Or better, maybe? I don't want to think too hard about this, I just want to get from "pathologically bad" to "not noticeable".

System information

This is on the current head of dev/louisa-may-alcott (72afd76).

OS/Python version isn't relevant.

Steps to reproduce

1. Make a project with 1k models and 1k schema.yml files, each with 3 tests
2. Run the project: dbt --single-threaded -r output.profile run
3. Go out for a coffee, it'll be a bit!
4. Check out your profile in snakeviz and see how enormous the "select_nodes" box is
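Building the project by hand is tedious, so here is a rough sketch of a generator for the first step. Everything in it is an assumption made for illustration: the generate_models helper, the one-directory-per-model layout, the trivial "select 1 as id" model, and the choice of three tests (unique, not_null, accepted_values). The schema.yml contents follow dbt's version 2 schema layout as I understand it; adjust if your dbt version expects something different.

```python
from pathlib import Path

# Trivial model body (assumption: any valid select will do for this repro).
MODEL_SQL = "select 1 as id\n"

# One schema.yml per model, with three tests on the single column.
SCHEMA_TEMPLATE = """\
version: 2
models:
  - name: {name}
    columns:
      - name: id
        tests:
          - unique
          - not_null
          - accepted_values:
              values: [1]
"""


def generate_models(models_dir: Path, count: int = 1000) -> None:
    """Write `count` models, each in its own directory with its own schema.yml."""
    for i in range(count):
        name = f"model_{i:04d}"
        model_dir = models_dir / name
        model_dir.mkdir(parents=True, exist_ok=True)
        (model_dir / f"{name}.sql").write_text(MODEL_SQL)
        (model_dir / "schema.yml").write_text(SCHEMA_TEMPLATE.format(name=name))
```

Calling generate_models(Path("models")) from the root of an otherwise empty dbt project should produce 1k models and 1k schema.yml files, which is 4k nodes once the tests are counted.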


beckjake mentioned this issue Jul 17, 2019

Make node selection O(n) #1615

Merged

beckjake closed this as completed in #1615 Jul 17, 2019
