Update networkx dependency from 1.11 to 2.x #1496

Closed
heisencoder opened this issue Jun 1, 2019 · 6 comments
Labels
dependencies Changes to the version of dbt dependencies

Comments

@heisencoder
Contributor

Issue

Issue description

I'm running dbt against networkx version 2.1 and am getting errors in linker.py. networkx 1.11 is over three years old now and dbt should start to use a more recent release in the 2.x series. I'm getting an error that I suspect may be due to dbt requiring an old networkx version:

File "dbt/linker.py", line 273, in _updated_graph
graph.add_node(node_id, data)
TypeError: add_node() takes exactly 2 arguments (3 given)

System information

The output of dbt --version:

installed version: 0.13.1
   latest version: 0.13.1

The operating system you're running on: A Debian variant running Linux kernel 4.19.28.

The python version you're using (probably the output of python --version)
Python 2.7.16

Steps to reproduce

When I run dbt compile on a simple graph (2 seed files and an sql file that performs a JOIN of the two seed files), I get this error:

Traceback (most recent call last):
File "<embedded module '_launcher'>", line 149, in run_filename_as_main
File "<embedded module '_launcher'>", line 33, in _run_code_in_main
File "dbt/core/scripts/dbt.py", line 7, in <module>
dbt.main.main(sys.argv[1:])
File "dbt/main.py", line 79, in main
results, succeeded = handle_and_check(args)
File "dbt/main.py", line 153, in handle_and_check
task, res = run_from_args(parsed)
File "dbt/main.py", line 209, in run_from_args
results = run_from_task(task, cfg, parsed)
File "dbt/main.py", line 217, in run_from_task
result = task.run()
File "dbt/task/runnable.py", line 242, in run
self._runtime_initialize()
File "dbt/task/runnable.py", line 51, in _runtime_initialize
self.linker = compile_manifest(self.config, self.manifest)
File "dbt/compilation.py", line 208, in compile_manifest
return compiler.compile(manifest)
File "dbt/compilation.py", line 199, in compile
self.write_graph_file(linker, manifest)
File "dbt/compilation.py", line 165, in write_graph_file
linker.write_graph(graph_path, manifest)
File "dbt/linker.py", line 258, in write_graph
out_graph = _updated_graph(self.graph, manifest)
File "dbt/linker.py", line 273, in _updated_graph
graph.add_node(node_id, data)
TypeError: add_node() takes exactly 2 arguments (3 given)
Running with dbt=0.13.1

@heisencoder
Contributor Author

I believe this is due to the following change in networkx/classes/graph.py:

version 1.10:

408: def add_node(self, n, attr_dict=None, **attr):

version 2.1:

442: def add_node(self, node_for_adding, **attr):

It looks like it's possible to resolve this particular error by making this change to linker.py:

line 273: graph.add_node(node_id, **data)

(i.e. put '**' in front of the data parameter).
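A minimal sketch of why the positional call fails and the `**` form does not. The two functions below are stand-ins that copy only the signatures quoted above; they are not networkx itself, and the `nodes` dict and the sample `data` values are assumptions made for illustration.

```python
# Stand-ins that mimic only the two add_node signatures quoted above.
# They are not networkx; the dict-based node storage is hypothetical.

def add_node_v1(nodes, n, attr_dict=None, **attr):
    """1.x-style: attributes may arrive as a positional dict or as keywords."""
    merged = dict(attr_dict or {})
    merged.update(attr)
    nodes.setdefault(n, {}).update(merged)


def add_node_v2(nodes, node_for_adding, **attr):
    """2.x-style: attributes are accepted as keywords only."""
    nodes.setdefault(node_for_adding, {}).update(attr)


data = {"resource_type": "model", "name": "orders"}  # illustrative values

# Passing the dict positionally works only with the 1.x-style signature.
v1_nodes = {}
add_node_v1(v1_nodes, "model.orders", data)

v2_nodes = {}
try:
    add_node_v2(v2_nodes, "model.orders", data)
    positional_ok = True
except TypeError:
    # Same class of failure as the traceback above: one positional too many.
    positional_ok = False

# Unpacking with ** is accepted by both signatures.
v1_unpacked, v2_unpacked = {}, {}
add_node_v1(v1_unpacked, "model.orders", **data)
add_node_v2(v2_unpacked, "model.orders", **data)
```

Note that the `**data` form assumes every key in `data` is a string and that no key collides with the name of the first parameter (`n` in 1.x, `node_for_adding` in 2.x).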

However, that fix just gets me to the next layer of the onion. Next error looks like this:

File "dbt/core/scripts/dbt.py", line 7, in <module>
dbt.main.main(sys.argv[1:])
File "dbt/main.py", line 79, in main
results, succeeded = handle_and_check(args)
File "dbt/main.py", line 153, in handle_and_check
task, res = run_from_args(parsed)
File "dbt/main.py", line 209, in run_from_args
results = run_from_task(task, cfg, parsed)
File "dbt/main.py", line 217, in run_from_task
result = task.run()
File "dbt/task/runnable.py", line 242, in run
self._runtime_initialize()
File "dbt/task/runnable.py", line 56, in _runtime_initialize
selected_nodes)
File "dbt/linker.py", line 235, in as_graph_queue
return GraphQueue(new_graph, manifest)
File "dbt/linker.py", line 49, in __init__
self._find_new_additions()
File "dbt/linker.py", line 141, in _find_new_additions
for node, in_degree in self.graph.in_degree_iter():
AttributeError: 'DiGraph' object has no attribute 'in_degree_iter'

@heisencoder
Contributor Author

Here's the networkx migration guide:

https://networkx.github.io/documentation/stable/release/migration_guide_from_1.x_to_2.0.html

According to the guide:

"Change any method with _iter in its name to the version without _iter. In v1 this replaces an iterator by a list, but the code will still work. In v2 this creates a view (which acts like an iterator)."

By changing in_degree_iter to in_degree, I was able to get the dbt compile command to work, although I don't know if there are other changes needed.
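One caveat, as far as I understand the two APIs: in 1.x, `in_degree()` with no arguments returns a plain dict, so looping over it directly yields only node names, while in 2.x it returns a view that yields `(node, degree)` pairs. Wrapping the result in `dict()` gives the same shape in both. The sketch below uses a hypothetical helper name (`in_degree_pairs`) and stand-in graph classes rather than networkx itself, so it only illustrates the two return shapes.

```python
def in_degree_pairs(graph):
    """Return (node, in_degree) pairs for a 1.x-style or 2.x-style DiGraph.

    dict() accepts both a plain dict (1.x return shape) and an iterable of
    (node, degree) pairs (2.x view shape), so the caller sees one format.
    """
    return list(dict(graph.in_degree()).items())


# Stand-in graphs (hypothetical, not networkx) mimicking the two return shapes.
class V1StyleGraph:
    def in_degree(self):
        return {"seed_a": 0, "seed_b": 0, "joined": 2}


class V2StyleGraph:
    def in_degree(self):
        return iter([("seed_a", 0), ("seed_b", 0), ("joined", 2)])


# Nodes with no incoming edges, which is what a queue of runnable nodes needs.
ready_v1 = sorted(n for n, deg in in_degree_pairs(V1StyleGraph()) if deg == 0)
ready_v2 = sorted(n for n, deg in in_degree_pairs(V2StyleGraph()) if deg == 0)
```

With a real graph, the loop in `_find_new_additions` could then iterate over `in_degree_pairs(self.graph)` instead of `self.graph.in_degree_iter()`; whether that is the right place for the helper is a judgment call for the maintainers.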

@drewbanin added the dependencies label Jun 1, 2019
@drewbanin added this to the Louisa May Alcott milestone Jun 1, 2019
@drewbanin
Contributor

Thanks for the report @heisencoder - I think you're right, our networkx dep is due for an upgrade.

We've been blessed with stability in our dependencies lately. For a while (maybe a year ago) it seemed like one dep of dbt was broken or in bad shape at any given time. We'd need to do some checking to make sure that, in addition to any code changes, the deps of networkx play nice with the other deps of dbt. It shouldn't be an issue - just wanted to note that here.

Last: You generally shouldn't see errors like this in functioning installations of dbt. As you've noted, dbt pins networkx to v1.11, so my guess is that pip warned you about the incompatible versions when dbt was installed alongside a 2.x version of networkx. If you're still having trouble using dbt, I'd recommend using a virtualenv to separate dbt and its dependencies from the rest of your Python environment.

Thanks!

@heisencoder
Contributor Author

Since this is for our internal production environment, I can't use pip. Instead, we check in the source code of all dependencies. As a result, I'm using different versions from those that setup.py requires, and I will need to apply local patches to make all this work. I'm happy to upstream these patches.

For networkx, there is a way to make the code work with both 1.x and 2.x versions (although it will affect the compiled gpickle file, which can only be read by the matching version of networkx).

@drewbanin
Contributor

Ok, thanks, a PR for this change would be great!

I think it's ok if .gpickle files produced by old versions of dbt don't work with newer versions of dbt. I'm sure a bump to networkx 2.x will be welcomed by anyone currently using the .gpickle file anyway! Long-term, we have plans to remove the .gpickle file, so I'm ok removing backwards compatibility around these files for sure.

@drewbanin
Contributor

Fixed in #1509
