OOM error raised after subsequent calls to plc.bfs with different SGGraphs
#4067
Did some analysis here. Here's what I found:
Can we define the desired behavior in this situation? If the user creates a graph (whether empty or not) and executes BFS with a starting vertex that is not actually a vertex in the graph, what is the desired behavior? Return an error indicating bad input parameters? Silently fail, returning the result of running BFS from no vertex?
Thanks, Chuck. Here's what I noticed now: 1 & 2 make sense, but when I run the script with the entire first call and setup code commented out (everything from Given that, it seems like there could be a few issues:
>>> list(nx.bfs_edges(G,333))
Traceback (most recent call last):
  File "/Projects/networkx/networkx/classes/graph.py", line 1354, in neighbors
    return iter(self._adj[n])
KeyError: 333
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Projects/networkx/networkx/algorithms/traversal/breadth_first_search.py", line 218, in bfs_edges
    yield from generic_bfs_edges(G, source, successors, depth_limit)
  File "/Projects/networkx/networkx/algorithms/traversal/breadth_first_search.py", line 118, in generic_bfs_edges
    next_parents_children = [(source, neighbors(source))]
  File "/Projects/networkx/networkx/classes/graph.py", line 1356, in neighbors
    raise NetworkXError(f"The node {n} is not in the graph.") from err
networkx.exception.NetworkXError: The node 333 is not in the graph.
so raising/returning an error seems reasonable to me.
This particular OOB read is because of the invalid source vertex (which we never check). I can't think of other bad input cases that we're not checking... although I am admittedly bad at finding these a priori. I think adding this check should be sufficient. Correct input data should not result in an OOB error like this.
OOB errors are very non-deterministic in their behavior. It's one of the things that makes them a challenge to isolate and debug. For this particular code, which is fairly straightforward, the behavior is probably a function of which GPU model, device driver, and CUDA runtime version you're running. Fundamentally, in my run when I access the OOB element I'm reading a
I will add the fixes to the C API for BFS to detect this error.
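For illustration only, the kind of check being discussed can be sketched in plain Python. This is not the libcugraph or C API code; `bfs_distances` and the dict-of-lists `adjacency` format are hypothetical names chosen for the sketch. The idea is to reject a source that is not a vertex of the graph before any traversal state is indexed with it, mirroring the NetworkX behavior shown in the traceback above.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Return hop counts from `source` over a dict-of-lists adjacency.

    Hypothetical helper: validates the source up front instead of letting
    an unknown vertex reach the traversal loop.
    """
    if source not in adjacency:
        # Fail loudly on bad input rather than reading state that does not exist.
        raise ValueError(f"source vertex {source} is not in the graph")

    distances = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency[vertex]:
            if neighbor not in distances:
                distances[neighbor] = distances[vertex] + 1
                queue.append(neighbor)
    return distances
```

With a small path graph such as `{0: [1], 1: [0, 2], 2: [1]}`, a call with source `0` traverses normally, while a call with source `333` raises `ValueError` immediately.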
…valid vertex (#4077)
* Added error check to be sure that the source vertex is a valid vertex in the graph.
* Updated `nx_cugraph.Graph` class to create PLC graphs using `vertices_array` in order to include isolated vertices. This is now needed since the error check added in this PR prevents NetworkX tests from passing if isolated vertices are treated as invalid, so this change prevents that.
* This resolves the problem that required the test workarounds done [here](#4029 (comment)) in [4029](#4029), so those workarounds have been removed in this PR.
Closes #4067
Authors:
- Chuck Hastings (https://github.com/ChuckHastings)
- Rick Ratzel (https://github.com/rlratzel)
Approvers:
- Seunghwa Kang (https://github.com/seunghwak)
- Ray Douglass (https://github.com/raydouglass)
- Erik Welch (https://github.com/eriknw)
URL: #4077
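The interaction between the new check and isolated vertices can be sketched as follows. This is a plain-Python illustration, not the `nx_cugraph` or PLC code; `known_vertices` is a hypothetical helper. It assumes a graph whose vertex set is inferred from the edge list alone, in which case a vertex with no edges is unknown to the graph and would fail a source-validity check unless the vertices are also supplied explicitly.

```python
def known_vertices(edges, vertices=None):
    """Return the set of vertices a graph would recognize.

    Hypothetical helper: `edges` is an iterable of (src, dst) pairs and
    `vertices` is an optional explicit vertex list, playing the role of a
    separately supplied vertex array.
    """
    found = {v for edge in edges for v in edge}
    if vertices is not None:
        # An explicit vertex list keeps isolated vertices in the graph.
        found |= set(vertices)
    return found


edges = [(0, 1), (1, 2)]
print(3 in known_vertices(edges))                # isolated vertex 3 is missing
print(3 in known_vertices(edges, [0, 1, 2, 3]))  # isolated vertex 3 is kept
```

Under that assumption, a BFS starting from vertex `3` would be rejected in the first case and accepted in the second, returning only the source itself.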
The following code produces the error shown below. The graph is too small to result in a legitimate OOM error, so a bug is likely causing something in libcugraph to allocate or attempt to allocate more memory than is available. This was initially discovered in #4029