You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When loading a graph into d3graph users must provide a dense adjacency matrix. This is inconvenient because most real-world graphs are sparse, and the dense representation is memory-ineffiecient.
Furthermore, the d3graph class saves a copy of the adjacency matrix for its internal representation of the data when the .graph() method is called - but everytime this self.adjmat is used, it seems to have been more natural to have stored it in a different structure.
E.g set_edge_properties first converts the matrix into a dict (which is a sparse representation) and get_cluste_color effectively converts to an edgelist as far as I can tell (also sparse). Both of these operations are fairly inefficient and could have been avoided had a different structure been used to begin with. Adjacency matrices of graphs tend to be most useful when the algorithm we want to apply can be expressed as linear algebra, e.g. shortest paths, pagerank or traversals. Even in most of these situations, sparse matrices are usually the go-to for scalable solutions.
Impact
d3graph is limiting the sizes of graphs it supports by enforcing an inefficient input type.
d3graph is increasing its runtime due applying inefficient operations on this dense data structure when generating a figure
Possible Solutions
My preferred suggestion: since we already have networkx as a dependency, it would be wise to delegate the responsibility of storing the graph structure to that package, since:
it's a trusted, respected and mature project - no doubt they have applied more thought on efficiency than we could here. It's also unlikey to find bugs
its API is very stable
users will be familiar with the API
NetworkX already has great conversion functions to/from other data structures, meaning less inefficient conversions and boilerplate here.
would make the python portion of the code much shorter and more readable (adjmat2* methods could be removed)
The text was updated successfully, but these errors were encountered:
Summary
When loading a graph into d3graph users must provide a dense adjacency matrix. This is inconvenient because most real-world graphs are sparse, and the dense representation is memory-ineffiecient.
Furthermore, the
d3graph
class saves a copy of the adjacency matrix for its internal representation of the data when the.graph()
method is called - but everytime thisself.adjmat
is used, it seems to have been more natural to have stored it in a different structure.E.g
set_edge_properties
first converts the matrix into adict
(which is a sparse representation) andget_cluste_color
effectively converts to an edgelist as far as I can tell (also sparse). Both of these operations are fairly inefficient and could have been avoided had a different structure been used to begin with. Adjacency matrices of graphs tend to be most useful when the algorithm we want to apply can be expressed as linear algebra, e.g. shortest paths, pagerank or traversals. Even in most of these situations, sparse matrices are usually the go-to for scalable solutions.Impact
Possible Solutions
My preferred suggestion: since we already have
networkx
as a dependency, it would be wise to delegate the responsibility of storing the graph structure to that package, since:adjmat2*
methods could be removed)The text was updated successfully, but these errors were encountered: