Graph model's memory usage #340
Comments
Related issue: #335
Hello @mef, could you try to instantiate sigma with the
Yes, I have just tried again; I get the same memory consumption with option
It seems that indexes don't store a reference to the original object. Try this code:
sigma.classes.graph.addMethod('testLeak', function() {
var e0 = this.allNeighborsIndex['n0']['n1']['e0'];
// e0= {"id": "e0", "source": "n0", "target": "n1"}
e0.newAttribute = 'I am not stored where you think.';
});
var s, g = {
"nodes": [
{
"id": "n0",
"label": "A node",
"x": 0,
"y": 0,
"size": 3
},
{
"id": "n1",
"label": "Another node",
"x": 3,
"y": 1,
"size": 2
}
],
"edges": [
{
"id": "e0",
"source": "n0",
"target": "n1"
}
]
};
s = new sigma({
graph: g,
container: 'graph-container'
});
s.graph.testLeak();
console.log(s.graph.edges('e0').newAttribute); // undefined
@sheymann I'm not sure I understand what you mean. What your code does looks like the expected behaviour. My point was not that there's a memory leak; my point was that the indexes are unnecessarily fat. Maybe you're suggesting another way to build indexes?
@Yomguithereal will tell us if it's the expected behavior, but I don't think that indexes should duplicate the objects they are intended to reference. They should be pointers, and here the references seem to be broken. Maybe my understanding of JS references is just wrong, but then I'd be happy if someone could explain to me what's wrong with my code.
There are indeed clear problems with those indexes. Our intention with @jacomyal was to remove the neighbor indexes from core (while still keeping the counts for degree) and make a plugin out of them, because they are really a drag for huge graphs (say 5000 nodes and 100 000 edges). So we would like to refactor and optimize those indexes when we extract them from core.
When do you plan to work on it? plugins.filter uses indexes, and I have another plugin in preparation which also uses indexes. Btw I've never been able to use
The edges referenced in the edge indexes were actually not the ones stored in the sigma.classes.graph instance, but the ones given to it. Leak found by @sheymann and shown in #340 (thanks!).
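The cause described in that fix can be reproduced without sigma. The sketch below is a hypothetical mini store (`makeStore`, `addEdge` and the `indexTheCopy` flag are illustrative names, not sigma's code): the store keeps its own copy of each edge, and the neighbor index either points at the caller's object (the reported behaviour) or at the stored copy (the behaviour after the fix).

```javascript
// Hypothetical mini store, NOT sigma's actual implementation.
function makeStore() {
  return { edgesIndex: {}, neighborsIndex: {} };
}

function addEdge(store, edge, indexTheCopy) {
  // The store keeps its own shallow copy of the edge it was given.
  const copy = Object.assign({}, edge);
  store.edgesIndex[copy.id] = copy;

  // Nested index: source id -> target id -> edge id -> edge object.
  const bySource =
    store.neighborsIndex[copy.source] || (store.neighborsIndex[copy.source] = {});
  const byTarget = bySource[copy.target] || (bySource[copy.target] = {});

  // Indexing the caller's object creates a second, unrelated edge object.
  byTarget[copy.id] = indexTheCopy ? copy : edge;
  return store;
}

// Broken variant: the index and edgesIndex hold different objects.
const buggy = addEdge(makeStore(), { id: 'e0', source: 'n0', target: 'n1' }, false);
buggy.neighborsIndex.n0.n1.e0.newAttribute = 'I am not stored where you think.';
const buggyResult = buggy.edgesIndex.e0.newAttribute;

// Fixed variant: the index is a plain reference to the stored edge.
const fixed = addEdge(makeStore(), { id: 'e0', source: 'n0', target: 'n1' }, true);
fixed.neighborsIndex.n0.n1.e0.newAttribute = 'stored once';
const fixedResult = fixed.edgesIndex.e0.newAttribute;

console.log(buggyResult, fixedResult);
```

In the broken variant every edge exists twice in memory, which is consistent with the improvement observed once the index pointed at the stored object.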
Thanks for fixing the leak. I could observe an improvement in the memory usage, but we're still very high (cf. updated first issue comment).
I guess this is something you want to be done by one of the core developers. Is there a roadmap yet for this refactoring?
Hey, a simple improvement would be to allow node and edge ids to be integers instead of strings only.
How can I use sigma.noIndex.js if I wasn't originally using sigma.min.js?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I have noticed that the memory usage of sigma.graph is very high. Processing large graphs leads to out of memory errors.
To give an idea, here is how sigma.js compares to node.js' graphjs module for a graph having 1210 nodes and 26409 edges, just to instantiate the graph:
It's worth noting that graph.js also indexes the data for fast access.
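The issue does not say how the figures were measured. One possible way to get a rough number in node.js is to compare `process.memoryUsage().heapUsed` before and after building the structure. The helper name `measureHeap` and the synthetic edge list below are assumptions made for illustration; they only mimic the size of the graph mentioned above.

```javascript
// Rough heap measurement sketch; results vary between runs and node versions.
function measureHeap(build) {
  // global.gc is only defined when node is started with --expose-gc.
  if (typeof global.gc === 'function') global.gc();
  const before = process.memoryUsage().heapUsed;
  const result = build(); // keep a reference so the data is not collected
  if (typeof global.gc === 'function') global.gc();
  const after = process.memoryUsage().heapUsed;
  return { result, bytes: after - before };
}

// Synthetic data with the same edge count as the graph in the issue.
const measured = measureHeap(() => {
  const edges = [];
  for (let i = 0; i < 26409; i++) {
    edges.push({ id: 'e' + i, source: 'n' + (i % 1210), target: 'n' + ((i * 7) % 1210) });
  }
  return edges;
});

console.log((measured.bytes / 1024 / 1024).toFixed(1) + ' MB for the raw edge list');
```

Running with `node --expose-gc` makes the two readings more comparable, since a collection is forced before each one.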
I'd rather stick to using sigma.js because its API looks more straightforward to me. I have investigated the graph data and indexing structure in sigma.js.
Here is the graph model's data stored for a very simple graph:
- n1 with one attribute attr: nodeAttr
- n2 having no attribute
- e1 from n1 to n2 with one attribute attr: edgeAttr
Observations:
1. nodesArray and nodesIndex are duplicate storage of the entire node dataset. Do we really need the array structure when we can iterate on the nodesIndex object?
2. The same applies to edgesArray and edgesIndex.
3. In nodesIndex, the objects' keys are the node ids, so we don't need to store the redundant id attribute.
4. The same applies to edgesIndex.
5. inNeighborsIndex, outNeighborsIndex and allNeighborsIndex store the whole edge data in the nested structure. Significant savings can be made by storing only the edge ids instead. Neighbor indexes can be used to check whether there is an edge between two nodes and to look up the id of that edge; when the edge attributes are needed, accessing edgesIndex by edge id gives them.
6. inNeighborsIndex, outNeighborsIndex and allNeighborsIndex do store keys for nodes that have no edge. This uses extra space for nothing.
7. allNeighborsIndex is a concatenation of inNeighborsIndex and outNeighborsIndex. A method looking up both indexes and concatenating the result could do the same.

It seems to me that "fixing" points 1 to 5 above won't require a lot of rework of the library and won't affect the API too much, while providing significant gains.
Point 6 may require extra checks when using the indexes; I am not sure whether it's suitable.
The effect of change suggested in point 7 should be further investigated.
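To make the suggestions concrete, here is a minimal sketch of a slimmer model for the sample graph. It is an illustration under assumptions, not sigma's code and not necessarily the exact structure proposed in this issue: `buildSlimModel`, `link`, `edgeIdsBetween` and `allNeighbors` are hypothetical names. Ids live only in the keys, neighbor indexes hold edge ids rather than edge objects, nodes without edges get no key, and the combined neighbor list is computed on demand.

```javascript
// Hypothetical slim graph model; names and layout are illustrative only.
function link(index, from, to, edgeId) {
  const byFrom = index[from] || (index[from] = {});
  (byFrom[to] || (byFrom[to] = [])).push(edgeId); // store the id, not the edge
}

function buildSlimModel(nodes, edges) {
  const model = {
    nodesIndex: {},
    edgesIndex: {},
    inNeighborsIndex: {},
    outNeighborsIndex: {}
  };
  for (const node of nodes) {
    const { id, ...attrs } = node; // the id is already the key
    model.nodesIndex[id] = attrs;
  }
  for (const edge of edges) {
    const { id, ...attrs } = edge; // attrs keeps source, target and attributes
    model.edgesIndex[id] = attrs;
    link(model.outNeighborsIndex, edge.source, edge.target, id);
    link(model.inNeighborsIndex, edge.target, edge.source, id);
  }
  return model;
}

// Edge ids between two nodes; the extra checks cover nodes with no key.
function edgeIdsBetween(index, a, b) {
  return (index[a] && index[a][b]) || [];
}

// Replaces a stored allNeighborsIndex by looking up both directions.
function allNeighbors(model, nodeId) {
  const out = Object.keys(model.outNeighborsIndex[nodeId] || {});
  const inc = Object.keys(model.inNeighborsIndex[nodeId] || {});
  return Array.from(new Set(out.concat(inc)));
}

const model = buildSlimModel(
  [{ id: 'n1', attr: 'nodeAttr' }, { id: 'n2' }],
  [{ id: 'e1', source: 'n1', target: 'n2', attr: 'edgeAttr' }]
);
const ids = edgeIdsBetween(model.outNeighborsIndex, 'n1', 'n2');
console.log(ids, model.edgesIndex[ids[0]].attr, allNeighbors(model, 'n2'));
```

The `|| {}` and `|| []` fallbacks are the "extra checks" that dropping empty keys would require, and `allNeighbors` trades a little lookup time for not storing the third index.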
For the sample graph used above, the graph model implementing the suggestions above would look like this:
301 characters instead of 772 (spaces and line breaks excluded)
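A character count of this kind can be obtained by serializing a model and stripping whitespace. The helper below is an assumed way of doing it (`serializedSize` is a hypothetical name, and the issue does not state how its figures were produced); note that it also removes whitespace inside string values.

```javascript
// Size of an object's JSON form with all whitespace removed.
function serializedSize(value) {
  return JSON.stringify(value).replace(/\s/g, '').length;
}

const sample = { a: 1, b: 'x y' };
console.log(serializedSize(sample));
```

Such a count is only a proxy for comparing two layouts of the same data; it says nothing precise about heap usage.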
Is this something that the maintainers of sigma.js would consider worth implementing?
Edit: added memory usage measured after memory leak fixed in d821d8c