Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graph model's memory usage #340

Closed
mef opened this issue Jul 14, 2014 · 13 comments
Closed

Graph model's memory usage #340

mef opened this issue Jul 14, 2014 · 13 comments
Labels

Comments

@mef
Copy link
Contributor

mef commented Jul 14, 2014

I have noticed that the memory usage of sigma.graph is very high. Processing large graphs leads to out of memory errors.

To give an idea, here is how sigma.js compares to node.js' graphjs module for a graph having 1210 nodes and 26409 edges, just to instantiate the graph:

# sigma.js when issue was opened
# process.memoryUsage().rss: 141.537.280

# sigma.js after a memory leak fixed in d821d8c
# process.memoryUsage().rss: 135.946.240

# graph.js
# process.memoryUsage().rss:  24.743.936

It's worth noting that graph.js also indexes the data for fast access.

I'd rather stick to using sigma.js because its API looks more straightforward to me. I have investigated the graph data and indexing structure in sigma.js

Here is the graph model's data stored for a very simple graph:

  • node n1 with one attribute attr:nodeAttr
  • node n2 having no attribute
  • edge e1 from n1 to n2 with one attribute attr:edgeAttr
{
  "nodesArray": [{ "attr": "nodeAttr", "id": "n1"},{ "id": "n2"}],
  "edgesArray": [{ "attr": "edgeAttr", "id": "e1", "source": "n1", "target": "n2"}],
  "nodesIndex": {"n1": { "attr": "nodeAttr", "id": "n1"}, "n2": { "id": "n2"}},
  "edgesIndex": {"e1": { "attr": "edgeAttr", "id": "e1", "source": "n1", "target": "n2"}},
  "inNeighborsIndex": {"n1": {}, "n2": { "n1": {"e1": {"id": "e1", "source": "n1", "target": "n2", "attr": "edgeAttr"} }}},
  "outNeighborsIndex": {"n1": { "n2": {"e1": {"id": "e1", "source": "n1", "target": "n2", "attr": "edgeAttr"} }}, "n2": {}},
  "allNeighborsIndex": {"n1": { "n2": {"e1": {"id": "e1", "source": "n1", "target": "n2", "attr": "edgeAttr"} }}, "n2": { "n1": {"e1": {"id": "e1", "source": "n1", "target": "n2", "attr": "edgeAttr"} }}},
  "inNeighborsCount": {"n1": 0, "n2": 1},
  "outNeighborsCount": {"n1": 1, "n2": 0},
  "allNeighborsCount": {"n1": 1, "n2": 1}
}

Observations:

  1. nodesArray and nodesIndex are duplicate storage of the entire nodes dataset. Do we really need the array structure when we can iterate on nodesIndex object?
  2. same remark as previous for edgesArray and edgesIndex
  3. in nodesIndex, the objects' keys are the nodes Ids. Then we don't need to store the redundant id attribute.
  4. same remark as previous for edgesIndex
  5. inNeighborsIndex, outNeighborsIndex and allNeighborsIndex stores the whole edge data in the nested structure. Significant savings can be done by only storing the reference to the edges' id instead. Neighbors indexes can be used to check whether there is an edge between two nodes, and to lookup the id of the edge between two nodes. then in cases when it's needed a accessing edgesIndex by edge key can give the edges attributes.
  6. inNeighborsIndex, outNeighborsIndex and allNeighborsIndex do store keys for nodes that have no edge. This is users extra space for nothing.
  7. allNeighborsIndex is a concatenation of inNeighborsIndex and outNeighborsIndex. A method looking up both indexes and concatenating the result could do the same.
  8. Neighbors count indexes: no redundancy, clean structure.

It seems to me that "fixing" points 1 to 5 above won't require lot of rework of the library, while and won't have affect the API too much, while providing significant gains.

Point 6 may require extra checks when using the indexes, I am not sure if its something suitable.

The effect of change suggested in point 7 should be further investigated.

For the sample graph used above, the graph model implementing the suggestions above would be so:

{
  "nodesIndex": {"n1": { "attr": "nodeAttr"}, "n2": {}},
  "edgesIndex": {"e1": { "attr": "edgeAttr", "source": "n1", "target": "n2"}},
  "inNeighborsIndex": {"n2": { "n1": "e1"}},
  "outNeighborsIndex": {"n1": { "n2": "e1"}},
  "inNeighborsCount": {"n1": 0, "n2": 1},
  "outNeighborsCount": {"n1": 1, "n2": 0},
  "allNeighborsCount": {"n1": 1, "n2": 1}
}

301 characters instead of 772 (spaces and line breaks excluded)

Is this something that the maintainers of sigma.js would consider worth being implemented?

Edit: added memory usage measured after memory leak fixed in d821d8c

@mef
Copy link
Contributor Author

mef commented Jul 18, 2014

related issue: #335

@Yomguithereal
Copy link
Collaborator

Hello @mef,

could you try to instantiate sigma with the clone setting set to false and tell me whether this changes anything?

@mef
Copy link
Contributor Author

mef commented Jul 19, 2014

yes, have just tried again, I get the same memory consumption with option clone:false.

@sheymann
Copy link
Contributor

sheymann commented Aug 6, 2014

It seems that indexes don't store a reference to the original object. Try this code:

sigma.classes.graph.addMethod('testLeak', function() {
  var e0 = this.allNeighborsIndex['n0']['n1']['e0'];
  // e0= {"id": "e0", "source": "n0", "target": "n1"}
  e0.newAttribute = 'I am not stored where you think.';
});

var s, g = {
  "nodes": [
    {
      "id": "n0",
      "label": "A node",
      "x": 0,
      "y": 0,
      "size": 3
    },
    {
      "id": "n1",
      "label": "Another node",
      "x": 3,
      "y": 1,
      "size": 2
    }
  ],
  "edges": [
    {
      "id": "e0",
      "source": "n0",
      "target": "n1"
    }
  ]
};

s = new sigma({
  graph: g,
  container: 'graph-container'
});

s.graph.testLeak();
console.log(s.graph.edges('e0').newAttribute); // undefined

@mef
Copy link
Contributor Author

mef commented Aug 6, 2014

@sheymann I'm not sure to understand what you mean. what your code is doing looks to correspond to the expected behaviour.

My point was not that there's a memory leak, my poitn was that the indexes are un-necessarily fat.

Maybe you're suggesting another way to build indexes?

@sheymann
Copy link
Contributor

sheymann commented Aug 6, 2014

@Yomguithereal will tell us if it's the expected behavior, but I don't think that indexes should duplicate the objects they are intended to reference. They should be pointers. And here references seem to be broken.

Maybe my understanding of JS is just wrong about refences, but then I'd be happy if someone can explain me what's wrong with my code.

@Yomguithereal
Copy link
Collaborator

Hello @sheymann and @mef,

There are indeed clear problems with those indexes and our intention with @jacomyal was to remove the neighbor indexes from core (but still keeping the counts for degre) to make a plugin out of it because it is really a drag for huge graphs (say 5000 nodes and 100 000 edges).

So, we would like to refactor and optimize those index when we'll extract them from core.

@sheymann
Copy link
Contributor

sheymann commented Aug 7, 2014

When do you plan to work on it?

plugins.filter uses indexes and I've another plugin in preparation which also uses indexes.

Btw I've never been able to use graph.addIndex properly because the functions are executed before the attached graph methods: I can't update an index if I don't know the output of these methods!

@jacomyal
Copy link
Owner

jacomyal commented Aug 7, 2014

@sheymann You're right, there is indeed a leak: The object to reference in addEdge is the one named validEdge, not the original edge argument (here is the leaked code). Thanks!

jacomyal added a commit that referenced this issue Aug 13, 2014
The edges referenced in the edges indexes where actually not the ones stored in the sigma.classes.graph instance, but the ones given to it.
Leak found by @sheymann and showed in #340 (thanks!).
sheymann referenced this issue in Linkurious/linkurious.js Aug 13, 2014
The edges referenced in the edges indexes where actually not the ones stored in the sigma.classes.graph instance, but the ones given to it.
Leak found by @sheymann and showed in #340 (thanks!).
@mef
Copy link
Contributor Author

mef commented Aug 25, 2014

Thanks for fixing the leak. I could observe an improvement in the memory usage, but we're still very high (cf. updated first issue comment).

So, we would like to refactor and optimize those index when we'll extract them from core.

I guess this is something you want to be done by one of the core developers. Any roadmap yet for this refactoring?

@sheymann
Copy link
Contributor

Hey, a simple improvement would be to allow nodes and edges ids to be integers instead of strings only.

rloth added a commit to moma/ProjectExplorer that referenced this issue Apr 20, 2017
rloth added a commit to moma/ProjectExplorer that referenced this issue May 10, 2017
@tylorbayer
Copy link

How can I use the sigma.noIndex.js if I wasn't originally using sigma.min.js?

@stale
Copy link

stale bot commented Oct 8, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Oct 8, 2021
@stale stale bot closed this as completed Oct 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants