Graph model's memory usage #340

mef · 2014-07-14T17:35:34Z

I have noticed that the memory usage of sigma.graph is very high. Processing large graphs leads to out of memory errors.

To give an idea, here is how sigma.js compares to node.js' graphjs module for a graph having 1210 nodes and 26409 edges, just to instantiate the graph:

# sigma.js when issue was opened
# process.memoryUsage().rss: 141.537.280

# sigma.js after a memory leak fixed in d821d8c
# process.memoryUsage().rss: 135.946.240

# graph.js
# process.memoryUsage().rss:  24.743.936

It's worth noting that graph.js also indexes the data for fast access.

I'd rather stick to using sigma.js because its API looks more straightforward to me. I have investigated the graph data and indexing structure in sigma.js

Here is the graph model's data stored for a very simple graph:

node n1 with one attribute attr:nodeAttr
node n2 having no attribute
edge e1 from n1 to n2 with one attribute attr:edgeAttr

{
  "nodesArray": [{ "attr": "nodeAttr", "id": "n1"},{ "id": "n2"}],
  "edgesArray": [{ "attr": "edgeAttr", "id": "e1", "source": "n1", "target": "n2"}],
  "nodesIndex": {"n1": { "attr": "nodeAttr", "id": "n1"}, "n2": { "id": "n2"}},
  "edgesIndex": {"e1": { "attr": "edgeAttr", "id": "e1", "source": "n1", "target": "n2"}},
  "inNeighborsIndex": {"n1": {}, "n2": { "n1": {"e1": {"id": "e1", "source": "n1", "target": "n2", "attr": "edgeAttr"} }}},
  "outNeighborsIndex": {"n1": { "n2": {"e1": {"id": "e1", "source": "n1", "target": "n2", "attr": "edgeAttr"} }}, "n2": {}},
  "allNeighborsIndex": {"n1": { "n2": {"e1": {"id": "e1", "source": "n1", "target": "n2", "attr": "edgeAttr"} }}, "n2": { "n1": {"e1": {"id": "e1", "source": "n1", "target": "n2", "attr": "edgeAttr"} }}},
  "inNeighborsCount": {"n1": 0, "n2": 1},
  "outNeighborsCount": {"n1": 1, "n2": 0},
  "allNeighborsCount": {"n1": 1, "n2": 1}
}

Observations:

nodesArray and nodesIndex are duplicate storage of the entire nodes dataset. Do we really need the array structure when we can iterate on nodesIndex object?
same remark as previous for edgesArray and edgesIndex
in nodesIndex, the objects' keys are the nodes Ids. Then we don't need to store the redundant id attribute.
same remark as previous for edgesIndex
inNeighborsIndex, outNeighborsIndex and allNeighborsIndex stores the whole edge data in the nested structure. Significant savings can be done by only storing the reference to the edges' id instead. Neighbors indexes can be used to check whether there is an edge between two nodes, and to lookup the id of the edge between two nodes. then in cases when it's needed a accessing edgesIndex by edge key can give the edges attributes.
inNeighborsIndex, outNeighborsIndex and allNeighborsIndex do store keys for nodes that have no edge. This is users extra space for nothing.
allNeighborsIndex is a concatenation of inNeighborsIndex and outNeighborsIndex. A method looking up both indexes and concatenating the result could do the same.
Neighbors count indexes: no redundancy, clean structure.

It seems to me that "fixing" points 1 to 5 above won't require lot of rework of the library, while and won't have affect the API too much, while providing significant gains.

Point 6 may require extra checks when using the indexes, I am not sure if its something suitable.

The effect of change suggested in point 7 should be further investigated.

For the sample graph used above, the graph model implementing the suggestions above would be so:

{
  "nodesIndex": {"n1": { "attr": "nodeAttr"}, "n2": {}},
  "edgesIndex": {"e1": { "attr": "edgeAttr", "source": "n1", "target": "n2"}},
  "inNeighborsIndex": {"n2": { "n1": "e1"}},
  "outNeighborsIndex": {"n1": { "n2": "e1"}},
  "inNeighborsCount": {"n1": 0, "n2": 1},
  "outNeighborsCount": {"n1": 1, "n2": 0},
  "allNeighborsCount": {"n1": 1, "n2": 1}
}

301 characters instead of 772 (spaces and line breaks excluded)

Is this something that the maintainers of sigma.js would consider worth being implemented?

Edit: added memory usage measured after memory leak fixed in d821d8c

The text was updated successfully, but these errors were encountered:

mef · 2014-07-18T15:34:03Z

related issue: #335

Yomguithereal · 2014-07-19T15:37:55Z

Hello @mef,

could you try to instantiate sigma with the clone setting set to false and tell me whether this changes anything?

mef · 2014-07-19T18:26:48Z

yes, have just tried again, I get the same memory consumption with option clone:false.

sheymann · 2014-08-06T16:23:26Z

It seems that indexes don't store a reference to the original object. Try this code:

sigma.classes.graph.addMethod('testLeak', function() {
  var e0 = this.allNeighborsIndex['n0']['n1']['e0'];
  // e0= {"id": "e0", "source": "n0", "target": "n1"}
  e0.newAttribute = 'I am not stored where you think.';
});

var s, g = {
  "nodes": [
    {
      "id": "n0",
      "label": "A node",
      "x": 0,
      "y": 0,
      "size": 3
    },
    {
      "id": "n1",
      "label": "Another node",
      "x": 3,
      "y": 1,
      "size": 2
    }
  ],
  "edges": [
    {
      "id": "e0",
      "source": "n0",
      "target": "n1"
    }
  ]
};

s = new sigma({
  graph: g,
  container: 'graph-container'
});

s.graph.testLeak();
console.log(s.graph.edges('e0').newAttribute); // undefined

mef · 2014-08-06T16:29:37Z

@sheymann I'm not sure to understand what you mean. what your code is doing looks to correspond to the expected behaviour.

My point was not that there's a memory leak, my poitn was that the indexes are un-necessarily fat.

Maybe you're suggesting another way to build indexes?

sheymann · 2014-08-06T16:36:06Z

@Yomguithereal will tell us if it's the expected behavior, but I don't think that indexes should duplicate the objects they are intended to reference. They should be pointers. And here references seem to be broken.

Maybe my understanding of JS is just wrong about refences, but then I'd be happy if someone can explain me what's wrong with my code.

Yomguithereal · 2014-08-06T17:14:21Z

Hello @sheymann and @mef,

There are indeed clear problems with those indexes and our intention with @jacomyal was to remove the neighbor indexes from core (but still keeping the counts for degre) to make a plugin out of it because it is really a drag for huge graphs (say 5000 nodes and 100 000 edges).

So, we would like to refactor and optimize those index when we'll extract them from core.

sheymann · 2014-08-07T12:08:16Z

When do you plan to work on it?

plugins.filter uses indexes and I've another plugin in preparation which also uses indexes.

Btw I've never been able to use graph.addIndex properly because the functions are executed before the attached graph methods: I can't update an index if I don't know the output of these methods!

jacomyal · 2014-08-07T15:00:59Z

@sheymann You're right, there is indeed a leak: The object to reference in addEdge is the one named validEdge, not the original edge argument (here is the leaked code). Thanks!

@sheymann

The edges referenced in the edges indexes where actually not the ones stored in the sigma.classes.graph instance, but the ones given to it. Leak found by @sheymann and showed in #340 (thanks!).

@sheymann

The edges referenced in the edges indexes where actually not the ones stored in the sigma.classes.graph instance, but the ones given to it. Leak found by @sheymann and showed in #340 (thanks!).

mef · 2014-08-25T17:15:54Z

Thanks for fixing the leak. I could observe an improvement in the memory usage, but we're still very high (cf. updated first issue comment).

So, we would like to refactor and optimize those index when we'll extract them from core.

I guess this is something you want to be done by one of the core developers. Any roadmap yet for this refactoring?

sheymann · 2014-09-16T18:25:42Z

Hey, a simple improvement would be to allow nodes and edges ids to be integers instead of strings only.

cf. jacomyal/sigma.js#340

tylorbayer · 2018-08-30T17:50:54Z

How can I use the sigma.noIndex.js if I wasn't originally using sigma.min.js?

stale · 2021-10-08T17:40:25Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

rloth added a commit to moma/ProjectExplorer that referenced this issue Apr 20, 2017

IMPORTANT much lighter RAM footprint by removing sigma indexes

e4a5b31

cf. jacomyal/sigma.js#340

rloth added a commit to moma/ProjectExplorer that referenced this issue May 10, 2017

IMPORTANT much lighter RAM footprint by removing sigma indexes

89bd563

cf. jacomyal/sigma.js#340

stale bot added the wontfix label Oct 8, 2021

stale bot closed this as completed Oct 23, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Graph model's memory usage #340

Graph model's memory usage #340

mef commented Jul 14, 2014

mef commented Jul 18, 2014

Yomguithereal commented Jul 19, 2014

mef commented Jul 19, 2014

sheymann commented Aug 6, 2014

mef commented Aug 6, 2014

sheymann commented Aug 6, 2014

Yomguithereal commented Aug 6, 2014

sheymann commented Aug 7, 2014

jacomyal commented Aug 7, 2014

mef commented Aug 25, 2014

sheymann commented Sep 16, 2014

tylorbayer commented Aug 30, 2018

stale bot commented Oct 8, 2021

Graph model's memory usage #340

Graph model's memory usage #340

Comments

mef commented Jul 14, 2014

mef commented Jul 18, 2014

Yomguithereal commented Jul 19, 2014

mef commented Jul 19, 2014

sheymann commented Aug 6, 2014

mef commented Aug 6, 2014

sheymann commented Aug 6, 2014

Yomguithereal commented Aug 6, 2014

sheymann commented Aug 7, 2014

jacomyal commented Aug 7, 2014

mef commented Aug 25, 2014

sheymann commented Sep 16, 2014

tylorbayer commented Aug 30, 2018

stale bot commented Oct 8, 2021