Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up methods allow_multiple_edges and multiple_edges #33569

Closed
dcoudert opened this issue Mar 26, 2022 · 11 comments
Closed

Speed up methods allow_multiple_edges and multiple_edges #33569

dcoudert opened this issue Mar 26, 2022 · 11 comments

Comments

@dcoudert
Copy link
Contributor

Before (insane for large graphs)

sage: G = graphs.RandomGNP(10, .5, seed=20)
sage: G.allow_multiple_edges(True)
sage: timeit("G.multiple_edges()")
625 loops, best of 3: 99.9 μs per loop
sage: timeit("G.allow_multiple_edges(True); G.allow_multiple_edges(False)")
625 loops, best of 3: 128 μs per loop
sage:
sage: G = graphs.RandomGNP(1000, .5, seed=0)
sage: G.allow_multiple_edges(True)
sage: timeit("G.multiple_edges()")
5 loops, best of 3: 44.9 s per loop
sage: timeit("G.allow_multiple_edges(True); G.allow_multiple_edges(False)")
5 loops, best of 3: 78.6 s per loop

After

sage: G = graphs.RandomGNP(10, .5, seed=20)
sage: G.allow_multiple_edges(True)
sage: timeit("G.multiple_edges()")
625 loops, best of 3: 21.3 μs per loop
sage: timeit("G.allow_multiple_edges(True); G.allow_multiple_edges(False)")
625 loops, best of 3: 17 μs per loop
sage:
sage: G = graphs.RandomGNP(1000, .5, seed=0)
sage: G.allow_multiple_edges(True)
sage: timeit("G.multiple_edges()")
5 loops, best of 3: 270 ms per loop
sage: timeit("G.allow_multiple_edges(True); G.allow_multiple_edges(False)")
5 loops, best of 3: 247 ms per loop

CC: @kliem @tscrim

Component: graph theory

Author: David Coudert

Branch/Commit: f3fd774

Reviewer: Travis Scrimshaw

Issue created by migration from https://trac.sagemath.org/ticket/33569

@dcoudert dcoudert added this to the sage-9.6 milestone Mar 26, 2022
@dcoudert
Copy link
Contributor Author

comment:1

It can certainly be even faster if done in the backends, but it's already an improvement.


New commits:

02cfa0ctrac #33569: speed up allow_multiple_edges and multiple_edges

@dcoudert
Copy link
Contributor Author

Commit: 02cfa0c

@dcoudert

This comment has been minimized.

@dcoudert
Copy link
Contributor Author

Branch: public/graphs/33569_speedup

@tscrim
Copy link
Collaborator

tscrim commented Mar 27, 2022

comment:3

While this is undeniably faster, this seems to be much worse in memory. The current code just needs all of the multiedges and the local information around a vertex I believe. However, the proposed code requires essentially making a copy of the graph as it needs to hold all of the edges in memory (plus a second copy for all of the keys of the dict). Granted, the current code is too slow to be useful on such large graphs, so this probably is the thing we want to do. I am wondering if we can get a more memory efficient implementation.

One thing that could be done to make this even faster would be to use the if f in edges: check as

        edges = {}
        multi_edges = set()
        for e in self.edge_iterator(labels=labels):
            f = edge_id(e[0], e[1])
            if f in edges:
                multi_edges.add(f)
                edges[f].append(e)
            else:
                edges[f] = [e]
        multi_edges = sum(edges[f] for f in multi_edges, [])

This would save an iteration over all of the edges of the graph.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 27, 2022

Branch pushed to git repo; I updated commit sha1. New commits:

f3fd774trac #33569: alternative implementation of multiple_edges

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 27, 2022

Changed commit from 02cfa0c to f3fd774

@dcoudert
Copy link
Contributor Author

comment:5

I have pushed an alternative implementation of method multiple_edges. It uses less memory and is slightly slower than the previous proposal, but this is acceptable. However, I had to distinguish cases.

sage: set_random_seed(0)
sage: G = graphs.RandomGNP(1000, .5, seed=0)
sage: G.size()
250330
sage: D = DiGraph(G)
sage: D.size()
500660
sage: G.allow_multiple_edges(True)
sage: %time len(G.multiple_edges())
CPU times: user 368 ms, sys: 3.6 ms, total: 371 ms
Wall time: 373 ms
0
sage: G.add_edges(G.edges())
sage: %time len(G.multiple_edges())
CPU times: user 527 ms, sys: 25.4 ms, total: 552 ms
Wall time: 556 ms
500660
sage: %time len(D.multiple_edges(to_undirected=True))
CPU times: user 721 ms, sys: 23.1 ms, total: 744 ms
Wall time: 751 ms
500660
sage: D.allow_multiple_edges(True)
sage: %time len(D.multiple_edges(to_undirected=False))
CPU times: user 685 ms, sys: 5.72 ms, total: 691 ms
Wall time: 696 ms
0
sage: D.add_edges(D.edges())
sage: D.size()
1001320
sage: %time len(D.multiple_edges(to_undirected=True))
CPU times: user 1.33 s, sys: 48.2 ms, total: 1.38 s
Wall time: 1.39 s
1001320
sage: %time len(D.multiple_edges(to_undirected=False))
CPU times: user 923 ms, sys: 50 ms, total: 973 ms
Wall time: 982 ms
1001320

On the way I discovered that by default graphs.RandomGNP ignores the random seed... not good. I have opened ticket #33579 to fix that.

@tscrim
Copy link
Collaborator

tscrim commented Mar 27, 2022

comment:6

Thank you. I think this is a good trade-off between CPU and memory as well.

@tscrim
Copy link
Collaborator

tscrim commented Mar 27, 2022

Reviewer: Travis Scrimshaw

@vbraun
Copy link
Member

vbraun commented Apr 2, 2022

Changed branch from public/graphs/33569_speedup to f3fd774

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants