This repository has been archived by the owner on Oct 8, 2021. It is now read-only.

Added a function rem_vertices! #1047

Merged on Oct 12, 2018 (10 commits)

simonschoelly
Contributor

See #1043

This adds a function rem_vertices!(g, vertex_list; keep_order=false) -> vmap, which removes multiple vertices from a graph in a single call. It returns a vector vmap that maps the vertices of the modified graph to the vertices of the unmodified graph. Setting the flag keep_order to true ensures that the order of the vertices is not changed.
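
To make the meaning of vmap concrete, here is a minimal sketch in plain Julia. The helper `expected_vmap_keep_order` is hypothetical and not part of the library; it only illustrates the mapping one would expect when the order is preserved. With keep_order=false the entries of vmap may come out in a different order.

```julia
# Hypothetical helper for illustration only; it does not touch a graph.
# It lists, in increasing order, the old labels of the vertices that survive.
function expected_vmap_keep_order(n::Integer, vertex_list::AbstractVector{<:Integer})
    removed = falses(n)
    removed[vertex_list] .= true          # mark every vertex scheduled for removal
    return [v for v in 1:n if !removed[v]]
end

# Removing vertices 2 and 4 from a 5-vertex graph:
vmap = expected_vmap_keep_order(5, [2, 4])
# vmap[i] is the old label of the vertex that is labelled i afterwards.
```

Reading vmap this way, new vertex 2 corresponds to old vertex 3, and new vertex 3 to old vertex 5.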

@codecov

codecov bot commented Oct 9, 2018

Codecov Report

Merging #1047 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1047      +/-   ##
==========================================
+ Coverage   99.81%   99.82%   +<.01%     
==========================================
  Files          86       86              
  Lines        2746     2821      +75     
==========================================
+ Hits         2741     2816      +75     
  Misses          5        5

@sbromberger
Owner

Thanks. From a quick read, this looks like it's O(V+E+something). Is my estimate correct?

@simonschoelly
Contributor Author

Yeah, that seems about right. There is a note in the comments on how it could be sped up a bit more, but from running the code it seems to be considerably faster than I expected anyway. At the moment it seems that even for removing a single vertex it might be faster than rem_vertex!.

@sbromberger
Owner

> At the moment it seems that even for removing a single vertex it might be faster than rem_vertex!.

This is surprising to me. What size graph are you trying this on?

@simonschoelly
Contributor Author

So either I had a very special kind of graph where that was indeed the case, or I must have miscounted the number of digits, because I cannot reproduce my claims at all. It looks as if rem_vertex! is roughly ten times faster for a single vertex.

julia> g = erdos_renyi(10^4, 0.1)
{10000, 4998825} undirected simple Int64 graph

julia> g2 = copy(g); @time rem_vertex!(g2, 5000)
  0.002242 seconds (444 allocations: 6.743 MiB)

julia> g2 = copy(g); @time rem_vertices!(g2, [5000], keep_order=false)
  0.055374 seconds (18 allocations: 157.094 KiB)

julia> g2 = copy(g); @time rem_vertices!(g2, [5000], keep_order=true)
  0.012389 seconds (18 allocations: 157.094 KiB)

@sbromberger
Owner

Those benchmarks look more correct. Swap-n-pop is inherently efficient as it requires one move, and an O(1) resize.
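
For readers unfamiliar with the idea, here is a rough sketch of swap-and-pop on a plain adjacency list. It is not the library's implementation: the function name `swap_and_pop!` is made up, and it assumes a simple undirected graph without self-loops, stored as a `Vector{Vector{Int}}` with sorted neighbor lists.

```julia
# Sketch only: remove vertex v by moving the last vertex into its slot.
# Assumes an undirected simple graph (no self-loops), sorted neighbor lists.
function swap_and_pop!(adj::Vector{Vector{Int}}, v::Int)
    n = length(adj)
    1 <= v <= n || throw(ArgumentError("vertex $v not in 1:$n"))

    # Detach v from all of its neighbors.
    for u in adj[v]
        filter!(x -> x != v, adj[u])
    end

    if v != n
        # Relabel the last vertex n as v in its neighbors' lists.
        for u in adj[n]
            replace!(adj[u], n => v)
            sort!(adj[u])
        end
        adj[v] = adj[n]    # the single "move"
    end

    pop!(adj)              # shrink the list by one entry
    return adj
end

# Path graph 1-2-3-4; removing vertex 2 relabels old vertex 4 as 2.
adj = [[2], [1, 3], [2, 4], [3]]
swap_and_pop!(adj, 2)
```

The trade-off is that the vertex order changes: the former last vertex takes the label of the removed one, which is the behavior keep_order=true avoids.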

Does rem_vertices! scale linearly with the number of vertices removed?

@simonschoelly
Contributor Author

No, it does not. The reason is that the algorithm checks every edge in any case. And if the order is kept, this is necessary most of the time anyway.

julia> g = erdos_renyi(10^4, 0.1);
julia> a = randperm(10^4);

# 1 vertex
julia> g2 = copy(g); @time rem_vertices!(g2, a[1:1], keep_order=false);
  0.054921 seconds (19 allocations: 157.125 KiB)
julia> g2 = copy(g); @time rem_vertices!(g2, a[1:1], keep_order=true);
  0.009877 seconds (19 allocations: 157.125 KiB)

# 10 vertices
julia> g2 = copy(g); @time rem_vertices!(g2, a[1:10], keep_order=false);
  0.078055 seconds (19 allocations: 157.188 KiB)
julia> g2 = copy(g); @time rem_vertices!(g2, a[1:10], keep_order=true);
  0.011177 seconds (19 allocations: 157.188 KiB)

# 100 vertices
julia> g2 = copy(g); @time rem_vertices!(g2, a[1:100], keep_order=false);
  0.122671 seconds (19 allocations: 157.938 KiB)
julia> g2 = copy(g); @time rem_vertices!(g2, a[1:100], keep_order=true);
  0.012469 seconds (19 allocations: 157.938 KiB)

# 1000 vertices
julia> g2 = copy(g); @time rem_vertices!(g2, a[1:1000], keep_order=false);
  0.133891 seconds (19 allocations: 165.000 KiB)
julia> g2 = copy(g); @time rem_vertices!(g2, a[1:1000], keep_order=true);
  0.020724 seconds (19 allocations: 165.000 KiB)

# 5000 vertices
julia> g2 = copy(g); @time rem_vertices!(g2, a[1:5000], keep_order=false);
  0.081602 seconds (21 allocations: 196.156 KiB)
julia> g2 = copy(g); @time rem_vertices!(g2, a[1:5000], keep_order=true);
  0.034571 seconds (21 allocations: 196.156 KiB)

# 10000 vertices
julia> g2 = copy(g); @time rem_vertices!(g2, a[1:10000], keep_order=false);
  0.012222 seconds (20 allocations: 235.219 KiB)
julia> g2 = copy(g); @time rem_vertices!(g2, a[1:10000], keep_order=true);
  0.012641 seconds (20 allocations: 235.219 KiB)

@sbromberger
Owner

Those benchmarks look suspicious. Why should removing 10k vertices take 1/8 the time that removing 5k vertices takes, and why should removing 5k vertices take half the time that removing 1k takes?

@simonschoelly
Contributor Author

The algorithm works in 3 phases:

  • In the first phase, we calculate a map from the old vertex labels to the new ones.
  • In the second phase, we move the lists in fadjlist to their new positions and then resize fadjlist.
  • In the third phase, we go over all the remaining lists and remove or relabel the vertices in them.
    (At some point we also count the number of edges that get removed.)

So the more vertices we remove, the fewer lists there are left to process in the third phase. In the 10k case, no lists are left at all.
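
The three phases can be sketched on a plain adjacency list as follows. This is an illustration under stated assumptions, not the code from this pull request: the name `remove_vertices_sketch!` is hypothetical, the graph is a bare `Vector{Vector{Int}}`, only the order-preserving variant is shown, and the edge count bookkeeping is left out.

```julia
# Sketch of the three phases for the order-preserving case.
# Returns vmap, where vmap[new_label] == old_label.
function remove_vertices_sketch!(adj::Vector{Vector{Int}}, vs::AbstractVector{<:Integer})
    n = length(adj)
    removed = falses(n)
    removed[vs] .= true                  # throws a BoundsError for labels outside 1:n

    # Phase 1: map old labels to new labels (0 marks a removed vertex).
    newlabel = zeros(Int, n)
    vmap = Int[]
    for v in 1:n
        if !removed[v]
            push!(vmap, v)
            newlabel[v] = length(vmap)
        end
    end

    # Phase 2: move the surviving lists to their new positions, then shrink.
    for (new, old) in enumerate(vmap)
        adj[new] = adj[old]
    end
    resize!(adj, length(vmap))

    # Phase 3: in each remaining list, drop removed neighbors and relabel the rest.
    for list in adj
        filter!(u -> !removed[u], list)
        map!(u -> newlabel[u], list, list)
    end
    return vmap
end

# Triangle 1-2-3 with a pendant vertex 4 attached to 3; remove vertex 2.
adj = [[2, 3], [1, 3], [1, 2, 4], [3]]
vmap = remove_vertices_sketch!(adj, [2])
```

Phase 3 only touches the lists that survive phase 2, which is consistent with the timings dropping once most vertices are removed.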

@sbromberger
Owner

Thanks for the explanation. That makes sense.

function rem_vertices!(g::SimpleDiGraph{T},
                       vs::AbstractVector{T};
                       keep_order::Bool=false
                      ) where {T}
Owner

If you're parameterizing SimpleDiGraph, then you should constrain T <: Integer. But more importantly, do you really want to insist that the vertex list is the same type as the graph eltype? Generally we just use Integer and cast internally as appropriate.

Contributor Author

Isn't SimpleDiGraph{T} already restricted to T <: Integer by the definition of that datatype? I will change the vector.

Owner

It is, but our convention (to date) is to constrain parameters for clarity. I'm not opposed to changing that convention if it makes sense; this was just a suggestion.
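
As an illustration of the suggestion, the sketch below uses a stand-in type `ToyDiGraph` and a helper `vertices_as_eltype`, both hypothetical, to show a signature that constrains T explicitly, accepts any integer vector, and converts internally. It is one possible reading of the convention, not the signature that was merged.

```julia
# Stand-in for a graph type parameterized by its vertex label type.
struct ToyDiGraph{T<:Integer}
    fadjlist::Vector{Vector{T}}
end

# Accept any integer vector and convert to the graph's eltype internally.
# The constraint T<:Integer is repeated here purely for readability.
function vertices_as_eltype(g::ToyDiGraph{T}, vs::AbstractVector{<:Integer}) where {T<:Integer}
    return collect(T, vs)    # always a fresh Vector{T}, never an alias of vs
end

g = ToyDiGraph{UInt8}([UInt8[2], UInt8[1]])
converted = vertices_as_eltype(g, [1, 2])   # Int input, UInt8 output
```

With this shape a caller can pass `[1, 2]` or `1:2` to a graph whose eltype is `UInt8` without converting first.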

# Sort and filter the vertices that we want to remove
remove = sort(vs)
unique!(remove)
lo, hi = extrema(remove)
Owner

Since remove is sorted (I think unique! preserves the ordering), lo, hi = (remove[1], remove[end]) is much more efficient than extrema:

julia> a = rand(Int, 100_000_000);

julia> sort!(a);

julia> @benchmark extrema($a)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     53.086 ms (0.00% GC)
  median time:      58.190 ms (0.00% GC)
  mean time:        58.347 ms (0.00% GC)
  maximum time:     65.753 ms (0.00% GC)
  --------------
  samples:          86
  evals/sample:     1

julia> @benchmark ($a[1], $a[end])
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.088 ns (0.00% GC)
  median time:      2.096 ns (0.00% GC)
  mean time:        2.200 ns (0.00% GC)
  maximum time:     34.643 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
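
Applied to the snippet under review, the change might look like the following sketch. The input vector is made up for illustration; `first` and `last` are used as equivalents of `remove[1]` and `remove[end]`.

```julia
vs = [7, 3, 9, 3, 1]           # example input with a duplicate

# Sort a copy, then drop duplicates in place; the result stays sorted.
remove = unique!(sort(vs))

# On a sorted, non-empty vector the extremes are simply the two ends.
lo, hi = first(remove), last(remove)
```

Note that this relies on remove being non-empty, just as extrema does.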

remove = sort(vs)
unique!(remove)
lo, hi = extrema(remove)
(one(T) <= lo && hi <= n) ||
Owner

@sbromberger Oct 11, 2018

1 <= lo <= hi <= n works also.
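
A small sketch of the chained form, using a hypothetical helper `in_bounds`. Julia evaluates chained comparisons pairwise, and integer comparisons work across types, so the literal 1 can stand in for one(T).

```julia
# Hypothetical helper wrapping the chained comparison suggested above.
in_bounds(lo, hi, n) = 1 <= lo <= hi <= n

in_bounds(1, 9, 10)                    # true: every label lies in 1:10
in_bounds(0, 9, 10)                    # false: 0 is not a valid vertex
in_bounds(UInt8(1), UInt8(9), 10)      # true: mixed integer types compare fine
```

The chained form additionally checks lo <= hi, which holds automatically when both come from a sorted vector.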

if keep_order
    # traverse the vertex list and shift if a vertex gets removed
    i = 1
    Δ = 0
Owner

It seems like Δ is always one behind i. Why define it at all?

Contributor Author

Right, I don't need that.

@sbromberger sbromberger merged commit dcbc9b2 into sbromberger:master Oct 12, 2018