This repository has been archived by the owner on Oct 8, 2021. It is now read-only.

WIP: Normalized graph cut #736

Merged
merged 33 commits into sbromberger:master on Jan 5, 2018

Conversation

tejus-gupta
Contributor

Implementation of the recursive two-way normalized graph cut described in the paper "Normalized Cuts and Image Segmentation" by Shi and Malik. I had originally submitted a pull request at ImageSegmentation.jl, but since this is more generally applicable, I was advised to submit a PR here. Since the adjacency matrices of grid-connectivity graphs are sparse, some parts of the functions currently only work for sparse matrices. I will update the function to make it work for any array-like input.

@sbromberger
Owner

sbromberger commented Aug 28, 2017

Hi @tejus-gupta - thanks for this. If you're using SimpleWeightedGraphs, then you'll want to make this a PR to SimpleWeightedGraphs.jl. If it can be generalized to AbstractGraphs, then it belongs here.

Edited to add: In general, functions that take weights can be abstracted to AbstractGraph by doing something like what we do in dijkstra_shortest_paths: dijkstra_shortest_paths(g, srcs, distmx=weights(g)). That is, allow the user to pass in an optional distmx matrix of weights that defaults to weights(g), which (for SimpleGraphs) returns a DefaultDistance of 1 and (for SimpleWeightedGraphs and MetaGraphs) dispatches to the weights method that does the right thing.
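For illustration, a minimal Julia sketch of that pattern (hypothetical name and signature, not this PR's final API):

```
# Sketch only: an optional weight matrix that defaults to weights(g),
# mirroring dijkstra_shortest_paths. For SimpleGraphs, weights(g) returns a
# DefaultDistance of 1; SimpleWeightedGraphs/MetaGraphs return stored weights.
using LightGraphs

function normalized_cut_sketch(g::AbstractGraph, thres::Real, distmx::AbstractMatrix=weights(g))
    # ... use distmx[u, v] wherever an edge weight is needed ...
    return distmx
end
```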

```
add_edge!(g, 3, 4, 0.2)

labels = normalized_cut(g.weights, 0.1)
```
Owner

Our docstring convention is as follows:

"""
    command(arg1, arg2[, optarg1=val1])

Given an `arg1` and a value `arg2`, return a `bar`. If `optarg1` is specified, modify the return value by `baz`.

### Implementation Notes
something here

### Performance
Time complexity is ``O(n)``.

### References
- ref1
- ref2

```
labels = normalized_cut(g.weights, 0.1)
```
"""
function ncut{T<:Real}(W::SparseMatrixCSC{T,Int}, thres::Real = 0.01 ,num_cuts::Int = 10)
Owner

If you're using abstract arguments (like Real), there's no specific need to typecast it; thres can be left untyped.

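A sketch of the suggested signature with the annotation dropped (0.6-era parametric-method syntax, as in this PR; body omitted):

```
# Sketch only: `thres` left untyped, per the review comment above.
function ncut{T<:Real}(W::SparseMatrixCSC{T,Int}, thres=0.01, num_cuts::Int=10)
    # ... unchanged body ...
end
```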

#returns normalized cut score
function ncut_cost(mask)
Owner

Embedded functions are generally not used in LightGraphs. Instead, pull this out to the top level and prefix its name with an underscore to indicate that it is not meant to be used directly.
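A sketch of that refactor with a hypothetical top-level helper (`_ncut_cost` and its arguments are illustrative, not the PR's exact code):

```
# Private, top-level helper instead of a function nested inside ncut.
# W: sparse weight matrix, d: vector of vertex degrees, mask: side of the cut per vertex.
function _ncut_cost(W::SparseMatrixCSC, d::AbstractVector, mask::AbstractVector{Bool})
    rows = rowvals(W)
    vals = nonzeros(W)
    cut_cost = zero(eltype(W))
    for i in 1:size(W, 2), j in nzrange(W, i)
        if mask[i] != mask[rows[j]]
            cut_cost += vals[j] / 2   # each undirected cut edge is visited twice
        end
    end
    # normalized cut = cut/assoc(A) + cut/assoc(B)
    return cut_cost / sum(d[mask]) + cut_cost / sum(d[.!mask])
end
```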

for j in nzrange(W, i)
row = rows[j]
if mask[i] != mask[row]
cut_cost += vals[j]/2
Owner

You'll want to use a linter (like LanguageServer.jl's) to ensure your spacing is consistent.


if m == 1
return [1]
end
Owner

We shortcut non-mutating code when it's a one-liner:

m == 1 && return [1]

labels = Vector{Int}(m)
n1 = maximum(labels1)

for i in 1:(m - sum(mask))
Owner

Is it more efficient to calculate sum(mask) once and reuse that variable?
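A tiny illustration of the suggestion (variable names are placeholders, not the PR's code):

```
# Compute sum(mask) once and reuse the variable wherever the function needs it.
mask = [true, false, true, true, false, false]
m = length(mask)
nmask = sum(mask)              # computed once
for i in 1:(m - nmask)
    # ... loop body over one side of the cut ...
end
for i in 1:nmask
    # ... loop body over the other side ...
end
```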

add_edge!(g, 1, 2, 1)
add_edge!(g, 3, 4, 1)

labels = ncut(g.weights, 0.1)
Owner

See the other test scripts: basically we loop through graphs of different eltypes:

gx = SimpleWeightedGraph(4)
add_edge!(gx, 1, 2, 1)
add_edge!(gx, 3, 4, 1)

for g in testgraphs(gx)
   labels = ncut(g.weights, 0.1)
   @test ....
end

end

#get eigenvector corresponding to second smallest eigenvalue
v = eigvecs(full(D-W), full(D))[:, 2]
Contributor

This should be

lambda, vecs, nconv, niter, nmult, resid = eigs(D-W, nev=2, which=:SR)
if nconv < 2
    error("eigensolver did not converge")
end
v = vecs[:, 2]

Contributor Author

For my second test case, this is giving an error: Base.LinAlg.ARPACKException("unspecified ARPACK error: -9999").
You can see the detailed error message by running Pkg.test("LightGraphs").

#perform n-cuts with different partitions of v and find best one
min_cost = Inf
best_thres = -1
for t in linspace(minimum(v), maximum(v), num_cuts)
Contributor

This is even spacing between the min and max; should you instead choose the spacing based on the points of v themselves?

Contributor

As in w = sort(v); (w[i+1] + w[i])/2? That way you are guaranteed that all n cuts are distinct.
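For concreteness, the midpoint idea might look like this (illustrative values only):

```
# Thresholds halfway between consecutive sorted entries of v, so every
# candidate two-way cut is distinct.
v = [0.3, -0.1, 0.7, 0.2]
w = sort(v)
thresholds = [(w[i] + w[i+1]) / 2 for i in 1:length(w)-1]
masks = [v .> t for t in thresholds]   # each threshold induces one partition
```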

Contributor Author

From the paper: "Currently, the search is done by checking l evenly spaced possible splitting points, and computing the best Ncut among them." It wasn't clear to me exactly how they searched for the best partition. In https://github.com/scikit-image/scikit-image/blob/master/skimage/future/graph/graph_cut.py#L76 (line 210), they search evenly between the min and max.


@tejus-gupta
Contributor Author

(Continuing a discussion from one of the comments above; that thread was collapsed since the code is outdated.)
eigs is giving an error in my second test case.

D_W = [1.0 -1.0 0.0 0.0; -1.0 1.0 0.0 0.0; 0.0 0.0 1.0 -1.0; 0.0 0.0 -1.0 1.0]
D = [1.0 0.0 0.0 0.0; 0.0 1.0 0.0 0.0; 0.0 0.0 1.0 0.0; 0.0 0.0 0.0 1.0]
eigs(D_W, D, nev=2, which=:SR)

When D-W and D have the above values, I am getting this error:
ERROR: Base.LinAlg.ARPACKException("unspecified ARPACK error: -9999").

@jpfairbanks
Contributor

I get the same error. I think it has to do with the fact that, when given two matrix arguments, ARPACK tries to solve the generalized eigenvalue problem $D_W x = \lambda D x$. This can be avoided by normalizing the matrix beforehand and solving a standard eigenvalue problem, $D^{-1} D_W x = \lambda x$.

@timholy

timholy commented Sep 13, 2017

If your inputs are symmetric, wouldn't it be better to do L = cholfact(D, :L) and solve

L^{-1} D_W L^{-T} y = λ y

for y = L^T x?

@jpfairbanks
Contributor

D is the diagonal matrix of weights so you don't need the cholfact, right?

@timholy

timholy commented Sep 20, 2017

I hadn't looked. Even easier, then: just sqrt.(D). But it's still best to keep things symmetric; it will use a more robust algorithm.

@sbromberger
Owner

Is this PR still active? What do we want/need to do with it?

@tejus-gupta
Contributor Author

I changed the code to use eigs with the standard eigenvalue problem instead of the generalized eigenvalue problem: eigs(inv(D)*(D-W), nev=2, which=:SR)

But I was still facing some issues:

D = [1.0 0.0; 0.0 1.0]
D_W = [1.0 -1.0; -2.0 2.0]
eigs(inv(D)*D_W)

Error: ArgumentError: input matrix A is too small. Use eigfact instead.

When eigenvalues are repeated, I sometimes don't get all the eigenvalues. For example:

D_W = [1.0 -1.0 0.0 0.0; -1.0 1.0 0.0 0.0; 0.0 0.0 1.0 -1.0; 0.0 0.0 -1.0 1.0]
D = [1.0 0.0 0.0 0.0; 0.0 1.0 0.0 0.0; 0.0 0.0 1.0 0.0; 0.0 0.0 0.0 1.0]
eigs(inv(D)*D_W)

output = ([2.0, 2.0, -4.39379e-17], [-0.562149 0.428939 0.694331; 0.562149 -0.428939 0.694331; 0.428939 0.562149 0.133806; -0.428939 -0.562149 0.133806], 3, 1, 4, [0.0, 0.0, 0.0, 0.0])

Sorry for the late update. I couldn't understand the above discussion since I don't know how Cholesky factorization works.

@timholy

timholy commented Nov 15, 2017

I submitted a PR to your fork, @tejus-gupta, to work around the ARPACK bug. One thing you'll need to check is the warning: it sometimes reduces the number of eigenvectors to 1, which I interpret as meaning the problem is degenerate. In that case you're not getting the 2nd-smallest eigenvalue, you're getting the only eigenvalue. I didn't go back and check the algorithm to determine the implications of this situation.

Sorry I didn't explain the above as carefully as I should have:

D_w x = λ D x
D = C' C   # This is the definition of the Cholesky factorization; it assumes D is positive-definite
Hence
D_w x = λ C' C x
Define
y = C x  (so x = inv(C) y)
Hence
D_w inv(C) y = λ C' y
Left-multiply both sides by inv(C)' (note that inv(C)' C' = I), hence
A y = λ y    # an ordinary eigenvalue problem
where
A = inv(C)' D_w inv(C)
Then solve for x with x = inv(C) y

In this case, though, since D is diagonal the Cholesky factorization is trivial (just C = sqrt.(D)).
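A minimal sketch of that workaround in the 0.6-era syntax this PR uses (the small weighted-path matrix W below is purely illustrative):

```
# Since D is diagonal, C = sqrt.(D) and the symmetric reformulation is
# inv(C) * (D - W) * inv(C) * y = λ y, with x = inv(C) * y.
i = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6]
j = [2, 1, 3, 2, 4, 3, 5, 4, 6, 5]
w = [1.0, 1.0, 1.0, 1.0, 0.1, 0.1, 1.0, 1.0, 1.0, 1.0]   # weak edge in the middle
W = sparse(i, j, w)

d = vec(full(sum(W, 2)))             # vertex degrees
Cinv = spdiagm(1 ./ sqrt.(d))        # inv(C), trivial because D is diagonal
A = Cinv * (spdiagm(d) - W) * Cinv   # symmetric ordinary eigenproblem A y = λ y
λ, Y = eigs(A, nev=2, which=:SR)
v = Cinv * Y[:, 2]                   # recover x = inv(C) y and threshold it to split
```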

Work around ARPACK bug in eigs for normalized_cut
@codecov

codecov bot commented Nov 15, 2017

Codecov Report

Merging #736 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #736    +/-   ##
======================================
  Coverage     100%   100%            
======================================
  Files          64     65     +1     
  Lines        3198   3305   +107     
======================================
+ Hits         3198   3305   +107

@sbromberger
Owner

Hi @tejus-gupta @jpfairbanks @timholy - since this is well outside my area of expertise, I am relying on each of you to tell me when this is ready to merge. Thanks :)

@tejus-gupta
Contributor Author

@timholy thanks for the explanation 😄.

The warning saying that the number of eigenvalues is 1 seems to be a problem with eigs.

eigs( [1.0 -1.0; -1.0 1.0], nev=2, which=:SR)[2]

This matrix has eigenvalues 2 and 0, with corresponding eigenvectors proportional to [-1, 1] and [1, 1]. (I verified with NumPy.)

@jpfairbanks
Contributor

@timholy's comment and the current code reflect my understanding of the symmetric normalized Laplacian eigenproblem D^{-1/2} (D - W) D^{-1/2} y = λ y, with x = D^{-1/2} y.

I quibble with the evenly spaced thresholds for finding the optimal cut given the eigenvector, because you can check all n possible cuts with some fancy counting. But I think we should get this merged and then offer that as an alternative later. For consistency with the literature, this method of evenly spaced splits is worth including.

@tejus-gupta
Contributor Author

It's working for me. normalized_cut(g, thres=0.02) separated the graph into the two obvious clusters. With the default thres=0.01, the normalized cut for this split wasn't below the threshold and so the graph wasn't partitioned.

Should thres be a required argument instead of a keyword argument? Making thres a keyword argument suggests that the function will work fine without it and that the argument merely customizes the call. thres=0.01 worked well enough as a default for images, but it may not work well for all graphs.

@jpfairbanks
Contributor

This gets at a broader issue about algorithms that require a choice of parameters. I would like to offer a default value only if we think it will work for many input graphs. If there is basically no way to determine a good general-purpose threshold, then we need to provide some guidance and say: you have to figure out a threshold on your own.

Do you have some test images that you would like to use? If we could find a default value for thres that works for the blkdiag graph, PathGraph, CompleteGraph, and some images, I would feel good about that threshold.

  1. PathGraph should partition to 1:floor(n/2), ceil(n/2+1):n
  2. CompleteGraph should not partition
  3. Barbell graph blkdiag(CompleteGraph(n), CompleteGraph(m)) + 1 add_edge! should be 1:n, n+1:n+m (see the sketch below)
  4. Images should give good patches.

These should be the test cases, and then we have the freedom to refine this algorithm as long as it gives the same results on them.
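A sketch of how a couple of those checks might look, assuming the 0.6-era LightGraphs/SimpleWeightedGraphs API and a normalized_cut(g, thres) that returns one integer label per vertex; the threshold value is illustrative only:

```
using LightGraphs, SimpleWeightedGraphs
using Base.Test

# Barbell: two 5-cliques joined by a single edge should split into the two cliques.
gx = blkdiag(CompleteGraph(5), CompleteGraph(5))
add_edge!(gx, 5, 6)
g = SimpleWeightedGraph(gx)

labels = normalized_cut(g, 0.05)          # threshold chosen for illustration only
@test length(unique(labels)) == 2
@test all(labels[1:5] .== labels[1])      # first clique stays together
@test all(labels[6:10] .== labels[6])     # second clique stays together

# CompleteGraph: a single clique should not be partitioned at the same threshold.
@test length(unique(normalized_cut(SimpleWeightedGraph(CompleteGraph(6)), 0.05))) == 1
```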

@sbromberger
Owner

What is the plan for this PR?

@tejus-gupta
Contributor Author

tejus-gupta commented Jan 1, 2018

@jpfairbanks I don't think we can have a threshold that would partition any PathGraph into 1:floor(n/2), ceil(n/2+1):n. For example, if I choose a threshold such that a PathGraph with 2 vertices is split into two single-vertex graphs, then the algorithm can't stop after partitioning a PathGraph with 4 vertices into two graphs with 2 vertices each; it would also split each of those graphs into single-vertex graphs.

The stopping criterion of the algorithm isn't the number of subgraphs; instead, it stops at a fixed subgraph size. Therefore I can set a threshold that would partition any PathGraph into subgraphs of a fixed size.

I don't think there is any 'good' default value of the threshold. We should keep it as a required argument.

@jpfairbanks
Contributor

I'm not too worried about the n=2 case.

I know that spectral partitioning for general graphs is hard, and that having a good default is important. I don't want to let the perfect be the enemy of the good, so we could make the threshold a required argument and have the documentation say that it is critical to choose it based on your application, suggesting a binary search over thresholds to find a good one.
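One possible way to implement that guidance, purely illustrative (hypothetical helper, not part of this PR):

```
# Bisect on the threshold until normalized_cut yields at least `target_parts`
# components; returns the smallest such threshold found within `iters` steps.
function search_threshold(g, target_parts; lo=0.0, hi=1.0, iters=20)
    for _ in 1:iters
        mid = (lo + hi) / 2
        nparts = length(unique(normalized_cut(g, mid)))
        if nparts < target_parts
            lo = mid          # need a larger threshold to allow more cuts
        else
            hi = mid          # this threshold is sufficient; try smaller
        end
    end
    return hi
end
```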

@sbromberger what do you think about that?

@sbromberger
Owner

@jpfairbanks that sounds like a good plan. My preference would be to document the heck out of this, and then merge it. :)

S.

@jpfairbanks
Contributor

Cool. This means that the W argument has to go after the threshold argument. I am updating the tests now.

@jpfairbanks
Contributor

I just pushed changes to a branch ncut, which can be merged into this PR in order to preserve it, or we can merge that branch and close this PR.

@sbromberger
Owner

Let's preserve this one if possible.

Also, @jpfairbanks - could you add your review? I'll take a look from a style perspective but it needs some technical eyes on it.

Owner

@sbromberger left a comment

Style looks good. Thanks. @jpfairbanks will provide technical review of this one.

@tejus-gupta
Contributor Author

@jpfairbanks You probably wanted to use normalized_cut(g, t) instead of normalized_cut(g, 0.1) on line 87 of the test file. We would also have to modify the test on line 89, since the number of unique labels wouldn't be the same for different thresholds.

@jpfairbanks
Contributor

@tejus-gupta that is exactly right, I wondered why I kept getting the same answer but didn't notice that error.

Once that is fixed we should merge it.


@tejus-gupta
Contributor Author

I have fixed the test.

Contributor

@jpfairbanks left a comment

This was a long saga, but great work @tejus-gupta, @timholy!

@sbromberger
Owner

Many thanks to all of you for your persistence.

@sbromberger
Owner

one last set of tests :)

@sbromberger sbromberger merged commit f307fc5 into sbromberger:master Jan 5, 2018
@timholy

timholy commented Jan 5, 2018

Hooray! Thanks for persisting, @tejus-gupta!
