-
Notifications
You must be signed in to change notification settings - Fork 183
Conversation
Hi @tejus-gupta - thanks for this. If you're using Edited to add: In general, functions that can take weights can be abstracted to |
src/graphcut/normalized_cut.jl
Outdated
add_edge!(g, 3, 4, 0.2) | ||
|
||
labels = normalized_cut(g.weights, 0.1) | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our docstring convention is as follows:
"""
command(arg1, arg2[, optarg1=val1]
Given an `arg1` and a value `arg2`, return a `bar`. If `optarg1` is specified, modify the return value by `baz`.
### Implementation Notes
something here
### Performance
Time complexity is ``O(n)```
### References
- ref1
- ref2
src/graphcut/normalized_cut.jl
Outdated
labels = normalized_cut(g.weights, 0.1) | ||
``` | ||
""" | ||
function ncut{T<:Real}(W::SparseMatrixCSC{T,Int}, thres::Real = 0.01 ,num_cuts::Int = 10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're using abstract arguments (like Real), there's no specific need to typecast it. thresh
can be untyped.
src/graphcut/normalized_cut.jl
Outdated
function ncut{T<:Real}(W::SparseMatrixCSC{T,Int}, thres::Real = 0.01 ,num_cuts::Int = 10) | ||
|
||
#returns normalized cut score | ||
function ncut_cost(mask) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Embedded functions are generally not used in LightGraphs. Instead, pull this out of the top-level function and prefix it with a _
to indicate it should not be used.
src/graphcut/normalized_cut.jl
Outdated
for j in nzrange(W, i) | ||
row = rows[j] | ||
if mask[i] != mask[row] | ||
cut_cost += vals[j]/2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll want to use a linter (like LanguageServer.jl's) to ensure your spacing is consistent.
src/graphcut/normalized_cut.jl
Outdated
|
||
if m == 1 | ||
return [1] | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shortcut non-mutating code when it's a one-liner:
m == 1 && return [1]
src/graphcut/normalized_cut.jl
Outdated
labels = Vector{Int}(m) | ||
n1 = maximum(labels1) | ||
|
||
for i in 1:(m - sum(mask)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it more efficient to calculate sum(mask)
once and reuse that variable?
test/graphcut/normalized_cut.jl
Outdated
add_edge!(g, 1, 2, 1) | ||
add_edge!(g, 3, 4, 1) | ||
|
||
labels = ncut(g.weights, 0.1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the other test scripts: basically we loop through graphs of different eltypes:
gx = SimpleWeightedGraph(4)
add_edge!(gx, 1, 2, 1)
add_edge!(gx, 3, 4, 1)
for g in testgraphs(gx)
labels = ncut(g.weights, 0.1)
@test ....
end
src/graphcut/normalized_cut.jl
Outdated
end | ||
|
||
#get eigenvector corresponding to second smallest eigenvalue | ||
v = eigvecs(full(D-W), full(D))[:, 2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be
lambda, vecs, nconv, niter, nconv, nmult, resid = eigs(D-W, nev=2, which=:SR)
if nconv < 2
error("eigensolver did not converge")
end
v = vec[:,2]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my second test case, this is giving an error - Base.LinAlg.ARPACKException("unspecified ARPACK error: -9999")
.
You can see the detailed error message by Pkg.test("LightGraphs")
.
#perform n-cuts with different partitions of v and find best one | ||
min_cost = Inf | ||
best_thres = -1 | ||
for t in linspace(minimum(v), maximum(v), num_cuts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is even spacing between the min and max, should you choose the spacing as between the points of v?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in w = sort(v); (w[i+1] + w[i])/2
? That way you are guaranteed that all n cuts are distinct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the paper - Currently, the search is done by checking l evenly spaced possible splitting points, and computing the best Ncut among them.
. It wasn't clear to me exactly how they searched for the best partition. https://github.com/scikit-image/scikit-image/blob/master/skimage/future/graph/graph_cut.py#L76 (line 210) Here they search evenly between min and max.
(continuing a discussion in one of the comments above, that thread was collapsed since the code is outdated)
When |
I get the same error, I think it has to do with the fact that when given two matrix arguments ARPACK tries to solve the generalized eigenvalue problem, |
If your inputs are symmetric, wouldn't it be better to do
for |
D is the diagonal matrix of weights so you don't need the cholfact, right? |
I hadn't looked. Even easier then, just |
Is this PR still active? What do we want/need to do with it? |
I changed to code use But I was still facing some issues -
Error: ArgumentError: input matrix A is too small. Use eigfact instead. When eigenvalues are repeated, I sometimes don't get all the eigenvalues.
output = ([2.0, 2.0, -4.39379e-17], [-0.562149 0.428939 0.694331; 0.562149 -0.428939 0.694331; 0.428939 0.562149 0.133806; -0.428939 -0.562149 0.133806], 3, 1, 4, [0.0, 0.0, 0.0, 0.0]) Sorry for the late update. I couldn't understand the above discussion since I don't know how Cholesky factorization works. |
I submitted a PR to your fork, @tejus-gupta, to work around the ARPACK bug. One thing you'll need to check is the warning: it sometimes reduces the number of eigenvectors to 1, which I interpret as meaning the problem is degenerate. In that case you're not getting the 2nd-smallest eigenvalue, you're getting the only eigenvalue. I didn't go back and check the algorithm to determine the implications of this situation. Sorry I didn't explain the above as carefully as I should have:
In this case, though, since |
Work around ARPACK bug in eigs for normalized_cut
Codecov Report
@@ Coverage Diff @@
## master #736 +/- ##
======================================
Coverage 100% 100%
======================================
Files 64 65 +1
Lines 3198 3305 +107
======================================
+ Hits 3198 3305 +107 |
Hi @tejus-gupta @jpfairbanks @timholy - since this is well outside my area of expertise, I am relying on each of you to tell me when this is ready to merge. Thanks :) |
@timholy thanks for explanation 😄. The warning saying that the number of eigenvalues is 1 seems to be a problem with
This problem has eigenvalues 2 and 0. The corresponding eigenvectors are λ * [-1, 1] and λ * [1, 1]. (I verified with numpy) |
@timholy's comment and the current code reflect my understanding of the symmetric normalized laplacian eigenproblem I quibble with the evenly spaced thresholds for finding the optimal cut given the eigenvector. Because you can check all |
Its working for me. Should |
This gets at a broader issue about algorithms that require choice of parameters. I would like to only offer a default value if we think that it will work for many input graphs. If there is basically no way to determine a good general purpose threshold, then we need to provide some guidance and say, you have to figure out a threshold on your own. Do you have some test images that you would like to use? If we could find a default value for
These should be the test cases and then we have freedom to refine this algorithm as long as it gives the same results on these test cases. |
What is the plan for this PR? |
@jpfairbanks I don't think that we can have a threshold that would partition any PathGraph to 1:floor(n/2), ceil(n/2+1):n. For example, if I choose a threshold such that PathGraph with 2 vertices is split into two graphs with single vertex, then the algorithm can't stop after partitioning a PathGraph with 4 vertices into two graphs with 2 vertices each. It would also split the each of the graphs into graphs with single vertex. The stopping criterion of the algorithm isn't the number of subgraphs, instead it would stop at a fixed subgraph size. Therefore I can set a threshold that would partition any PathGraph into subgraphs of fixed size. I don't think there is any 'good' default value of the threshold. We should keep it as a required argument. |
I'm not too worried about the n=2 case. I know that spectral partitioning for general graphs is hard, and having a good default is important. I don't want to let the perfect be the enemy of the good so we could let the threshold be a required argument and have the documentation say that it is critical to choose this based on your application and suggest a binary search over thresholds to choose a good one based on your application. @sbromberger what do you think about that? |
@jpfairbanks that sounds like a good plan. My preference would be to document the heck out of this, and then merge it. :) S. |
Cool. This means that the W argument has to go after the threshold argument. I am updating the tests now. |
I just pushed changes to a branch |
Let's preserve this one if possible. Also, @jpfairbanks - could you add your review? I'll take a look from a style perspective but it needs some technical eyes on it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style looks good. Thanks. @jpfairbanks will provide technical review of this one.
@jpfairbanks You probably wanted to use |
@tejus-gupta that is exactly right, I wondered why I kept getting the same answer but didn't notice that error. Once that is fixed we should merge it. |
1 similar comment
@tejus-gupta that is exactly right, I wondered why I kept getting the same answer but didn't notice that error. Once that is fixed we should merge it. |
I have fixed the test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a long saga, but great work @tejus-gupta, @timholy!
Many thanks to all of you for your persistence. |
one last set of tests :) |
Hooray! Thanks for persisting, @tejus-gupta! |
Implementation of recursive two-way normalized graph cut as described in this paper - "Normalized Cuts and Image Segmentation" by Shi and Malik. I had originally submitted a pull request at ImageSegmentation.jl but since this is more generally applicable, I was advised to submit a PR here. Since adjacency matrix for grid connectivity based graphs are sparse, some parts of the functions only work for sparse matrices. I will update the function to make it work for any array-like input.