Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: sparse truncated SVD #106

Closed
jiahao opened this issue Apr 23, 2014 · 20 comments
Closed

Feature request: sparse truncated SVD #106

jiahao opened this issue Apr 23, 2014 · 20 comments
Assignees
Labels
sparse Sparse arrays

Comments

@jiahao
Copy link
Member

jiahao commented Apr 23, 2014

There is a request for the rank-truncated singular value decomposition of large sparse matrices returning k largest singular values and the corresponding singular vectors.

@ViralBShah ViralBShah self-assigned this Apr 23, 2014
@ViralBShah
Copy link
Member

This should be easy with a wrapper around eigs. Will take a crack soon.

@jiahao
Copy link
Member Author

jiahao commented Apr 23, 2014

I should clarify that the version being requested is the rank-truncated singular value decomposition returning k largest singular values and the corresponding singular vectors. (issue description updated)

I recall some time ago that we had the eigs form some time ago computing the singular values as the eigenvalues of [0 A; A' 0], but it's numerically unstable.

@jiahao jiahao changed the title Feature request: sparse SVD Feature request: sparse truncated SVD Apr 23, 2014
@jiahao
Copy link
Member Author

jiahao commented Apr 23, 2014

I'm trying to see if there is a more modern algorithm, but doi:10.1137/0911029 is probably the reference to begin with. The first step is to compute a rank-revealing QR factorization which may or may not be provided by SuiteSparseQR. The second step is to compute the TSVD using Algorithm TSOL.

@ViralBShah
Copy link
Member

I'll take a look at what octave and others do for svds - but I suspect they compute eigenvalues of the larger matrix.

@swadey
Copy link

swadey commented May 24, 2014

+1 BTW, this would be great for our xdata project. I'm sorely missing some practical implementation of svds. In our example code, we did octave style postprocessing of eigs but this doesn't scale to the seriously large sparse matrices well.

@ViralBShah
Copy link
Member

What octave does is exactly what I am proposing we do for our svds. The memory utilization should only be 2x more. The other option is to not form the augmented system, but do matvecs with A and A'. I am not sure if that is any better numerically.

How large are your sparse matrices, and on what kind of machines do you run? One thing we can easily do is parallelize the matvecs, just for speed.

Cc: @alanedelman

@swadey
Copy link

swadey commented May 25, 2014

@jiahao knows about the problem details. Here's a short summary:

We're working on a large web-graph decomposition problem that us on the order of 42M dimensions with 623M non-zero entries in the adjacency matrix. Asking for a rank 3 reduction via eigs + post-processing doesn't finish within 24 hours. Memory doesn't seem to be the big issue: the machine has 40 cores + 512GB of memory. The code spends all of it's time in eigs.

@jiahao
Copy link
Member Author

jiahao commented May 25, 2014

@ViralBShah I did a little digging into the literature and it seems like Golub-Kahan-Lanczos is still pretty much state of the art for large and very sparse truncated SVDs. This survey (pdf) reviews its adaptation into several iterative algorithms in Section 4.3.

Not directly related but still interesting is the literature on incrementally updated truncated SVDs: if the rank of the desired factorization is known beforehand, you can define updates to an existing SVD as more columns or rows of the matrix are added. This paper seems to be one of the more comprehensive ones in terms of numerical methods, and has something nontrivial to say about missing data. I'm not sure if these methods are faster than the iterative ones.

@ViralBShah
Copy link
Member

Don't you have an implementation in IterativeSolvers.jl in that case? Should we bring it into Base?

@andreasnoack
Copy link
Member

Have you tried the proposed method in ARPACK, i.e. calculating the values and V from OP=A'(A*x)? It seems to work better than the MATLAB solution.

@jiahao
Copy link
Member Author

jiahao commented May 25, 2014

@ViralBShah I have only the naive method which computes all the singular values and no singular vectors. I'll see what I can do to get some of these other ones implemented.

@andreasnoackjensen That is essentially what all these methods do, but I have no idea what ARPACK does under the hood. I dread having to look.

@ViralBShah
Copy link
Member

We used to use the OP as you suggest earlier but the implementation was buggy. I will try it again now that the whole arpack calling is much more stable.

@andreasnoack
Copy link
Member

I thought that the earlier implementation used the MATLAB/Octave operator OP=[0 A;A' 0].

@ViralBShah
Copy link
Member

I don't recollect that. That was the approach I was considering.

@andreasnoack
Copy link
Member

I've found the old one, so just for the record it is here and you are right that it used A'*(A*x). I couldn't find any issues on the problems with svds, so do you remember?

@ViralBShah
Copy link
Member

I do not recollect what broke, but we can start with reinstating it and adding tests.

@jaak-s
Copy link

jaak-s commented Dec 20, 2014

Has there been any progress on svds?

If not then what about adding something like as follows as a place holder:

type SymX <: AbstractArray{Float64, 2}
    X
end

import Base.size
import Base.*
import Base.issym

*(s::SymX, v::Vector{Float64}) = s.X' * (s.X * v)
size(s::SymX)  = size(s.X, 2), size(s.X, 2)
issym(s::SymX) = true

function svds(X; args...)
  ex = eigs(SymX(X), I; args...)
  ## calculating left-side singular vectors
  left_sv = X * ex[2]
  return left_sv ./ sqrt(sum(left_sv.^2, 1)), sqrt(ex[1]), ex[2], ex[3], ex[4], ex[5], ex[6]
end

Example usage:

F1 = diagm( [3, 2, 1] )
a1 = svds(F1, nev = 2)

@ViralBShah
Copy link
Member

svds has been on my mind. I will try to make some progress over the holidays. I think the placeholder is a great start, and the final solution will not be too different from the placeholder - could you submit it as a pull request?

@jaak-s
Copy link

jaak-s commented Dec 20, 2014

Here you go. Pull request JuliaLang/julia#9425.

@andreasnoack
Copy link
Member

Fixed by JuliaLang/julia#9425

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sparse Sparse arrays
Projects
None yet
Development

No branches or pull requests

5 participants