Feature request: sparse truncated SVD #106

jiahao · 2014-04-23T16:00:30Z

There is a request for the rank-truncated singular value decomposition of large sparse matrices returning k largest singular values and the corresponding singular vectors.

ViralBShah · 2014-04-23T17:19:38Z

This should be easy with a wrapper around eigs. Will take a crack soon.

jiahao · 2014-04-23T17:23:39Z

I should clarify that the version being requested is the rank-truncated singular value decomposition returning k largest singular values and the corresponding singular vectors. (issue description updated)

I recall some time ago that we had the eigs form some time ago computing the singular values as the eigenvalues of [0 A; A' 0], but it's numerically unstable.

jiahao · 2014-04-23T17:37:55Z

I'm trying to see if there is a more modern algorithm, but doi:10.1137/0911029 is probably the reference to begin with. The first step is to compute a rank-revealing QR factorization which may or may not be provided by SuiteSparseQR. The second step is to compute the TSVD using Algorithm TSOL.

ViralBShah · 2014-04-23T19:11:51Z

I'll take a look at what octave and others do for svds - but I suspect they compute eigenvalues of the larger matrix.

swadey · 2014-05-24T14:15:53Z

+1 BTW, this would be great for our xdata project. I'm sorely missing some practical implementation of svds. In our example code, we did octave style postprocessing of eigs but this doesn't scale to the seriously large sparse matrices well.

ViralBShah · 2014-05-25T07:30:11Z

What octave does is exactly what I am proposing we do for our svds. The memory utilization should only be 2x more. The other option is to not form the augmented system, but do matvecs with A and A'. I am not sure if that is any better numerically.

How large are your sparse matrices, and on what kind of machines do you run? One thing we can easily do is parallelize the matvecs, just for speed.

Cc: @alanedelman

swadey · 2014-05-25T15:44:45Z

@jiahao knows about the problem details. Here's a short summary:

We're working on a large web-graph decomposition problem that us on the order of 42M dimensions with 623M non-zero entries in the adjacency matrix. Asking for a rank 3 reduction via eigs + post-processing doesn't finish within 24 hours. Memory doesn't seem to be the big issue: the machine has 40 cores + 512GB of memory. The code spends all of it's time in eigs.

jiahao · 2014-05-25T20:59:03Z

@ViralBShah I did a little digging into the literature and it seems like Golub-Kahan-Lanczos is still pretty much state of the art for large and very sparse truncated SVDs. This survey (pdf) reviews its adaptation into several iterative algorithms in Section 4.3.

Not directly related but still interesting is the literature on incrementally updated truncated SVDs: if the rank of the desired factorization is known beforehand, you can define updates to an existing SVD as more columns or rows of the matrix are added. This paper seems to be one of the more comprehensive ones in terms of numerical methods, and has something nontrivial to say about missing data. I'm not sure if these methods are faster than the iterative ones.

ViralBShah · 2014-05-25T21:25:21Z

Don't you have an implementation in IterativeSolvers.jl in that case? Should we bring it into Base?

andreasnoack · 2014-05-25T21:31:40Z

Have you tried the proposed method in ARPACK, i.e. calculating the values and V from OP=A'(A*x)? It seems to work better than the MATLAB solution.

jiahao · 2014-05-25T21:46:16Z

@ViralBShah I have only the naive method which computes all the singular values and no singular vectors. I'll see what I can do to get some of these other ones implemented.

@andreasnoackjensen That is essentially what all these methods do, but I have no idea what ARPACK does under the hood. I dread having to look.

ViralBShah · 2014-05-26T03:25:50Z

We used to use the OP as you suggest earlier but the implementation was buggy. I will try it again now that the whole arpack calling is much more stable.

andreasnoack · 2014-05-26T06:50:56Z

I thought that the earlier implementation used the MATLAB/Octave operator OP=[0 A;A' 0].

ViralBShah · 2014-05-26T09:01:18Z

I don't recollect that. That was the approach I was considering.

andreasnoack · 2014-05-26T11:38:42Z

I've found the old one, so just for the record it is here and you are right that it used A'*(A*x). I couldn't find any issues on the problems with svds, so do you remember?

ViralBShah · 2014-05-26T13:11:25Z

I do not recollect what broke, but we can start with reinstating it and adding tests.

jaak-s · 2014-12-20T13:02:40Z

Has there been any progress on svds?

If not then what about adding something like as follows as a place holder:

type SymX <: AbstractArray{Float64, 2}
    X
end

import Base.size
import Base.*
import Base.issym

*(s::SymX, v::Vector{Float64}) = s.X' * (s.X * v)
size(s::SymX)  = size(s.X, 2), size(s.X, 2)
issym(s::SymX) = true

function svds(X; args...)
  ex = eigs(SymX(X), I; args...)
  ## calculating left-side singular vectors
  left_sv = X * ex[2]
  return left_sv ./ sqrt(sum(left_sv.^2, 1)), sqrt(ex[1]), ex[2], ex[3], ex[4], ex[5], ex[6]
end

Example usage:

F1 = diagm( [3, 2, 1] )
a1 = svds(F1, nev = 2)

ViralBShah · 2014-12-20T13:19:54Z

svds has been on my mind. I will try to make some progress over the holidays. I think the placeholder is a great start, and the final solution will not be too different from the placeholder - could you submit it as a pull request?

jaak-s · 2014-12-20T17:22:58Z

Here you go. Pull request JuliaLang/julia#9425.

andreasnoack · 2015-01-16T15:27:15Z

Fixed by JuliaLang/julia#9425

ViralBShah self-assigned this Apr 23, 2014

jiahao changed the title ~~Feature request: sparse SVD~~ Feature request: sparse truncated SVD Apr 23, 2014

This was referenced Apr 24, 2014

Complete eigs implementation JuliaLang/julia#2956

Closed

What happened to svds? #95

Closed

ViralBShah mentioned this issue May 18, 2014

svds implementation #28

Closed

goggle mentioned this issue Jun 7, 2022

Operator norm (p = 2) for sparse matrices is not implemented JuliaSparse/SparseArrays.jl#119

Open

KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature request: sparse truncated SVD #106

Feature request: sparse truncated SVD #106

jiahao commented Apr 23, 2014

ViralBShah commented Apr 23, 2014

jiahao commented Apr 23, 2014

jiahao commented Apr 23, 2014

ViralBShah commented Apr 23, 2014

swadey commented May 24, 2014

ViralBShah commented May 25, 2014

swadey commented May 25, 2014

jiahao commented May 25, 2014

ViralBShah commented May 25, 2014

andreasnoack commented May 25, 2014

jiahao commented May 25, 2014

ViralBShah commented May 26, 2014

andreasnoack commented May 26, 2014

ViralBShah commented May 26, 2014

andreasnoack commented May 26, 2014

ViralBShah commented May 26, 2014

jaak-s commented Dec 20, 2014

ViralBShah commented Dec 20, 2014

jaak-s commented Dec 20, 2014

andreasnoack commented Jan 16, 2015

Feature request: sparse truncated SVD #106

Feature request: sparse truncated SVD #106

Comments

jiahao commented Apr 23, 2014

ViralBShah commented Apr 23, 2014

jiahao commented Apr 23, 2014

jiahao commented Apr 23, 2014

ViralBShah commented Apr 23, 2014

swadey commented May 24, 2014

ViralBShah commented May 25, 2014

swadey commented May 25, 2014

jiahao commented May 25, 2014

ViralBShah commented May 25, 2014

andreasnoack commented May 25, 2014

jiahao commented May 25, 2014

ViralBShah commented May 26, 2014

andreasnoack commented May 26, 2014

ViralBShah commented May 26, 2014

andreasnoack commented May 26, 2014

ViralBShah commented May 26, 2014

jaak-s commented Dec 20, 2014

ViralBShah commented Dec 20, 2014

jaak-s commented Dec 20, 2014

andreasnoack commented Jan 16, 2015