[Consistency] Default behaviour of SVD vs QR #214


Closed
lezcano opened this issue Jul 1, 2021 · 13 comments
Labels
API change Changes to existing functions or objects in the API. topic: Linear Algebra Linear algebra.

Comments

@lezcano
Contributor

lezcano commented Jul 1, 2021

Problem
The current default behaviour of linalg.svd is to compute the full decomposition (full_matrices=True). On the other hand, linalg.qr defaults to mode="reduced".

Proposal
Change the default to full_matrices=False in linalg.svd. In most cases the reduced SVD is good enough. Moreover, it's better to do less work by default and make the more expensive operation opt-in.
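To make the difference concrete, here is a small NumPy sketch (my own illustration, not from the issue) of the shapes the two settings produce for a tall matrix:

```python
import numpy as np

# Shapes returned for a tall 6x3 matrix.
rng = np.random.default_rng(0)
a = rng.standard_normal((6, 3))

# Full SVD: U is 6x6, even though only 3 of its columns are needed
# to reconstruct a.
u_full, s, vh = np.linalg.svd(a, full_matrices=True)
print(u_full.shape, s.shape, vh.shape)  # (6, 6) (3,) (3, 3)

# Reduced SVD: U is 6x3, which is enough to reconstruct a exactly.
u_thin, s, vh = np.linalg.svd(a, full_matrices=False)
print(u_thin.shape, s.shape, vh.shape)  # (6, 3) (3,) (3, 3)

# The reduced factors still reconstruct a.
assert np.allclose(u_thin * s @ vh, a)
```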

cc @rgommers @mruberry

@rgommers rgommers added the topic: Linear Algebra Linear algebra. label Jul 12, 2021
@rgommers
Member

Thanks for the proposal @lezcano. It sounds sensible, but it would be good to get some confirmation that this is the right thing to do. One way to go about that would be to check some of the svd usages in SciPy. Over 50% of those do not use full_matrices at all, which is likely due to the authors just going with the default. If you pick, say, 3 or 5 examples from SciPy and show that they get more efficient by adding full_matrices=False without introducing algorithmic/accuracy issues, then that'd be clear evidence that changing to the more efficient default makes sense.

@lezcano
Contributor Author

lezcano commented Jul 12, 2021

That makes sense. I'll show a few examples here, in the order in which they appear when you search for linalg.svd in the SciPy repo, together with one from the svd tests. I will omit those where compute_uv=False is used, as for those the full_matrices flag does nothing.

  1. test_svds.py L19. Called from a number of places in the file.
    It performs an SVD and then orders the singular values (and the singular vectors) in increasing or decreasing order. In every call within the file, the code makes sure that the k passed is less than min(m, n) for a matrix of size m x n. In other words, full_matrices=False should have been used here.

  2. Trust region methods. The SVD is used with full_matrices=False.

  3. Test multivariate normal. The SVD is used without full_matrices to generate a random orthogonal matrix. This is not a good use of svd; it should be replaced by a qr. The reason is that it is possible to prove that if a matrix has entries distributed as a standard normal, the Q factor of its qr decomposition is distributed according to the Haar measure on the orthogonal group. Also, qr is much faster than svd. All of this is implemented carefully and efficiently in stats.ortho_group.

  4. K-means clustering. The SVD is used with full_matrices=False.

  5. test_basic.py. It is used without parameters but on a square matrix. As such, the behaviour with full_matrices=False would be the same.

  6. rotation.py It is used without parameters but on a square matrix. As such, the behaviour with full_matrices=False would be the same.

Summary: These examples show that in every case, full_matrices=False is either beneficial, a no-op, or reveals a problem in the code. The most problematic point is the third one: one might find examples in which the user did intend to use full_matrices=True (for correct reasons or not), and changing the default would break such code. That being said, it is clear that full_matrices=False is the saner default from an API perspective.
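The QR-based construction mentioned in point 3 can be sketched as follows. This is a minimal illustration, not the SciPy code; the careful, efficient version lives in scipy.stats.ortho_group, and the sign correction on diag(R) is what makes the distribution exactly Haar, since LAPACK's QR does not fix the signs of the columns of Q:

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix from the Haar measure."""
    z = rng.standard_normal((n, n))   # i.i.d. standard normal entries
    q, r = np.linalg.qr(z)            # reduced QR (q is n x n here)
    return q * np.sign(np.diag(r))    # sign-correct the columns

rng = np.random.default_rng(0)
q = haar_orthogonal(4, rng)
assert np.allclose(q.T @ q, np.eye(4))  # columns are orthonormal
```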

@rgommers
Member

Thanks @lezcano! That's convincing to me, +1 for changing the default. And I guess it's time to open a SciPy PR:)

@ilayn

ilayn commented Jul 14, 2021

This is not accurate; the mistake is that the 'reduced' default only applies to numpy.linalg.qr.

Both scipy.linalg.qr and scipy.linalg.svd return the full output by default, so there is no inconsistency there. This is in line with dense usage, since most use cases require the full output rather than the economy output.

@shoyer
Contributor

shoyer commented Jul 14, 2021

I agree, when I've used svd I cannot think of a case where I did not set full_matrices=False.

@kgryte
Contributor

kgryte commented Jul 15, 2021

@ilayn Can you elaborate a bit more on "most of the use cases require full output"? Which use cases do you have in mind?

@ilayn

ilayn commented Jul 15, 2021

Like I mentioned, almost all of them. From signal processing to control theory, identification, and more esoteric matrix decompositions, we always use full matrices. Other than low-rank approximations and least-squares problems, I don't encounter the economy mode of SVD, since the thin SVD is often comparable to QR for changing the basis. Compact SVD is also a low-rank approximation, but you can use iterative methods with better performance for the matrix sizes where it would make any difference.

For reduced QR it's a mixed bag, since depending on the matrix shape either form can be useful; however, that doesn't grant a ticket for changing the default to thin QR. Julia dropped the economy-mode default around 0.7, I think, and MATLAB also doesn't return the thin or reduced form by default for SVD or QR.

The same is the case for Mathematica's SVD, although it returns the thin form for QR; I haven't checked in years, though. Thus the only two I know of that don't fit the rest are numpy's qr and Mathematica's qr, which is strange. I wasn't around then, so I don't know why that default was selected.

@lezcano
Contributor Author

lezcano commented Jul 16, 2021

I agree with @shoyer. I have never had the need to use full_matrices=True.

In the codebase of SciPy, I just found one use of full_matrices=True that was needed, while there were plenty of uses of full_matrices=False.

In the field that I know best, that of matrix analysis and geometry, the thin SVD and reduced QR are often used to reduce certain computations to smaller matrices. For example, when you compute geodesics on the manifold of matrices with orthonormal columns (the Stiefel manifold) St(n, p) with n > p, you can first do a thin QR decomposition, reducing the computation of a matrix exponential from a matrix of size 2n x 2n to one of size 2p x 2p (see eq. 2.45-2.47 in the paper). The same holds for the Grassmannian via a compact SVD (see Theorem 2.3). The same article shows how to implement a form of conjugate gradient descent, also via the thin SVD.

In a similar way, a reduced SVD may be used to compute the polar decomposition of a matrix. More generally, it allows computing the projection of a matrix onto the Stiefel manifold, or low-rank approximations (as @ilayn mentioned) via the Eckart–Young–Mirsky theorem. While iterative methods based on Lanczos can be more efficient at times, this is not always the case. For example, on GPUs iterative methods are incredibly slow, but rather fast GPU implementations of the SVD exist. Given that a number of the frameworks that will hopefully follow the array API support GPUs, this is certainly something to keep in mind.
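The polar-decomposition use case can be sketched in a few lines (my own illustration; the `polar` helper here is hypothetical, written for this example, and SciPy ships a production version as scipy.linalg.polar):

```python
import numpy as np

def polar(a):
    """Polar decomposition a = u_p @ p via the thin SVD.

    u_p has orthonormal columns (the projection of a onto the
    Stiefel manifold); p is symmetric positive semi-definite.
    """
    u, s, vh = np.linalg.svd(a, full_matrices=False)
    u_p = u @ vh                 # orthonormal factor
    p = vh.T.conj() * s @ vh     # V @ diag(s) @ Vh
    return u_p, p

rng = np.random.default_rng(0)
a = rng.standard_normal((5, 3))
u_p, p = polar(a)
assert np.allclose(u_p @ p, a)              # factors reconstruct a
assert np.allclose(u_p.T @ u_p, np.eye(3))  # orthonormal columns
```

Note that with full_matrices=True the product u @ vh would not even be well-defined for a non-square input, which is another small argument for the reduced form as the building block.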

In short, I have encountered the thin SVD in many different applications as a building block for other algorithms, and in all the applications that I have come across, the thin SVD was used. That being said, my experience is certainly biased towards the fields I have studied. As such, it would be helpful if other people could share examples in which the full SVD is useful, as I do not know of any in my field, and there was just one in the SciPy code base.

@leofang
Contributor

leofang commented Jul 26, 2021

I agree, when I've used svd I cannot think of a case where I did not set full_matrices=False.

I agree with @shoyer. I have never had the need to use full_matrices=False.

Pardon my poor English, but aren't these two statements contradictory? 😂

@lezcano
Contributor Author

lezcano commented Jul 26, 2021

Edited :)

@kgryte
Contributor

kgryte commented Aug 23, 2021

Based on the consortium meeting minutes from 2021-07-29, it was decided to keep the current default of full_matrices=True for SVD, given the lack of consensus regarding whether the full output is more generally preferred. As this retains the status quo, users will have to opt in to obtain a reduced SVD. This is similar to other environments (e.g., MATLAB).

@nikitaved

nikitaved commented Jan 26, 2022

My 2 cents on this. If your input is super wide/tall, then full_matrices=True is inefficient from the memory point of view. If, further, backprop is required, only subspaces of size min(m, n) are used for the grad computation. But since these subspaces are views into the potentially huge full bases, those full bases have to be stored for both the forward and the backward.
I personally find the reduced variant cleaner because we get orthonormal bases for the row and column spaces without needing to narrow them down if, for example, we would like to get a projection onto the range/complement subspaces.
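A quick sketch of the memory point (my own illustration): for a very tall m x n matrix with m >> n, the full U is m x m while the thin U is only m x n, so the footprint ratio is m/n. For a 10000 x 10 float64 input that is ~800 MB versus ~0.8 MB; the example below uses a smaller matrix to stay cheap.

```python
import numpy as np

a = np.random.default_rng(0).standard_normal((1000, 10))

u_full, _, _ = np.linalg.svd(a, full_matrices=True)   # U is 1000 x 1000
u_thin, _, _ = np.linalg.svd(a, full_matrices=False)  # U is 1000 x 10

print(u_full.nbytes // u_thin.nbytes)  # 100: full U is 100x larger
```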

@jakirkham
Member

Yeah, we don't even have full_matrices=True as an option in Dask currently; it is implemented as full_matrices=False. Though we do have an open issue about adding it ( dask/dask#3576 )

8 participants