Sort singular values #46
Conversation
Interestingly, the singular values just happen to be ordered already for the tall and short matrices, while they are not ordered correctly for the square example. Before we implement the sorting, we at least want these tests to fail, so we must construct new matrix examples for which the singular values hopefully come out unsorted.
Before implementing the sort functionality, it is useful to have some tests that fail because the singular values are not sorted. However, it appears that the computation of the SVD for these examples does not even finish at the moment!
Thanks for this! I was worried this might be the case somewhere... The algorithm should theoretically terminate, but I'll have to dig around to figure out why it isn't - fortunately you've given me examples to test! I suspect it is either numerical inaccuracy causing the break condition not to be met, or a minor bug somewhere. I'll try to figure out which and get a fix in - it might be as simple as a safety check to stop us looping infinitely and return an error.
Hey @Andlon. Good news! I'm pretty confident I found the cause of the infinite looping, and it is an easy bug to fix. I'm happy for you to roll it into this PR. The offending line is number 350:

```rust
// This is incorrect:
diag_abs_sum = T::min_positive_value() *
    (b_ii.abs() + *b.get_unchecked([i + 1, i + 1]));

// It should be:
diag_abs_sum = T::min_positive_value() *
    (b_ii.abs() + b.get_unchecked([i + 1, i + 1]).abs());
```

This is detailed in algorithm 1b in the source for this implementation. Let me know how you want to handle this.
As a side note - we can also speed up convergence if we use a more sensible value for epsilon. Right now I'm using `T::min_positive_value()`.
That was quick. Great job! Thanks for looking into it. I guess the easiest thing to do is for me to make the change and just credit you in the commit message? Or if you want this fix in as soon as possible, you can include it separately and I can rebase. I might have some time tonight to continue work on this. If not, I should have some time during the week. Also, your PR to...
Also, a bit tangential, but I didn't want to open a new issue just yet. Currently, when writing tests, it's rather painful to compare approximate equality for matrices and vectors. I used the same style as I've seen in the existing code, which is to use iterators to check that the absolute difference for each element is less than some fairly arbitrary threshold. I think we should have a standardized way of doing this, probably based on ULPs, so that we can assert approximate equality of two matrices in a single statement.
What are your thoughts on this? In any case, I don't need it to complete the current issue, but it would certainly make testing more straightforward.
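For illustration, here is a minimal self-contained sketch of what a ULPs-based comparison could look like. None of these helpers exist in rulinalg at this point; the names and the approach are purely hypothetical:

```rust
/// Maps a float's bit pattern to an integer that is monotonic in the float's
/// numeric order, so that the distance between two floats can be counted in
/// "units in the last place" (ULPs).
fn lexical_order(x: f64) -> i64 {
    let bits = x.to_bits() as i64;
    // Flip the encoding of negative floats so that integer order matches
    // numeric order (this also maps -0.0 and +0.0 to the same value).
    if bits < 0 { i64::MIN - bits } else { bits }
}

fn ulps_diff(a: f64, b: f64) -> u64 {
    let d = (lexical_order(a) as i128 - lexical_order(b) as i128).unsigned_abs();
    d.min(u64::MAX as u128) as u64
}

/// Element-wise assertion over the matrices' underlying data slices.
fn assert_elementwise_ulps_eq(a: &[f64], b: &[f64], max_ulps: u64) {
    assert_eq!(a.len(), b.len());
    for (&x, &y) in a.iter().zip(b.iter()) {
        assert!(ulps_diff(x, y) <= max_ulps,
                "{} and {} differ by more than {} ULPs", x, y, max_ulps);
    }
}
```

With something along these lines, the arbitrary absolute thresholds in the current tests could be replaced by a tolerance expressed in ULPs.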
I'm happy to leave the bug fix to you for now - if it turns out that you don't have as much time as you thought, I'll get it in separately. I agree that approximate equality would be a useful thing! I'll try to remember to write up an issue for that tomorrow.
Thanks to @AtheMathmo for figuring out the issue. See AtheMathmo#46 for a discussion of the problem.
So I had a tiny bit of time to look more into it. For the square example, it turns out that one of the computed singular values is negative! You can see this for yourself if you incorporate the one-line fix you sent me and disable the tests against the expected singular values.
Ah - interesting... I'll have to dig into this a little. It looks like it may be worth adding a new issue to track this bug if it's non-trivial.
So after a little more reading, it seems that non-negative singular values are a conventional requirement. The SVD validation passes because this is a sort-of valid SVD. I'm not sure if the sign flip is a bug in my implementation or whether we should simply enforce this at the same time as ordering the values, i.e. have a new function to enforce the convention. I'll try to figure out whether the algorithm should capture this - but if not, this doesn't add too hard a performance hit and produces a conventionally valid SVD.
I skimmed the article you had posted earlier, and to me it wasn't entirely clear. However, they define the SVD using non-negative diagonal entries, so I would expect them to explicitly notify the reader if the singular values produced by the algorithm could be negative... I agree that we can fix this by post-processing; however, if the algorithm is actually intended to produce non-negative singular values, the current behavior could also hide other, more subtle bugs. Is it possible to break the algorithm up into more individually testable pieces? I suspect this may be difficult. I could, however, write some tests for the Golub-Kahan step and verify the properties of the output as described in the aforementioned paper.
It does seem surprising to me - but they also define the singular values as being ordered, and I'm fairly confident this isn't enforced by the algorithm. Nonetheless I think it warrants a little extra reading.
Yes, we can break it up a little more. Many of the algorithms (including the one I'm using) first reduce the matrix to a bidiagonal decomposition (which is already separate). Then we block the matrix to get the part we are still working on. We zero out the off-diagonal entries corresponding to zero-valued singular values. And finally we apply some inner step (like Golub-Kahan) to reduce the rest of the matrix. I think we can break off each of these to test them individually. But to keep the code tidy this might be better done in a new module (at least until we make a decision about custom decomposition types).
Ah, you make a good point. In that case - at least until we determine otherwise - I think we should assume that the sign from Golub-Reinsch is not correct, and I'll proceed with an implementation that post-processes the result of the Golub-Reinsch algorithm. I think I'll structure it like the following:
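(The original list appears to have been lost here. The sketch below reconstructs the likely layering from the function names that come up later in this thread - `svd`, `svd_unordered` and `svd_golub_reinsch` - using placeholder types; the real signatures differ.)

```rust
// Placeholder for the (Σ, U, V) triple; the real code returns rulinalg matrices.
type SvdTriple = (Vec<f64>, Vec<f64>, Vec<f64>);

// The raw algorithm: Σ may come out with negative entries, in any order.
fn svd_golub_reinsch() -> SvdTriple {
    unimplemented!()
}

// Post-processing step 1: flip the sign of each negative singular value,
// together with the corresponding column of U (or V).
fn svd_unordered() -> SvdTriple {
    let (sigma, u, v) = svd_golub_reinsch();
    // ... sign correction goes here ...
    (sigma, u, v)
}

// Post-processing step 2 (the public entry point): sort Σ in non-increasing
// order and permute the columns of U and V with the same index mapping.
fn svd() -> SvdTriple {
    let (sigma, u, v) = svd_unordered();
    // ... sorting goes here ...
    (sigma, u, v)
}
```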
In the long run, we need much more extensive tests. I would be interested in looking into some form of property-based testing, like QuickCheck. Thoughts?
So I have managed to find one more bug! One of the steps I described - zeroing the off-diagonal entries for zero-valued singular values - is currently very wrong! Instead of applying a transformation to do this, I am essentially just setting the value to zero. I would be surprised if this makes a significant enough difference to cause a sign change - but nonetheless I'll try to sort it. This probably motivates a new issue for the two SVD bugs? Feel free to continue as you are planning to for now. I (optimistically) suspect that the new bug only affects convergence rates and accuracy.

Edit: I couldn't crack the bug on my own (yet), so I posted a stack exchange question - feel free to check it out.

Edit 2: Sorry, I missed this part. I like the idea of using QuickCheck.
Sounds good. I'm not well-versed enough in the details of these algorithms to be able to offer much assistance at the moment, and in many ways diving into them is too similar to my thesis work, in the sense that I'd like the things I do for rulinalg to be a change of pace from it.
That's totally fair! I don't like leaving contributors to clean up messes I made in my code, so I'll do my best to get it fixed up :).
I think leaving bugs in the implementation of a very technical and difficult algorithm does not count as "making a mess", but maybe that's just me :P Anyway, I'll see if maybe I have a little time tonight, although I was thinking I'd finish the macro PR first, since it's so simple.
I think you're probably right! Fortunately I figured out the answer to the stackexchange question - I should be able to test out the fix later on today.
Great! The macro will be really good to have in there finally.
All right, I think I'm back on track with this issue now. I've implemented the sign correction, so now I can turn my mind back to sorting the singular values again. I just wanted to make a mental note that we really need to document the dimensions of the returned SVD! I was very surprised when I discovered that the U and V matrices were not both square. Of course, there are many equally adequate ways of representing the SVD - we just need to document it!
Right you are! I had a bit of a headache figuring this out myself, so it's certainly bad that I didn't document it well at the time...
I managed to do a little digging into the negative values and auto-sorting. I don't have any super compelling evidence - but you can read about what I found in #48.
I need to look a little more, but I think the code looks solid to me. I think at this point it is worth us splitting the SVD code out into a new module as well. This means modifying the module structure a little.
I still need to set up the benchmarks and fix up the documentation a little. Hoping to have some time to work on that today. Yeah, definitely.
I think, however, that we should complete this PR and treat the reorganization as a separate issue. It seems like a prerequisite to further work on #40, perhaps...?
All sounds good to me. Take your time with the docs etc.
All right, so here's an update. I quickly hacked together a few benchmarks, and at the moment the SVD implementation converges so slowly that it doesn't make sense to talk about sorting vs. not sorting. For example, within 5-10 minutes it hadn't finished the SVD of a 100x100 matrix, which took NumPy about 0.05 seconds. I think I'll leave out the benchmarks for now, make sorted SVDs the default, fix up the docs, and then let you review for a merge. Then we'll leave the slow convergence to #48 instead. I think after we've got a reasonably fast SVD, we can revisit the topic of whether or not the ordering should be opt-out. What do you think?
Ha! 5-10 minutes is pretty terrible; looks like we do indeed need some more work. I'm hoping this is a result of our incredibly harsh stopping criterion (`T::min_positive_value()`). Your approach sounds good though. Let's get this in even if the performance remains terrible, and we'll address the other issues in #48. Thanks!
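To make the suspicion concrete, here is an illustrative reconstruction of the criterion (not the exact code from the PR): with `eps` set to the minimum positive value, the test below essentially never fires for `f64`, so the iteration never detects convergence; a small multiple of machine epsilon behaves as intended.

```rust
// An off-diagonal entry is treated as zero when it is negligible relative
// to its diagonal neighbours. The choice of `eps` is the whole story here:
// f64::MIN_POSITIVE is ~2.2e-308, while f64::EPSILON is ~2.2e-16.
fn is_negligible(off_diag: f64, b_ii: f64, b_jj: f64, eps: f64) -> bool {
    off_diag.abs() <= eps * (b_ii.abs() + b_jj.abs())
}

fn main() {
    let (off, d1, d2) = (1e-16, 1.0, 2.0);
    assert!(!is_negligible(off, d1, d2, f64::MIN_POSITIVE)); // harsh: loops on
    assert!(is_negligible(off, d1, d2, 3.0 * f64::EPSILON)); // relaxed: converges
}
```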
Ah, of course. I forgot about that! I made a quick hack with your suggested changes to the stopping criterion. Preliminary bench results look like this:
I'll work on it a bit more and see what happens. Not sure what to make of it just yet.
Well, those numbers look a lot more reasonable. In fact, they seem an order of magnitude faster than NumPy? I'm pretty sure NumPy only has SVD support via LAPACK, so that would be surprising (unless Python adds that much overhead?). Either way, this is good news (provided the accuracy isn't wrecked by this change).
I didn't use the same matrix for NumPy and the rulinalg benchmarks.
I hadn't even thought thoroughly about it, but of course...
I suppose I could create a temporary trait that exposes `epsilon()`.
I'm not sure how best to handle it for different floats. I guess a simple approach is to use the standard library's `EPSILON` constants directly.
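For what it's worth, a temporary shim along those lines could look like this (the trait name is hypothetical, and the whole thing can be deleted once `epsilon()` is available from the `num` crate):

```rust
// Temporary stand-in until num's Float trait exposes epsilon().
trait MachineEpsilon {
    fn machine_epsilon() -> Self;
}

impl MachineEpsilon for f32 {
    fn machine_epsilon() -> f32 { f32::EPSILON }
}

impl MachineEpsilon for f64 {
    fn machine_epsilon() -> f64 { f64::EPSILON }
}
```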
We managed to get the epsilon PR into num - hopefully that means we'll be able to utilize it soon!
The idea is to remove this once epsilon() lands in the next version of the num crate.
Since the benchmarks test module-internal methods, they cannot reside in the benches/ subdirectory. And since benchmarks must be run on nightly, this involves some complication: to run these benchmarks, you must enable the 'nightly' feature, like so: `cargo bench --features nightly`
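For reference, the gating pattern this commit describes looks roughly like the following (a sketch, with the bench body elided; `nightly` is declared as an empty feature in Cargo.toml):

```rust
// In the crate root:
//     #![cfg_attr(feature = "nightly", feature(test))]
//     #[cfg(feature = "nightly")]
//     extern crate test;
//
// In the module whose internal functions are benchmarked:
#[cfg(all(test, feature = "nightly"))]
mod benches {
    use test::Bencher;

    #[bench]
    fn bench_svd(b: &mut Bencher) {
        b.iter(|| {
            // ... call the (possibly private) function under test ...
        });
    }
}
```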
I'm currently using 3 times machine epsilon, and I got these numbers:
It seems to me that the ordering of the singular values has little impact on performance, although to be fair this is only for a single matrix. In my opinion, we should hide `svd_unordered`.
In order to incorporate these benches, I needed access to non-public methods of the API, so I had to embed the benchmarks in the module itself.
I consider the feature complete now, pending your review.
Oh, and, yeah, it's great that the epsilon PR made it into num!
I think that the implementation looks good - and the new tests are a lot more comprehensive, thank you!
I would like to move these benches out (and remove the ones on private functions). Adding them like this will require CI changes and is one more (albeit minor) thing to be aware of when developing. If you disagree I am happy to hear you out!
```rust
let ref mut shortest_matrix = if u.rows() <= v.rows() { &mut u }
                              else { &mut v };
let column_length = shortest_matrix.rows();
let num_singular_values = cmp::min(b.rows(), b.cols());
```
I believe `b` should be square by this point. But we can keep this in for future-proofing.
You are right. I wrote this function when I thought `b` was rectangular. However, the implementation as-is doesn't make any assumptions about the form of the SVD decomposition, and I think (although this isn't tested) it would also work for other forms of the SVD, so we might as well keep it as-is? (I.e. if we swap the Golub-Reinsch algorithm for one that outputs a different kind of SVD.)
I agree, it makes sense to keep it in whether we need it or not.
```rust
// When correcting the signs of the singular vectors, we can choose
// to correct EITHER u or v. We make the choice depending on which matrix
// has the least number of rows. Later we will need to multiply all
// elements in columns by -1.
```
Maybe I'm just tired - but don't we want to multiply the rows of `v` by `-1` instead of the columns? `b * v` basically scales the rows of `v` by the respective singular values? If this is true, then it becomes a little trickier to determine the faster approach. It will probably almost always be faster to flip the sign on the row, due to cache access and vectorization.
Actually, I just realised that we transpose `v` at the end, so this is correct. I'm leaving this comment to highlight that we may want to modify the algorithm so that we operate on rows (not in this PR, of course).
It's also possible to collect the indices of all columns that need to be flipped, and then iterate over each row, flipping the sign of each relevant element as we go. The same could be done for the sorting, and it would likely be faster. However, the implementation would be slightly more complex, and since the overhead seems to be more or less negligible at this point, I opted for the simpler approach of just dealing with columns directly.
As you say, we may want to revisit this in the future!
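As a sketch of that alternative (a hypothetical helper over a flat row-major buffer, not the code in this PR): first record which columns need flipping, then negate in a single row-major pass.

```rust
// Negate the marked columns with one cache-friendly pass over the rows.
fn flip_columns(data: &mut [f64], rows: usize, cols: usize, flip: &[bool]) {
    assert_eq!(data.len(), rows * cols);
    assert_eq!(flip.len(), cols);
    for r in 0..rows {
        let row = &mut data[r * cols..(r + 1) * cols];
        for (x, &f) in row.iter_mut().zip(flip) {
            if f {
                *x = -*x;
            }
        }
    }
}
```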
Sounds good to me!
Although I just realized that a `100 x 100` matrix (which I used in my benchmarks) would still fit in L2 cache. Guess we should do some tests with matrices so big that they don't fit in L3 either? Perhaps the results would be significantly different then...
Let's try that too. In particular the n x m case where n >> m, e.g. 10000x10. This is a bit more representative of the cases we may see in rusty-machine.
Will do that next chance I get. Thinking a little about it though, I suspect that we have the following performance characteristics. First, consider that if `m >= n`, the SVD is usually something like `O(m n^2)` flops or so, and I'm not sure about the memory accesses of Golub-Reinsch. The ordering of the singular vectors is roughly `O(n^2)` memory accesses in the worst case. Based on this and our current benchmarks, I would expect the following:
- For small matrices, the matrix fits in cache, and the column swaps are therefore still fast enough that the cost is negligible.
- For larger matrices that do not fit in cache, the column swaps are presumably more expensive (though due to pre-fetching it's perhaps not too bad, since the stride is uniform?), but the cost of the ordering is completely dwarfed by the complexity of the SVD computation.
Looking forward to trying the benchmarks out!
By the way, you may find the following interesting. I spent Fall 2014 and Spring 2015 at UC Berkeley. Among other things, I took some courses under Prof. James Demmel, a leading figure in the research of what they call "communication-avoiding algorithms". In essence, the goal of these algorithms is to minimize not just the number of flops an algorithm requires, but also, using a simple mathematical model of cache behavior, the amount of communication, i.e. the number of expensive memory accesses, which they provably minimize (again in a big-O sense). I mention this because I remember that at some point Prof. Demmel referred me to some papers on these algorithms. Their publications can be found here. I think in particular this survey paper may be of interest. It's fairly up to date (published in 2014) and discusses the communication optimality of various algorithms in linear algebra.
However, I again want to stress that we should ensure the correctness of the algorithms in `rulinalg` (by extensive testing against expected error bounds, etc.) before this becomes a topic. Still, I thought I'd mention it, because I think there's a lot of interesting stuff in there :)
I must have missed this comment before. Thank you for pointing out those papers! They look like really interesting reads - though at first glance pretty challenging. The survey paper looks like a really good start; it seems I could learn a lot from it.
```rust
// Sorting a vector of indices simultaneously with the singular values
// gives us a mapping between old and new (final) column indices.
indexed_sorted_values.sort_by(|&(_, ref x), &(_, ref y)|
    x.partial_cmp(y).expect("All singular values should be finite, and thus sortable."));
```
Would this panic be user error or an algorithmic failure? (Or both, I guess...) I'm wondering whether it would be better to propagate this by returning a `Result`. It feels like a "something has gone terribly wrong" situation though, so the panic is probably appropriate.
My reasoning for a panic was this: we are using the successful result of `svd_unordered` (which in turn uses the result of `svd_golub_reinsch`), which means that if the singular values are not all finite, it must be a bug in one of these functions, in which case I think a panic is appropriate. Does this make sense to you?
Yes that does make sense, just wanted to check!
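For context, the complete mapping step could look something like this stand-alone sketch (over a plain `Vec<f64>` rather than rulinalg's types; the helper name is made up):

```rust
// Sort singular values in non-increasing order, returning the sorted values
// together with the permutation: order[k] is the old column index that ends
// up at column k of U and V.
fn sorted_with_mapping(values: Vec<f64>) -> (Vec<f64>, Vec<usize>) {
    let mut indexed: Vec<(usize, f64)> = values.into_iter().enumerate().collect();
    // A non-finite singular value indicates a bug upstream, hence the panic.
    indexed.sort_by(|&(_, x), &(_, y)|
        y.partial_cmp(&x)
         .expect("All singular values should be finite, and thus sortable."));
    let order = indexed.iter().map(|&(i, _)| i).collect();
    let sorted = indexed.into_iter().map(|(_, x)| x).collect();
    (sorted, order)
}
```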
```rust
    let _ = a.lup_decomp();
}

#[cfg(all(test, feature = "nightly"))]
mod benches {
```
I would prefer to move these benches into the main benches folder and just avoid benching `svd_unordered`. If we ever want to add this function to the public API, we can add the benches then. Of course it is nice to have it benchmarked, but in this case I'd prefer to leave it out due to the other feature changes - these mean I have to modify the CI, for example.
I agree, it isn't worth the added complexity. I added it partly because I thought it would be useful to have a bench harness for private functions as well. But I suppose that use case is few and far between at the moment. In any case, we should probably be more concerned with correctness than maximum performance going forward anyway. Perhaps by the time we need more fine-grained benchmarks again, there will be a benchmarking harness in stable.
I'll remove all the module-internal benchmarks as you say!
OK, so I tried running the benchmarks on larger matrices.
The performance scaling seems to be rather less than optimal. I guess this might be due to the incorrect handling of the Givens rotations you were talking about?
Instead just bench the public .svd() method along with the other benchmarks.
@AtheMathmo: OK, I removed and moved out the benches as you requested. I think that should settle it, unless you see anything else that needs to be done!
Thanks for getting those timings. I agree that the scaling is a little worrying - but not totally unexpected. Did you try running 10000x10 outside the benching harness? I'll look this over again quickly later today and get it merged. Thank you for all your hard work on this - I'm sorry also that the issue turned into much more than it began as!
Yeah, but it didn't finish in reasonable time (I didn't have the patience to let it run longer than 10 mins or so, sorry!).
No need to be sorry. That is the nature of development :)
By the way, note that for the tests where the dimensions are...
```rust
let num_singular_values = cmp::min(b.rows(), b.cols());

for i in 0 .. num_singular_values {
    if b[[i, i]] < T::zero() {
```
It's very minor, but we can remove some bounds checks here by using the `get_unchecked` functions. This will have a very negligible effect though, so I'm happy to merge without.
I considered this, but I came to the conclusion that the effect here would presumably be completely negligible, in which case I think going `unsafe` is not worth it. I think it makes more sense to only use `unsafe` when you know you have a potential bottleneck, preferably after profiling. Otherwise this would just fall into the category of premature optimization...
That sounds sensible - let's keep it as the safe, bounds-checked version.
```rust
/// non-increasing order. U and V have orthonormal columns, and each column represents the
/// left and right singular vectors for the corresponding singular value in Σ, respectively.
///
/// Denoting the dimensions of self as M x N (rows x cols), the dimensions of the returned matrices
```
I think that there is a minor mistake here in the dimensions listed below. I think that `Σ` should be square, with dimension equal to the minimum of `M` and `N`. If the minimum is `M`, then I think that `U` is `M x M` and `V` is `N x M`.
Oh! Yeah, you're completely right. We transpose the matrix before applying Golub-Reinsch. I only read the description in the paper you mentioned, and there it is assumed that `m >= n`, which I had forgotten. Will fix immediately.
Great, thanks!
You may want to verify that I haven't got the dimensions I gave wrong. I only skimmed the source without checking the output (I'll have some time to check them properly later today if you can't).
Yeah, I checked it by hand, as well as in the code. I think you are correct (it's easy to confuse). In the end I chose the verbose way of documenting it, as I found that to be clearer. Let me know if you want it phrased differently.
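For concreteness, the convention the verbose docs describe boils down to the following (my paraphrase in doc-comment form, not the exact text from the PR):

```rust
/// Denoting the dimensions of self as M x N (rows x cols), and letting
/// k = min(M, N), the returned matrices have the following dimensions:
///
///   Σ: k x k (square, with the singular values on the diagonal)
///   U: M x k
///   V: N x k
```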
I think the verbose way is much better - it is pretty complex so taking more space to explain is sensible.
All looks good now so I'll merge. Thanks again for your work on this!
I just have two minor comments - only one of which needs a response (the one about the doc dimensions). If I'm mistaken in that comment, I'm happy to merge now.
* correct eigenmethods for 2-by-2 (and 1-by-1) matrices

In issue AtheMathmo#46, it was observed that the `eigenvalue` and `eigendecomp` methods could give incorrect results. Rusty-machine creator and BDFL James Lucas noted that the problem was specific to 2-by-2 matrices and suggested special-casing them.

* private methods for general and small matrix eigencomputations, test

Conflicts:
	rusty-machine/src/linalg/matrix/decomposition.rs
	rusty-machine/src/linalg/matrix/mod.rs
Note: This is not a complete implementation! Intended to eventually solve #17.
My idea was to first extend the existing SVD tests to verify that the singular values were sorted. Once these were failing, I'd go ahead and implement the sort, which should restore them to a passing state. As it happened, the SVD algorithm in use already returned sorted singular values for the existing test data, so I constructed some new matrices for which I'd expect the algorithm not to return sorted singular values. However, it seems that for these test matrices, the SVD algorithm does not complete (i.e. it is stuck in an endless loop).
Hence, as long as the SVD algorithm does not finish, I'm also unable to complete the feature in question. I'm not at all familiar with the algorithm in use, so I can't really debug it without spending a lot of time diving into how it works. Unfortunately, at the moment that is time I don't have, so I wanted to put the PR up here in case anybody has an idea of how to fix the SVD.
See also commit messages.