
Cholesky decomposition rewrite: initial work #150

Merged
25 commits merged on Mar 24, 2017

Conversation

Andlon
Collaborator

@Andlon Andlon commented Feb 11, 2017

I started some work on rewriting the Cholesky decomposition in the same style as LU. Along the way, I completely rewrote it using an algorithm from Golub and Van Loan, which describes the decomposition in terms of GAXPY operations. This means I can leverage your efficient dot implementation in the utils module. Preliminary benchmarks show a significant improvement (cholesky_nxn is the existing cholesky() implementation):

test linalg::cholesky::cholesky_100x100                                         ... bench:     201,593 ns/iter (+/- 1,126)
test linalg::cholesky::cholesky_500x500                                         ... bench:  21,765,711 ns/iter (+/- 208,204)
test linalg::cholesky::cholesky_decompose_unpack_100x100                        ... bench:      95,016 ns/iter (+/- 337)
test linalg::cholesky::cholesky_decompose_unpack_500x500                        ... bench:   9,607,424 ns/iter (+/- 535,354)

I just thought I'd put this up here, because I think some of this knowledge can also be applied to LU, although the situation is more complicated because of the additional work on U as well as the fact that one has to pivot.

EDIT: Note that this PR includes the entirety of #142! (will rebase after #142 eventually gets merged)
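For readers unfamiliar with the GAXPY formulation, a minimal sketch of a left-looking Cholesky in plain Rust might look like the following. This is illustrative only, not the PR's actual code: the function names and the flat row-major layout are assumptions.

```rust
/// Sketch of a left-looking Cholesky factorization (Golub & Van Loan
/// style): overwrites the lower triangle of the row-major n x n matrix
/// `a` with L. Illustrative only; not the PR's actual implementation.
fn cholesky_in_place(a: &mut [f64], n: usize) -> Result<(), &'static str> {
    for j in 0..n {
        // Diagonal entry: a[j][j] - dot(L[j][..j], L[j][..j])
        let row_j = j * n;
        let d = a[row_j + j] - dot(&a[row_j..row_j + j], &a[row_j..row_j + j]);
        if d <= 0.0 {
            return Err("matrix is not positive definite");
        }
        let ljj = d.sqrt();
        a[row_j + j] = ljj;
        // Column j below the diagonal: each entry needs a dot product
        // of two contiguous row slices, which vectorizes well.
        for k in (j + 1)..n {
            let row_k = k * n;
            let kj_dot = dot(&a[row_k..row_k + j], &a[row_j..row_j + j]);
            a[row_k + j] = (a[row_k + j] - kj_dot) / ljj;
        }
        // Zero the strictly upper part of row j so `a` holds L exactly.
        for c in (j + 1)..n {
            a[row_j + c] = 0.0;
        }
    }
    Ok(())
}

fn dot(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}
```

Almost all of the work lands in the dot products over contiguous slices, which is where the speedup in the benchmarks above would come from.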

@AtheMathmo
Owner

These initial numbers are very promising!

I should have some time to give #142 a final review tomorrow and hopefully this PR too.

Thanks!

@Andlon
Collaborator Author

Andlon commented Feb 11, 2017

Take whatever time you need :) This PR is far from done, by the way. I've only wrapped the decomposition itself, but I still plan to provide e.g. solve, inverse and det, as in the case of LU.

@AtheMathmo
Owner

#142 has been merged which I think means you need to rebase.

@Andlon
Collaborator Author

Andlon commented Feb 18, 2017

Rebasing took more of an effort than I expected, but at least I learned something new in the process. What's remaining is mostly:

  • An inverse() implementation
  • Deprecation of .cholesky()
  • Documentation for the new Cholesky type
  • Updated documentation for the decomposition module, so that the table of decompositions also lists Cholesky.

I don't know when I'll actually have time to do this. I might find myself with almost no time, or with ample time. We'll see :)

Owner

@AtheMathmo AtheMathmo left a comment

The changes so far look good to me!

I think the only thing pending is the old cholesky function deprecation?

@@ -0,0 +1,52 @@
use matrix::BaseMatrixMut;
Owner

Maybe this module would be better placed in the matrix folder, so that it is src/matrix/utils.rs?

/// with the factor itself. Instead, we may wish to
/// solve linear systems or compute the determinant or the
/// inverse of a symmetric positive definite matrix.
/// In this case, see the next subsections.///
Owner

Stray /// here

let k_row = a.row(k).raw_slice();
dot(&k_row[0 .. j], &j_row[0 .. j])
};
a[[k, j]] = a[[k, j]] - kj_dot;
Owner

We might want to check whether using get_unchecked here improves vectorization (I doubt it).

Collaborator Author

@Andlon Andlon Feb 22, 2017

Please see my other comment. The vast majority of the time should be spent in the kj_dot computation, so I think there's no vectorization to speak of for this single expression (?). However, dot allows for vectorization, and indeed this is where the speed gains demonstrated by the benchmarks come from :)

Owner

I had misunderstood what was going on when I made this comment :)

Collaborator Author

I can't blame you; this algorithm is not so intuitive, because it doesn't follow what you would usually do by hand in Gaussian elimination, which basically corresponds to the algorithm based on outer products. However, according to Golub & Van Loan, the GAXPY-rich version should involve only about half the memory traffic of the outer-product version.

In the future we can however make it significantly more cache-efficient by using an algorithm based on (blocked) level 3 BLAS routines. This is a strong motivation for #155, because once we have some optimized and efficient BLAS-3 routines, we can significantly accelerate the various decompositions :)

let mut a = matrix;

// Resolve each submatrix (j .. n, j .. n)
for j in 0 .. n {
Owner

Inside this loop I think we might be able to tidy things up in places by using a MatrixSliceMut and get_col_unchecked calls. In theory this should help with vectorization too, but I might be wrong as columnar slice iteration isn't the most efficient.

Another thought - here we are iterating over columns on the outside and rows on the inside. This means all of our inner loops have more cache misses (a[[k, j]] access pattern is worse than a[[j, k]] here). Is it possible to transpose the algorithm so that each inner operation is over column access?

These are just thoughts from someone who has spent 2 minutes reading the code. Please ignore these if not relevant.

Collaborator Author

@Andlon Andlon Feb 22, 2017

Hmm, I just realized that the comment on line 128 is somewhat misleading. This is a left-looking algorithm based on GAXPY, instead of the usual right-looking algorithm based on outer products, which is probably how most people think of the LU decomposition. In this case it's not quite right to talk about the submatrix (j .. n, j .. n). I will remove this comment to avoid confusion.

If you look closely, you'll see that for each k, we compute the dot product of parts of the kth and jth row. This is where the majority of the time of the algorithm is spent. Since we're computing the dot products of two contiguous slices, this should be fairly cache friendly. Now, you'll also notice that when kj_dot has been computed, the last item in the kth row that was accessed in the dot product is the item with index j - 1, which means that the next item would be j, which corresponds exactly to a[[k, j]]. In other words, at the end of computing kj_dot, a[[k, j]] is probably already in cache due to prefetching.

My understanding is thus that this whole operation should be very cache friendly. It is of course possible that I've misunderstood some parts :)

Also, I'm not entirely sure if I understand your comment about MatrixSliceMut. Could you perhaps elaborate?

I think the most sensible way to make the code cleaner is to rewrite rows 135 to 143 (the k loop) as a call to a gaxpy function. In fact, this is part of my motivation for #155, because this could be optimized individually and reused in several parts of the library. However, a difficulty here is that the "vector" y in the gaxpy expression y <- y + Ax is not a contiguous slice, but rather a column in a matrix.
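For reference, the gaxpy operation mentioned here, y <- y + Ax, could be sketched roughly as follows for a row-major matrix. The function name and flat layout are hypothetical, not an existing rulinalg API:

```rust
// Hypothetical sketch of a gaxpy kernel, y <- y + A*x, for a row-major
// m x n matrix stored in a flat slice. Not an existing rulinalg API.
fn gaxpy(y: &mut [f64], a: &[f64], x: &[f64], m: usize, n: usize) {
    assert_eq!(y.len(), m);
    assert_eq!(x.len(), n);
    assert_eq!(a.len(), m * n);
    for i in 0..m {
        // Each row of A is contiguous, so this inner dot product can
        // vectorize. A strided column `y` (as in the Cholesky loop)
        // would make this harder, which is the difficulty noted above.
        let row = &a[i * n..(i + 1) * n];
        y[i] += row.iter().zip(x).map(|(aij, xj)| aij * xj).sum::<f64>();
    }
}
```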

Owner

Thanks for explaining - I agree that the current algorithm is better than a transposed version.

What I meant by the MatrixSliceMut comment is not applicable now that I understand what's going on. I was trying to make the case that if we slice from [j.. , j..] we can get slightly nicer code by doing things like:

*a.col_mut(k) -= kj_dot;

But given that we need to take the whole row of a at line 137-138 I think this would cause borrowing issues. And it only really saves us on that one line.

Collaborator Author

Ah, yes, that would be a good idea in that case. I think it's still possible to do something like that using split_at_mut(), but the code perhaps doesn't get clearer from it, as you'd have to spend additional lines of code appeasing the borrow checker.
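As a tiny illustration of the split_at_mut() idea (plain slices here, not the matrix code in this PR; the function name is made up):

```rust
// split_at_mut() yields two non-overlapping mutable views of a slice,
// so one part can be read while another is written without upsetting
// the borrow checker -- at the cost of a few extra lines.
fn bump_tail(data: &mut [f64]) {
    let (head, tail) = data.split_at_mut(2);
    tail[0] += head[0] + head[1]; // read `head`, mutate `tail` at once
}
```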

use libnum::Float;
use libnum::{Zero, Float};

/// Cholesky decomposition.
Owner

Really great documentation here!

@Andlon
Collaborator Author

Andlon commented Feb 25, 2017

Regarding the internal_utils module: moving it into matrix as utils is somewhat problematic, because it overlaps with e.g. use utils:: in matrix/mod.rs itself, and seems likely to cause some confusion in the code.

I actually think utils should be made entirely private, in which case internal_utils could be merged with utils, but this would be a breaking change.

In any case, I'm not entirely sure what to do about it. Do you have any suggestions, @AtheMathmo?

I think I'm otherwise done with the rest! :)

@AtheMathmo
Owner

Sorry for the delay with this.

I agree with what you said above. We can't really move the internal_utils module. I also think that making all of utils private might be the best move. It has been kept public thus far because I needed to use some of the functionality within rusty-machine. Now that the API is a little more developed, this probably isn't necessary.

For now I think we should keep things as you have them in this PR - evaluating whether users might need the utils functions could be difficult.

I'll try to find time to give this PR one more look through either later today or tomorrow. From what I remember it should be ready to merge :)

@Andlon
Collaborator Author

Andlon commented Mar 7, 2017

No worries!

I just realized I forgot to mention something I wanted to discuss, that I think impacts the merge.

I actually made the solve function return a Vector directly, because if the decomposition is successful, it should always be safe to solve the system. But this assumes that the decomposition and the triangular solve use the same criterion for "close to zero". I think they do at the moment, but it seems a bit fragile. Perhaps to be on the safe side we should make solve (and inverse) return Result? This is also more consistent with the LU decomposition API, which is probably a good idea.
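A rough sketch of the Result-based shape being proposed (type, method names, and the tolerance are illustrative, not rulinalg's actual API): given the factor L from A = L·Lᵀ, solving is a forward substitution with L followed by a back substitution with Lᵀ, and a near-zero diagonal entry surfaces as an Err instead of a silent division.

```rust
// Illustrative sketch of a Result-returning solve on the Cholesky
// factor L (row-major, n x n). Names and tolerance are assumed,
// not rulinalg's actual API.
struct Cholesky {
    l: Vec<f64>, // lower-triangular factor, row-major
    n: usize,
}

impl Cholesky {
    fn solve(&self, b: &[f64]) -> Result<Vec<f64>, &'static str> {
        let n = self.n;
        // Forward substitution: L y = b
        let mut y = vec![0.0; n];
        for i in 0..n {
            let s: f64 = (0..i).map(|j| self.l[i * n + j] * y[j]).sum();
            let d = self.l[i * n + i];
            if d.abs() < 1e-12 {
                // The "close to zero" criterion made explicit as an Err.
                return Err("factor is numerically singular");
            }
            y[i] = (b[i] - s) / d;
        }
        // Back substitution: L^T x = y, where (L^T)[i][j] = L[j][i]
        let mut x = vec![0.0; n];
        for i in (0..n).rev() {
            let s: f64 = (i + 1..n).map(|j| self.l[j * n + i] * x[j]).sum();
            x[i] = (y[i] - s) / self.l[i * n + i];
        }
        Ok(x)
    }
}
```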

@AtheMathmo
Owner

I agree that using a Result is probably better. The consistency itself is a big enough motivating factor, I think. It seems a good idea to have a Result-heavy API for decomposition work anyway - lots of things can go wrong!

@Andlon
Collaborator Author

Andlon commented Mar 7, 2017

All right, I'll take care of it as soon as I have time, though it might not be until after next week. I'm in the last leg before thesis submission, so I'm deliberately trying not to get too distracted by other endeavours at the moment! 😛

@AtheMathmo
Owner

No worries. I've been swallowed by school work for the last couple weeks so you have my sympathy. Good luck with the submission!

@Andlon
Collaborator Author

Andlon commented Mar 23, 2017

OK, I finally found the time to make the changes to solve and inverse so that they return Result. I suppose this PR should be complete now, barring any mistakes!

Owner

@AtheMathmo AtheMathmo left a comment

One small point.

Once we resolved that I am happy to merge :)

}

/// Computes the determinant of the decomposed matrix.
pub fn det(&self) -> T {
Owner

What would happen if we tried to cholesky decompose an empty matrix here?

From what I can see - we would get an empty matrix within self and this function would return T::one(). I don't think that this is desired behaviour?

Collaborator Author

I see what you mean. Still, what would you have the determinant of an empty matrix A be? We could define it to be 0, but this would imply that it's singular, and since rank(A) == A.rows() == A.cols(), it is also invertible by the invertible matrix theorem, so we have a rather strange contradiction here. I think the best approach is to define the determinant to be any non-zero number, and so we might as well make it 1 for simplicity.

Collaborator Author

Oh, and due to https://en.wikipedia.org/wiki/Empty_product and the usual definition of determinant, one could argue that det(A) == 1 in any case :-)
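Concretely, since det(A) = det(L)·det(Lᵀ) = (∏ᵢ Lᵢᵢ)², an empty diagonal gives the empty product, which is 1, so returning T::one() falls out naturally. A sketch with an illustrative function name:

```rust
// det(A) = (product of the diagonal of L)^2. For a 0x0 matrix the
// diagonal is empty and the empty product is 1, matching the
// convention discussed above. Illustrative sketch, not the PR code.
fn det_from_cholesky(l_diag: &[f64]) -> f64 {
    let p: f64 = l_diag.iter().product();
    p * p
}
```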

Owner

Yes, I think this is a good argument.

Do you think it is worth adding a comment to state our choice here? It seems fairly niche but maybe it will save one person a headache in the future...

Collaborator Author

Yes, I agree! I will do this shortly.

Owner

Please could you add the same comment to the FullPivLU that we have just merged?

Collaborator Author

Certainly!

@Andlon
Collaborator Author

Andlon commented Mar 24, 2017

I think you may want to delay merging this PR until #160 is in, so that @Prismatica doesn't have to make more tiny adjustments due to potential merge conflicts in the decomposition documentation. He's been forced to make so many small adjustments already; I'd rather do this for this PR instead (if necessary - perhaps git merges automatically).

@Andlon
Collaborator Author

Andlon commented Mar 24, 2017

OK, I believe the outstanding issues should have been taken care of now!

EDIT: Note that @Prismatica had already added a note to the docs of FullPivLu::det about empty matrices, so I did not need to change that :-) I did update PartialPivLu::det though.

@AtheMathmo
Owner

Awesome, thank you!

@AtheMathmo AtheMathmo merged commit 91110a4 into AtheMathmo:master Mar 24, 2017