This repository has been archived by the owner on Jul 16, 2021. It is now read-only.

[WIP] Issue #152: use Cholesky factorization #155

Open · wants to merge 22 commits into master

Conversation

andrewcsmith
Contributor

This is essentially a full rewrite of GaussianMixtureModel to use Cholesky factorization instead of the inverse/determinant approach. There's plenty wrong with this commit stylistically, but for now I'm just hacking at it.

cargo test -- --nocapture gmm_train

I can't seem to figure out why the means of the model aren't separating. They all seem to drift in the same direction. Any thoughts? Anyway, if you know relatively quickly why that might be happening, I'd like to hear it.
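For context, the payoff of the Cholesky route is that for a covariance Σ = L·Lᵀ the log-determinant is 2·Σᵢ ln(Lᵢᵢ), and the quadratic form (x−μ)ᵀΣ⁻¹(x−μ) equals ‖z‖² where L·z = x−μ, so no explicit inverse or determinant is ever formed. A standalone 2×2 sketch of that identity (illustrative names like `cholesky2`, not the PR's actual API):

```rust
/// Cholesky factor of a symmetric positive-definite 2x2 matrix
/// [[a, b], [b, c]], returned as the lower factor [[l11, 0], [l21, l22]].
fn cholesky2(a: f64, b: f64, c: f64) -> (f64, f64, f64) {
    let l11 = a.sqrt();
    let l21 = b / l11;
    let l22 = (c - l21 * l21).sqrt();
    (l11, l21, l22)
}

/// log N(x; mu, Sigma) with Sigma = L L^T:
/// log|Sigma| = 2 (ln l11 + ln l22), and the quadratic form is
/// ||z||^2 where L z = (x - mu), found by forward substitution.
fn gaussian_log_pdf_2d(x: [f64; 2], mu: [f64; 2], l: (f64, f64, f64)) -> f64 {
    let (l11, l21, l22) = l;
    let d0 = x[0] - mu[0];
    let d1 = x[1] - mu[1];
    // Forward substitution: solve L z = d
    let z0 = d0 / l11;
    let z1 = (d1 - l21 * z0) / l22;
    let log_det = 2.0 * (l11.ln() + l22.ln());
    let quad = z0 * z0 + z1 * z1;
    // dimension = 2, hence the 2.0 * ln(2*pi) term
    -0.5 * (2.0 * (2.0 * std::f64::consts::PI).ln() + log_det + quad)
}
```

As a sanity check, with the identity covariance and x = μ this reduces to −ln(2π).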

@andrewcsmith andrewcsmith mentioned this pull request Oct 17, 2016
3 tasks
@andrewcsmith
Contributor Author

I think it's getting closer. It still doesn't work very well, but my guess is that this is down to the initialization parameters: it keeps getting stuck in a local minimum.

If you run the test a bunch of times, you'll see that occasionally the log_lik drops below -2.0 and the means start to look much more realistic.

I'll work on KMeans initialization, and hopefully that will help things.

@andrewcsmith
Contributor Author

It's pretty clear that adding the k-means initialization step does not help things.

initialized means:
⎡ 0.4962 -0.0198⎤
⎢-0.0147  0.5004⎥
⎣-0.5146 -0.5010⎦

# Final values
means:
⎡-0.0110 -0.0068⎤
⎢-0.0110 -0.0068⎥
⎣-0.0110 -0.0068⎦
log_lik:
-1.4304
cov:
⎡0.2872 0.0798⎤
⎣0.0798 0.2844⎦
cov:
⎡0.2872 0.0798⎤
⎣0.0798 0.2844⎦
cov:
⎡0.2872 0.0798⎤
⎣0.0798 0.2844⎦
test learning::gmm::gmm_train ... ok

For my particular application, I switched back to k-means at the moment. I'll look into this weird convergence bug again later, but for the next week or so I need to do other things.

@AtheMathmo
Owner

Thanks for your time on this. I'll take a look through the code and see if I can make sense of anything.

let n_features = covariances[0].cols();
for covariance in covariances {
    for i in 0..covariance.cols() {
        if covariance[[i, i]] <= 0.0 {
Owner


Probably better to do a c.abs() < eps check here?

Contributor Author


Yes, I changed this. Although I haven't implemented anything for the diagonal covariance yet.

// println!("mix_weights: \n{:?}", &self.mix_weights);
let log_weights = self.mix_weights.iter().map(|w| w.ln());
for (lp, lw) in log_prob.iter_mut().zip(log_weights) {
    *lp *= lw;
Owner

@AtheMathmo AtheMathmo Oct 17, 2016


I might be misunderstanding - as we're in log space shouldn't these be added? log(a*b) = log(a) + log(b). Are we not computing log(weight * prob)?

Contributor Author


YES! Thanks.
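For anyone following along, a standalone sketch of the corrected log-space weighting (illustrative names, not the PR's API), with the responsibilities normalized via log-sum-exp for numerical stability:

```rust
/// Per-component log-responsibilities for one sample:
/// log(w_k * p_k) = ln(w_k) + ln(p_k), normalized in log space.
fn log_responsibilities(log_prob: &[f64], mix_weights: &[f64]) -> Vec<f64> {
    // Add the log-weight (the fix above): *lp += w.ln(), not *=
    let weighted: Vec<f64> = log_prob
        .iter()
        .zip(mix_weights)
        .map(|(lp, w)| lp + w.ln())
        .collect();
    // log-sum-exp: subtract the max before exponentiating so the
    // exponentials can't overflow/underflow catastrophically
    let max = weighted.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let log_norm = max + weighted.iter().map(|v| (v - max).exp()).sum::<f64>().ln();
    weighted.iter().map(|v| v - log_norm).collect()
}
```

Exponentiating the result gives responsibilities that sum to 1 for each sample.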

@andrewcsmith
Contributor Author

Just pushed the latest joy. It's good! It works quite well. The test uses k-means initialization by default, and I fixed a bug in the covariance calculation that sealed the deal.

It's still not ready to merge, because the design is pretty bad, but at least now there's an implementation to kick things off. I'll work on getting the gmm integration test to actually test things as well.

@andrewcsmith
Contributor Author

I went ahead and improved a load of other things we discussed. It's pretty solid now. It still occasionally fails, but steadily increasing the regularization constant on each failure usually fixes that.

@AtheMathmo
Owner

I've had a brief look and things look very solid! I'll hopefully have time to properly review this in the next few hours.

Thanks for all the effort you've put into this, I'm really excited to check it out!

Owner

@AtheMathmo AtheMathmo left a comment


I haven't dug into the correctness of the code yet, will do so properly soon. It does look good to me from what I understand though.

let ref model_means = self.model_means.as_ref().unwrap();
// The log of the determinant for each precision matrix
let log_det = precisions.iter().map(|m| m.det().ln());
// println!("log_det: {:?}", log_det);
Owner


Can we remove this commented line please?

Edit: And the other commented println!s below

Ok(cov_mat)
acc
})
+ 10.0 * EPSILON;
Owner


This seems like black magic to me - I see it in the sklearn implementation but we should try to find some sources for it ourselves. Is it purely for numeric stability?

And a (very) minor point: I think it's better to call f64::EPSILON here. That makes the origin of the constant clearer; otherwise someone may think we defined EPSILON ourselves.
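Assuming the constant really is just a numerical-stability jitter (as it appears to be in the sklearn implementation), the pattern under discussion can be sketched as follows, using plain nested `Vec`s rather than rulinalg's `Matrix` and an illustrative name `regularize_diagonal`:

```rust
/// Add a small jitter to the diagonal of a covariance matrix so that
/// a Cholesky factorization won't fail on a near-singular input.
/// Spelling the constant as a multiple of `f64::EPSILON` at the call
/// site makes its origin obvious.
fn regularize_diagonal(cov: &mut [Vec<f64>], reg: f64) {
    for (i, row) in cov.iter_mut().enumerate() {
        row[i] += reg; // diagonal entries only
    }
}
```

Typical usage would be `regularize_diagonal(&mut cov, 10.0 * f64::EPSILON);`, leaving the off-diagonal entries untouched.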

vec![Matrix::zeros(inputs.cols(), inputs.cols()); k]);
self.model_means = Some(Matrix::zeros(inputs.cols(), k));

// Initialize responsibitilies and calculate parameters
Owner


Just noticed this typo, responsibitilies -> 'responsibilities'

@@ -30,75 +29,82 @@
//! // Probabilities that each point comes from each Gaussian.
//! println!("{:?}", post_probs.data());
//! ```
use linalg::{Matrix, MatrixSlice, Vector, BaseMatrix, BaseMatrixMut, Axes};
use rulinalg::utils;
extern crate rand;
Owner


Shouldn't need this as we import the crate in lib.rs. Just use rand should work.

}

fn update_gaussian_parameters(&mut self, inputs: &Matrix<f64>, resp: Matrix<f64>) {
    self.mix_weights = resp.iter_rows()
Owner


Is sum_rows not appropriate here?

self.cov_option.update_covars(model_covars, resp, inputs,
                              model_means, &self.mix_weights);

self.mix_weights /= inputs.rows() as f64;
Owner


This seems like it should be outside of this function.

@AtheMathmo
Owner

Actually I might have spoken too soon. I just ran a GMM simulation - sampling data from 3 Gaussian clusters with specified means and variances to see if the model could recover the parameters. Sadly it consistently failed to do so, though not dramatically.

I'll try to get a PR in shortly to your branch to add this as an example in the repo - it's not the prettiest, but it should be valuable for us developers and hopefully for some users.

@andrewcsmith
Contributor Author

Cool. I didn't really do much in the way of a/b testing here. I'll see if I can tweak the parameters to get this to work better, or otherwise try to figure out what the problem is (probably next week though).

@AtheMathmo
Owner

Hopefully I'll have some time to dig into it in the next few days. It seems pretty minor so hopefully something will jump out.

let covar_det = cov.det();
let covar_inv = try!(cov.inverse().map_err(Error::from));
/// Solves a system given the Cholesky decomposition of the lower triangular matrix
pub fn solve_cholesky_decomposition(mut cov_chol: Matrix<f64>) -> Matrix<f64> {
Owner


I've just been looking through this to try and spot the regression. I don't think we need this function, rulinalg exposes solve_l_triangular.
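For reference, what a lower-triangular solve does is plain forward substitution; a minimal standalone version (illustrative only, with nested `Vec`s rather than rulinalg types, which is roughly the work `solve_l_triangular` already covers):

```rust
/// Solve L x = b for a lower-triangular L by forward substitution.
/// Assumes a square L with nonzero diagonal entries.
fn solve_lower_triangular(l: &[Vec<f64>], b: &[f64]) -> Vec<f64> {
    let n = b.len();
    let mut x = vec![0.0; n];
    for i in 0..n {
        // Subtract the contributions of the already-solved entries
        let mut sum = b[i];
        for j in 0..i {
            sum -= l[i][j] * x[j];
        }
        x[i] = sum / l[i][i];
    }
    x
}
```

Using the library routine instead of a hand-rolled helper keeps the PR smaller and better tested.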

// Subtract the mean of each column from the matrix y
for col in 0..y.cols() {
    for row in 0..y.rows() {
        y[[row, col]] -= z[[0, col]];
Owner


Minor stuff here - this access pattern will incur some cache misses.

We should swap the two loops, so that we access contiguous data in sequence (right now we jump down the columns first). We should also probably use get_unchecked_mut here, so remove the bound checks that we know we don't need.
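The suggested loop order can be sketched as follows (a standalone illustration with nested `Vec`s; iterating each row with an iterator already avoids the bounds checks that `get_unchecked_mut` would otherwise remove):

```rust
/// Subtract per-column means from a row-major matrix, walking each
/// row in the outer loop so the inner loop touches contiguous data.
fn subtract_column_means(y: &mut [Vec<f64>], means: &[f64]) {
    for row in y.iter_mut() {
        for (col, v) in row.iter_mut().enumerate() {
            *v -= means[col];
        }
    }
}
```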

@AtheMathmo
Owner

I finally got a little time to look at this. I think I checked over everything and compared to the scikit implementation - sadly I couldn't spot anything incorrect. It is failing even in the 1d case (as in the example added) so I don't think it is all of the Cholesky stuff. I'll try to find some time to dig around a little more, but we may just have to step through and compare to a working implementation to spot where things go wrong...

@andrewcsmith
Contributor Author

I bounced back and forth between the CholeskyFull and Diagonal interpretations, and while they're both wrong, CholeskyFull generally produces higher variance estimates, though it also tends to overestimate the variance of the component with mean 0.0.

I'm getting fairly consistent results, but they're just not particularly correct. I believe it's external to the impl CovOption blocks. My guess is that it's somewhere in the update code. It seems that the variances are wildly off, which leads me to believe that the calculation of the model responsibilities is off. That's just a guess.

(I'm only commenting because I've been poking around for an hour or so and want to save these notes before moving on with my life. Sorry if it's not particularly helpful.)

@AtheMathmo
Owner

Sat down with this again today.

I've been unable to track down the issue, but I agree with you that it is probably the update (or at least the Gaussian estimation) code.

I think the best approach to find the issue is to write unit tests for each part of the process using the scikit implementation to help. Basically check that each function is computing what we expect with some dummy input.

The good news is that the new assert_matrix_eq! macro should make this a lot easier to do. I'd like to sit down and get this done but I'm not sure when I'll have time. Maybe by next weekend I could take a look...
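The kind of per-step check being proposed might look like this sketch (an illustrative helper; rulinalg's `assert_matrix_eq!` provides a richer version of the same idea):

```rust
/// Assert element-wise approximate equality against reference values
/// (e.g. numbers taken from the scikit-learn implementation).
fn assert_approx_eq(actual: &[f64], expected: &[f64], tol: f64) {
    assert_eq!(actual.len(), expected.len(), "length mismatch");
    for (a, e) in actual.iter().zip(expected) {
        assert!((a - e).abs() < tol, "got {}, expected {}", a, e);
    }
}
```

Each stage of the EM loop (responsibilities, weight update, mean update, covariance update) could then be pinned against known-good dummy inputs.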

@AtheMathmo
Owner

I'm going to try to tackle this again when AtheMathmo/rulinalg#150 is merged.

This should make this work a little easier to complete.
