This repository has been archived by the owner on Jul 16, 2021. It is now read-only.

[WIP] Issue #152: use Cholesky factorization #155

Open · wants to merge 22 commits into master

Conversation

andrewcsmith
Contributor

This is essentially a full rewrite of GaussianMixtureModel to use Cholesky factorization instead of the inverse/determinant approach. There's plenty wrong with this commit stylistically, but for now I'm just hacking at it.

cargo test -- --nocapture gmm_train

I can't seem to figure out why the means of the model aren't separating. They all seem to drift in the same direction. Any thoughts? Anyway, if you know relatively quickly why that might be happening, I'd like to hear it.
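For context, the payoff of the Cholesky route is that for a covariance Σ = L·Lᵀ the log-determinant is 2·Σᵢ ln(Lᵢᵢ), and the quadratic form (x−μ)ᵀΣ⁻¹(x−μ) equals ‖z‖² where L·z = x−μ, so no explicit inverse or determinant is ever formed. A standalone 2×2 sketch of that identity (illustrative names like `cholesky2`, not the PR's actual API):

```rust
/// Cholesky factor of a symmetric positive-definite 2x2 matrix
/// [[a, b], [b, c]], returned as the lower factor [[l11, 0], [l21, l22]].
fn cholesky2(a: f64, b: f64, c: f64) -> (f64, f64, f64) {
    let l11 = a.sqrt();
    let l21 = b / l11;
    let l22 = (c - l21 * l21).sqrt();
    (l11, l21, l22)
}

/// log N(x; mu, Sigma) with Sigma = L L^T:
/// log|Sigma| = 2 (ln l11 + ln l22), and the quadratic form is
/// ||z||^2 where L z = (x - mu), found by forward substitution.
fn gaussian_log_pdf_2d(x: [f64; 2], mu: [f64; 2], l: (f64, f64, f64)) -> f64 {
    let (l11, l21, l22) = l;
    let d0 = x[0] - mu[0];
    let d1 = x[1] - mu[1];
    // Forward substitution: solve L z = d
    let z0 = d0 / l11;
    let z1 = (d1 - l21 * z0) / l22;
    let log_det = 2.0 * (l11.ln() + l22.ln());
    let quad = z0 * z0 + z1 * z1;
    // dimension = 2, hence the 2.0 * ln(2*pi) term
    -0.5 * (2.0 * (2.0 * std::f64::consts::PI).ln() + log_det + quad)
}
```

As a sanity check, with the identity covariance and x = μ this reduces to −ln(2π).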

@andrewcsmith andrewcsmith mentioned this pull request Oct 17, 2016
3 tasks
@andrewcsmith
Contributor Author

I think it's getting closer. It still doesn't work very well, but my guess is that this is down to the initialization parameters: it keeps getting stuck in a local minimum.

If you run the test a bunch of times, you'll see that occasionally the log_lik drops below -2.0 and the means start to look much more realistic.

I'll work on KMeans initialization, and hopefully that will help things.

@andrewcsmith
Contributor Author

It's pretty clear that adding the k-means initialization step does not help things.

initialized means:
⎡ 0.4962 -0.0198⎤
⎢-0.0147  0.5004⎥
⎣-0.5146 -0.5010⎦

# Final values
means:
⎡-0.0110 -0.0068⎤
⎢-0.0110 -0.0068⎥
⎣-0.0110 -0.0068⎦
log_lik:
-1.4304
cov:
⎡0.2872 0.0798⎤
⎣0.0798 0.2844⎦
cov:
⎡0.2872 0.0798⎤
⎣0.0798 0.2844⎦
cov:
⎡0.2872 0.0798⎤
⎣0.0798 0.2844⎦
test learning::gmm::gmm_train ... ok

For my particular application, I switched back to k-means at the moment. I'll look into this weird convergence bug again later, but for the next week or so I need to do other things.

@AtheMathmo
Owner

Thanks for your time on this. I'll take a look through the code and see if I can make sense of anything.

let n_features = covariances[0].cols();
for covariance in covariances {
    for i in 0..covariance.cols() {
        if covariance[[i, i]] <= 0.0 {
Owner


Probably better to do a c.abs() < eps check here?

Contributor Author


Yes, I changed this. Although I haven't implemented anything for the diagonal covariance yet.

// println!("mix_weights: \n{:?}", &self.mix_weights);
let log_weights = self.mix_weights.iter().map(|w| w.ln());
for (lp, lw) in log_prob.iter_mut().zip(log_weights) {
    *lp *= lw;
Owner

@AtheMathmo AtheMathmo Oct 17, 2016


I might be misunderstanding - as we're in log space shouldn't these be added? log(a*b) = log(a) + log(b). Are we not computing log(weight * prob)?

Contributor Author


YES! Thanks.
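For anyone following along, a standalone sketch of the corrected log-space weighting (illustrative names, not the PR's API), with the responsibilities normalized via log-sum-exp for numerical stability:

```rust
/// Per-component log-responsibilities for one sample:
/// log(w_k * p_k) = ln(w_k) + ln(p_k), normalized in log space.
fn log_responsibilities(log_prob: &[f64], mix_weights: &[f64]) -> Vec<f64> {
    // Add the log-weight (the fix above): *lp += w.ln(), not *=
    let weighted: Vec<f64> = log_prob
        .iter()
        .zip(mix_weights)
        .map(|(lp, w)| lp + w.ln())
        .collect();
    // log-sum-exp: subtract the max before exponentiating so the
    // exponentials can't overflow/underflow catastrophically
    let max = weighted.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let log_norm = max + weighted.iter().map(|v| (v - max).exp()).sum::<f64>().ln();
    weighted.iter().map(|v| v - log_norm).collect()
}
```

Exponentiating the result gives responsibilities that sum to 1 for each sample.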

@andrewcsmith
Contributor Author

Just pushed the latest joy. It's good! It works quite well. The test uses k-means initialization by default, and I fixed a bug in the covariance calculation that sealed the deal.

It's still not ready to merge, because the design is pretty bad, but at least now there's an implementation to kick things off. I'll work on getting the gmm integration test to actually test things as well.

@andrewcsmith
Contributor Author

I went ahead and improved a load of other things we discussed. It's pretty solid now. It still occasionally fails, but steadily increasing the regularization constant on each failure usually fixes that.

@AtheMathmo
Owner

I've had a brief look and things look very solid! I'll hopefully have time to properly review this in the next few hours.

Thanks for all the effort you've put into this, I'm really excited to check it out!

Owner

@AtheMathmo AtheMathmo left a comment


I haven't dug into the correctness of the code yet, will do so properly soon. It does look good to me from what I understand though.

let ref model_means = self.model_means.as_ref().unwrap();
// The log of the determinant for each precision matrix
let log_det = precisions.iter().map(|m| m.det().ln());
// println!("log_det: {:?}", log_det);
Owner


Can we remove this commented line please?

Edit: And the other commented println!s below

Ok(cov_mat)
acc
})
+ 10.0 * EPSILON;
Owner


This seems like black magic to me - I see it in the sklearn implementation but we should try to find some sources for it ourselves. Is it purely for numeric stability?

And a (very) minor point: I think it's better to call f64::EPSILON here. That makes the origin of the constant clearer; otherwise someone may think we defined EPSILON ourselves.
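Assuming the constant really is just a numerical-stability jitter (as it appears to be in the sklearn implementation), the pattern under discussion can be sketched as follows, using plain nested `Vec`s rather than rulinalg's `Matrix` and an illustrative name `regularize_diagonal`:

```rust
/// Add a small jitter to the diagonal of a covariance matrix so that
/// a Cholesky factorization won't fail on a near-singular input.
/// Spelling the constant as a multiple of `f64::EPSILON` at the call
/// site makes its origin obvious.
fn regularize_diagonal(cov: &mut [Vec<f64>], reg: f64) {
    for (i, row) in cov.iter_mut().enumerate() {
        row[i] += reg; // diagonal entries only
    }
}
```

Typical usage would be `regularize_diagonal(&mut cov, 10.0 * f64::EPSILON);`, leaving the off-diagonal entries untouched.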

vec![Matrix::zeros(inputs.cols(), inputs.cols()); k]);
self.model_means = Some(Matrix::zeros(inputs.cols(), k));

// Initialize responsibitilies and calculate parameters
Owner


Just noticed this typo, responsibitilies -> 'responsibilities'

@@ -30,75 +29,82 @@
//! // Probabilities that each point comes from each Gaussian.
//! println!("{:?}", post_probs.data());
//! ```
use linalg::{Matrix, MatrixSlice, Vector, BaseMatrix, BaseMatrixMut, Axes};
use rulinalg::utils;
extern crate rand;
Owner


Shouldn't need this as we import the crate in lib.rs. Just use rand should work.

}

fn update_gaussian_parameters(&mut self, inputs: &Matrix<f64>, resp: Matrix<f64>) {
    self.mix_weights = resp.iter_rows()
Owner


Is sum_rows not appropriate here?

self.cov_option.update_covars(model_covars, resp, inputs,
                              model_means, &self.mix_weights);

self.mix_weights /= inputs.rows() as f64;
Owner


This seems like it should be outside of this function.

@AtheMathmo
Owner

Actually I might have spoken too soon. I just ran a GMM simulation - sampling data from 3 Gaussian clusters with specified means and variances to see if the model could recover the parameters. Sadly it consistently failed to do so, though not dramatically.

I'll try to get a PR in shortly to your branch to add this as an example in the repo - it's not the prettiest, but it should be valuable for us developers and hopefully for some users.

@andrewcsmith
Contributor Author

Cool. I didn't really do much in the way of a/b testing here. I'll see if I can tweak the parameters to get this to work better, or otherwise try to figure out what the problem is (probably next week though).

@AtheMathmo
Owner

Hopefully I'll have some time to dig into it in the next few days. It seems pretty minor so hopefully something will jump out.

let covar_det = cov.det();
let covar_inv = try!(cov.inverse().map_err(Error::from));
/// Solves a system given the Cholesky decomposition of the lower triangular matrix
pub fn solve_cholesky_decomposition(mut cov_chol: Matrix<f64>) -> Matrix<f64> {
Owner


I've just been looking through this to try and spot the regression. I don't think we need this function, rulinalg exposes solve_l_triangular.
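For reference, what a lower-triangular solve does is plain forward substitution; a minimal standalone version (illustrative only, with nested `Vec`s rather than rulinalg types, which is roughly the work `solve_l_triangular` already covers):

```rust
/// Solve L x = b for a lower-triangular L by forward substitution.
/// Assumes a square L with nonzero diagonal entries.
fn solve_lower_triangular(l: &[Vec<f64>], b: &[f64]) -> Vec<f64> {
    let n = b.len();
    let mut x = vec![0.0; n];
    for i in 0..n {
        // Subtract the contributions of the already-solved entries
        let mut sum = b[i];
        for j in 0..i {
            sum -= l[i][j] * x[j];
        }
        x[i] = sum / l[i][i];
    }
    x
}
```

Using the library routine instead of a hand-rolled helper keeps the PR smaller and better tested.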

// Subtract the mean of each column from the matrix y
for col in 0..y.cols() {
    for row in 0..y.rows() {
        y[[row, col]] -= z[[0, col]];
Owner


Minor stuff here - this access pattern will incur some cache misses.

We should swap the two loops, so that we access contiguous data in sequence (right now we jump down the columns first). We should also probably use get_unchecked_mut here, so remove the bound checks that we know we don't need.
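The suggested loop order can be sketched as follows (a standalone illustration with nested `Vec`s; iterating each row with an iterator already avoids the bounds checks that `get_unchecked_mut` would otherwise remove):

```rust
/// Subtract per-column means from a row-major matrix, walking each
/// row in the outer loop so the inner loop touches contiguous data.
fn subtract_column_means(y: &mut [Vec<f64>], means: &[f64]) {
    for row in y.iter_mut() {
        for (col, v) in row.iter_mut().enumerate() {
            *v -= means[col];
        }
    }
}
```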

@AtheMathmo
Owner

I finally got a little time to look at this. I think I checked over everything and compared to the scikit implementation - sadly I couldn't spot anything incorrect. It is failing even in the 1d case (as in the example added) so I don't think it is all of the Cholesky stuff. I'll try to find some time to dig around a little more, but we may just have to step through and compare to a working implementation to spot where things go wrong...

@andrewcsmith
Contributor Author

I bounced back and forth between the CholeskyFull and Diagonal interpretations, and while they're both wrong, CholeskyFull generally produces higher variance estimates, though it also tends to overestimate the variance of the component with mean 0.0.

I'm getting fairly consistent results, but they're just not particularly correct. I believe it's external to the impl CovOption blocks. My guess is that it's somewhere in the update code. It seems that the variances are wildly off, which leads me to believe that the calculation of the model responsibilities is off. That's just a guess.

(I'm only commenting because I've been poking around for an hour or so and want to save these notes before moving on with my life. Sorry if it's not particularly helpful.)

@AtheMathmo
Owner

Sat down with this again today.

I've been unable to track down the issue, but I agree with you that it is probably the update (or at least the Gaussian estimation) code.

I think the best approach to find the issue is to write unit tests for each part of the process using the scikit implementation to help. Basically check that each function is computing what we expect with some dummy input.

The good news is that the new assert_matrix_eq! macro should make this a lot easier to do. I'd like to sit down and get this done but I'm not sure when I'll have time. Maybe by next weekend I could take a look...
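The kind of per-step check being proposed might look like this sketch (an illustrative helper; rulinalg's `assert_matrix_eq!` provides a richer version of the same idea):

```rust
/// Assert element-wise approximate equality against reference values
/// (e.g. numbers taken from the scikit-learn implementation).
fn assert_approx_eq(actual: &[f64], expected: &[f64], tol: f64) {
    assert_eq!(actual.len(), expected.len(), "length mismatch");
    for (a, e) in actual.iter().zip(expected) {
        assert!((a - e).abs() < tol, "got {}, expected {}", a, e);
    }
}
```

Each stage of the EM loop (responsibilities, weight update, mean update, covariance update) could then be pinned against known-good dummy inputs.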

@AtheMathmo
Owner

I'm going to try to tackle this again when AtheMathmo/rulinalg#150 is merged.

This should make this work a little easier to complete.
