[WIP] Issue #152 use cholesky factorization #155
base: master
Conversation
I think it's getting closer. It doesn't work super well, but my guess is this is due to the initialization parameters, because it keeps getting stuck in a local minimum. If you run the test a bunch of times, you see that it occasionally does converge. I'll work on KMeans initialization, and hopefully that will help things.
It's pretty clear that adding the k-means initialization step does not help things.
For my particular application, I switched back to k-means for the moment. I'll look into this weird convergence bug again later, but for the next week or so I need to do other things.
Thanks for your time on this. I'll take a look through the code and see if I can make sense of anything.
let n_features = covariances[0].cols();
for covariance in covariances {
    for i in 0..covariance.cols() {
        if covariance[[i, i]] <= 0.0 {
Probably better to do a `c.abs() < eps` check here?
Yes, I changed this. Although I haven't even implemented anything regarding the diagonal covariance yet.
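For reference, a minimal sketch of the suggested check, with illustrative names (`check_covariance_diagonals`, `eps`, and the error type are not from this branch):

```rust
use rulinalg::matrix::{BaseMatrix, Matrix};

// Sketch of the suggested check: treat a diagonal entry as invalid when it is
// effectively zero (|c| < eps) rather than only when it is non-positive.
fn check_covariance_diagonals(covariances: &[Matrix<f64>], eps: f64) -> Result<(), String> {
    for (k, covariance) in covariances.iter().enumerate() {
        for i in 0..covariance.cols() {
            if covariance[[i, i]].abs() < eps {
                return Err(format!("component {}: near-zero variance on diagonal entry {}", k, i));
            }
        }
    }
    Ok(())
}
```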
// println!("mix_weights: \n{:?}", &self.mix_weights); | ||
let log_weights = self.mix_weights.iter().map(|w| w.ln()); | ||
for (lp, lw) in log_prob.iter_mut().zip(log_weights) { | ||
*lp *= lw; |
I might be misunderstanding - as we're in log space shouldn't these be added? `log(ab) = log(a) + log(b)`. Are we not computing `log(weight * prob)`?
YES! Thanks.
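To restate the fix agreed above as a standalone snippet (the function name is illustrative):

```rust
// In log space, log(weight * prob) = log(weight) + log(prob), so the log
// weight is added to each sample's log probability rather than multiplied in.
fn add_log_weights(log_prob: &mut [f64], mix_weights: &[f64]) {
    for (lp, lw) in log_prob.iter_mut().zip(mix_weights.iter().map(|w| w.ln())) {
        *lp += lw;
    }
}
```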
Just pushed the latest joy. It's good! It works quite well. The test uses k-means initialization by default, and I fixed a bug in the covariance calculation that sealed the deal. It's still not ready to merge, because the design is pretty bad, but at least now there's an implementation to kick things off. I'll work on getting the gmm integration test to actually test things as well.
I went ahead and improved a load of other things we discussed. It's pretty solid now. It still occasionally fails, but steadily increasing the regularization constant on each failure usually fixes that.
I've had a brief look and things look very solid! I'll hopefully have time to properly review this in the next few hours. Thanks for all the effort you've put into this, I'm really excited to check it out!
I haven't dug into the correctness of the code yet, will do so properly soon. It does look good to me from what I understand though.
let ref model_means = self.model_means.as_ref().unwrap();
// The log of the determinant for each precision matrix
let log_det = precisions.iter().map(|m| m.det().ln());
// println!("log_det: {:?}", log_det);
Can we remove this commented line please?
Edit: And the other commented `println!`s below.
Ok(cov_mat)
    acc
})
+ 10.0 * EPSILON;
This seems like black magic to me - I see it in the sklearn implementation but we should try to find some sources for it ourselves. Is it purely for numeric stability?
And a (very) minor point: I think it is better to use `f64` and here call `f64::EPSILON`. I think this makes the constant clearer, otherwise someone may think we have defined `EPSILON`.
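For context, the term is a tiny jitter added to the diagonal of each estimated covariance so that it stays positive definite for the Cholesky factorization. A hedged, standalone sketch (the function name is illustrative; this is not the exact code in the branch):

```rust
use rulinalg::matrix::{BaseMatrix, Matrix};

// Add a small positive jitter to the diagonal of an estimated covariance so it
// remains positive definite (and therefore Cholesky-decomposable) despite
// floating-point round-off. Writing f64::EPSILON makes it clear that this is
// the machine epsilon rather than a locally defined constant.
fn regularize_covariance(cov: &mut Matrix<f64>, reg: f64) {
    for i in 0..cov.rows() {
        cov[[i, i]] += reg;
    }
}

// e.g. regularize_covariance(&mut cov_mat, 10.0 * f64::EPSILON);
```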
vec![Matrix::zeros(inputs.cols(), inputs.cols()); k]);
self.model_means = Some(Matrix::zeros(inputs.cols(), k));

// Initialize responsibitilies and calculate parameters
Just noticed this typo, responsibitilies -> 'responsibilities'
@@ -30,75 +29,82 @@
//! // Probabilities that each point comes from each Gaussian.
//! println!("{:?}", post_probs.data());
//! ```
use linalg::{Matrix, MatrixSlice, Vector, BaseMatrix, BaseMatrixMut, Axes};
use rulinalg::utils;
extern crate rand;
Shouldn't need this as we import the crate in `lib.rs`. Just `use rand` should work.
}

fn update_gaussian_parameters(&mut self, inputs: &Matrix<f64>, resp: Matrix<f64>) {
    self.mix_weights = resp.iter_rows()
Is `sum_rows` not appropriate here?
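To spell out what this step computes: the unnormalized mixture weights are the per-component sums of the responsibilities over all samples, i.e. the column sums of the (n_samples x k) responsibility matrix, which is presumably what `sum_rows` would give directly. A plain-loop sketch with illustrative names:

```rust
// `resp` holds the (n_samples x k) responsibilities as nested row vectors;
// the result is a length-k vector whose j-th entry is the expected number of
// samples assigned to component j.
fn responsibility_totals(resp: &[Vec<f64>], k: usize) -> Vec<f64> {
    let mut totals = vec![0.0; k];
    for row in resp {
        for (t, &r) in totals.iter_mut().zip(row.iter()) {
            *t += r;
        }
    }
    totals
}
```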
self.cov_option.update_covars(model_covars, resp, inputs,
                              model_means, &self.mix_weights);

self.mix_weights /= inputs.rows() as f64;
This seems like it should be outside of this function.
Actually, I might have spoken too soon. I just ran a gmm simulation - sampling data from 3 Gaussian clusters with specified means and variances to see if it could recover the parameters. Sadly it consistently failed to do so, though not dramatically. I'll try to get a PR in to your branch shortly to add this as an example in the repo - it's not the prettiest, but it seems it will be valuable for us developers and hopefully some users.
Adding basic gmm example for verification
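Roughly the kind of check that example performs, sketched from memory (the rusty-machine calls `GaussianMixtureModel::new`, `set_max_iters`, `train`, and `means` are assumptions about the API at the time and may need adjusting; this is not the code in that PR):

```rust
extern crate rand;
extern crate rusty_machine;

use rand::Rng;
use rusty_machine::learning::gmm::GaussianMixtureModel;
use rusty_machine::learning::UnSupModel;
use rusty_machine::linalg::Matrix;

fn main() {
    let mut rng = rand::thread_rng();
    let true_means = [-5.0, 0.0, 5.0];
    let std_dev = 0.5;

    // Draw 100 one-dimensional samples around each of the three true means,
    // using Box-Muller so we don't depend on a particular `rand` distribution API.
    let mut data = Vec::with_capacity(300);
    for &mean in &true_means {
        for _ in 0..100 {
            let u1: f64 = rng.gen::<f64>().max(1e-12);
            let u2: f64 = rng.gen();
            let z = (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos();
            data.push(mean + std_dev * z);
        }
    }
    let inputs = Matrix::new(300, 1, data);

    // Fit a 3-component GMM and print the recovered means so they can be
    // compared against `true_means` by eye.
    let mut model = GaussianMixtureModel::new(3);
    model.set_max_iters(300);
    model.train(&inputs).unwrap();
    println!("estimated means: {:?}", model.means());
}
```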
Cool. I didn't really do much in the way of a/b testing here. I'll see if I can tweak the parameters to get this to work better, or otherwise try to figure out what the problem is (probably next week though).
Hopefully I'll have some time to dig into it in the next few days. It seems pretty minor so hopefully something will jump out.
let covar_det = cov.det();
let covar_inv = try!(cov.inverse().map_err(Error::from));
/// Solves a system given the Cholesky decomposition of the lower triangular matrix
pub fn solve_cholesky_decomposition(mut cov_chol: Matrix<f64>) -> Matrix<f64> {
I've just been looking through this to try and spot the regression. I don't think we need this function; rulinalg exposes `solve_l_triangular`.
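For reference, the operation here is just forward substitution on the lower-triangular Cholesky factor, which is what rulinalg's `solve_l_triangular` provides. A standalone sketch of that operation (plain slices, illustrative name):

```rust
// Solve L * y = b for y, where L is lower triangular (e.g. a Cholesky factor)
// stored row-major as n rows of length n, using forward substitution.
fn forward_substitution(l: &[Vec<f64>], b: &[f64]) -> Vec<f64> {
    let n = b.len();
    let mut y = vec![0.0; n];
    for i in 0..n {
        let mut sum = b[i];
        for j in 0..i {
            sum -= l[i][j] * y[j];
        }
        y[i] = sum / l[i][i];
    }
    y
}
```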
// Subtract the mean of each column from the matrix y
for col in 0..y.cols() {
    for row in 0..y.rows() {
        y[[row, col]] -= z[[0, col]];
Minor stuff here - this access pattern will incur some cache misses. We should swap the two loops so that we access contiguous data in sequence (right now we jump down the columns first). We should also probably use `get_unchecked_mut` here, to remove the bounds checks that we know we don't need.
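A sketch of the suggested loop order, with illustrative names (plain indexing shown; switching to `get_unchecked_mut` would additionally drop the bounds checks):

```rust
use rulinalg::matrix::{BaseMatrix, Matrix};

// Subtract the per-column means (stored in the first row of `means`) from `y`,
// iterating rows in the outer loop so the inner loop walks contiguous
// row-major data.
fn subtract_column_means(y: &mut Matrix<f64>, means: &Matrix<f64>) {
    for row in 0..y.rows() {
        for col in 0..y.cols() {
            y[[row, col]] -= means[[0, col]];
        }
    }
}
```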
I finally got a little time to look at this. I think I checked over everything and compared to the scikit implementation - sadly I couldn't spot anything incorrect. It is failing even in the 1d case (as in the example added) so I don't think it is all of the Cholesky stuff. I'll try to find some time to dig around a little more, but we may just have to step through and compare to a working implementation to spot where things go wrong...
I bounced back and forth between the two implementations. I'm getting fairly consistent results, but they're just not particularly correct. I believe the problem is external to the Cholesky code. (I'm only commenting because I've been poking around for an hour or so and want to save these notes before moving on with my life. Sorry if it's not particularly helpful.)
Sat down with this again today. I've been unable to track down the issue, but I agree with you that it is probably in the update (or at least the Gaussian estimation) code. I think the best approach to find the issue is to write unit tests for each part of the process, using the scikit implementation to help: basically check that each function is computing what we expect with some dummy input. The good news is that the new structure should make that fairly straightforward.
I'm going to try to tackle this again when AtheMathmo/rulinalg#150 is merged. That should make this work a little easier to complete.
This is basically a gigantic rewrite of the `GaussianMixtureModel` to make it use Cholesky factorization instead of the inverse/determinant approach. There's lots wrong with this commit stylistically, but for now I'm just hacking at it. The relevant test can be run with `cargo test -- --nocapture gmm_train`.
I can't seem to figure out why the means of the model aren't separating. They all seem to drift in the same direction. Any thoughts? Anyway, if you know relatively quickly why that might be happening, I'd like to hear it.
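As a reference point for the Cholesky-based approach this PR describes: with covariance = L * L^T, the per-sample log density needs only the factor L, since log(det) = 2 * sum(ln(L[i][i])) and the Mahalanobis term is the squared norm of z where L * z = x - mean. A hedged, standalone sketch (not the code in this branch):

```rust
// Log density of a multivariate normal evaluated via its Cholesky factor.
// `chol` is the lower-triangular L with covariance = L * L^T, stored row-major
// as d rows of length d; `x` and `mean` both have length d.
fn gaussian_log_density(x: &[f64], mean: &[f64], chol: &[Vec<f64>]) -> f64 {
    let d = x.len();

    // log|covariance| = 2 * sum of the logs of L's diagonal entries.
    let log_det: f64 = 2.0 * (0..d).map(|i| chol[i][i].ln()).sum::<f64>();

    // Forward substitution: solve L * z = (x - mean), so that the squared norm
    // of z equals (x - mean)^T * covariance^{-1} * (x - mean).
    let mut z = vec![0.0; d];
    for i in 0..d {
        let mut s = x[i] - mean[i];
        for j in 0..i {
            s -= chol[i][j] * z[j];
        }
        z[i] = s / chol[i][i];
    }
    let mahalanobis: f64 = z.iter().map(|v| v * v).sum();

    -0.5 * ((d as f64) * (2.0 * std::f64::consts::PI).ln() + log_det + mahalanobis)
}
```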