
GMM CovOption redesign #153

Open

andrewcsmith opened this issue Oct 15, 2016 · 3 comments

@andrewcsmith
Contributor

After looking at the sklearn code and seeing the huge number of possible places that CovOption can come into play, I'd like to propose a redesign of the way we handle various CovOption values.

The basic idea is that this lets us add new CovOption values without touching any match statements. That makes our API more resilient and allows other library authors to supply their own implementations without needing their changes approved and incorporated upstream.

Trait

Create a trait that encompasses the few methods that involve CovOption. This should be something like:

pub trait CovOption {
    /// Initializes the covariance values for GMM, replaces gmm.rs:81
    fn initial_values(&self, inputs: &Matrix<f64>) -> Matrix<f64>;
    /// Computes the covariance, replaces gmm.rs:334
    fn compute_cov(&self, diff: Matrix<f64>, weight: f64) -> Matrix<f64>;
}

Structs

Create a struct for each variant of our current enum. The code should be straightforward; a rough sketch follows.
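Assuming the current Full and Diagonal variants and the Matrix re-exports from rusty_machine::linalg (the initialization shown is just a placeholder, not the actual gmm.rs logic):

use rusty_machine::linalg::{Matrix, BaseMatrix};

/// Full covariance: keep the whole weighted outer product.
pub struct Full;

/// Diagonal covariance: zero the off-diagonal entries.
pub struct Diagonal;

impl CovOption for Full {
    fn initial_values(&self, inputs: &Matrix<f64>) -> Matrix<f64> {
        // Placeholder initialization: identity covariance.
        Matrix::identity(inputs.cols())
    }

    fn compute_cov(&self, diff: Matrix<f64>, weight: f64) -> Matrix<f64> {
        diff.transpose() * diff * weight
    }
}

impl CovOption for Diagonal {
    fn initial_values(&self, inputs: &Matrix<f64>) -> Matrix<f64> {
        Matrix::identity(inputs.cols())
    }

    fn compute_cov(&self, diff: Matrix<f64>, weight: f64) -> Matrix<f64> {
        let full = diff.transpose() * diff * weight;
        let mut out = Matrix::zeros(full.rows(), full.cols());
        // Keep only the diagonal of the full covariance.
        for i in 0..full.rows() {
            out[[i, i]] = full[[i, i]];
        }
        out
    }
}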

GaussianMixtureModel

Much like the current design, it will take a struct S where S: CovOption.
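Something like this (field and method names here are just placeholders):

pub struct GaussianMixtureModel<S: CovOption> {
    comp_count: usize,
    cov_option: S,
    // means, weights, covariances elided
}

impl<S: CovOption> GaussianMixtureModel<S> {
    fn covariance_step(&self, diff: Matrix<f64>, weight: f64) -> Matrix<f64> {
        // Static dispatch through the trait; no match on enum variants.
        self.cov_option.compute_cov(diff, weight)
    }
}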

@AtheMathmo
Owner

I think this is a really good suggestion.

There is some difficulty with adding regularization, as this should happen outside of the compute_cov function. We could handle it in a similar way to the neural nets code, with extra trait methods like is_regularized. Or we could just give the trait a default reg_eps method that returns 0 and override it for the relevant options.
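For concreteness, something like this (restating the trait from the proposal):

pub trait CovOption {
    fn initial_values(&self, inputs: &Matrix<f64>) -> Matrix<f64>;
    fn compute_cov(&self, diff: Matrix<f64>, weight: f64) -> Matrix<f64>;

    /// Regularization constant added to the covariance diagonal.
    /// Defaults to zero; options that regularize override this.
    fn reg_eps(&self) -> f64 {
        0.0
    }
}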

@andrewcsmith
Contributor Author

Sure, sklearn actually handles regularization outside of the covariance type. Instead of using 0, I think it would be valuable to just use an Option<f64> member of the GaussianMixtureModel struct. Then when it comes time to add the regularization constant we just access self.reg_const.unwrap_or(0.0) to be explicit about the default upon unwrapping.
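Roughly this (regularize is a hypothetical helper, and the struct is abbreviated):

use rusty_machine::linalg::{Matrix, BaseMatrix};

pub struct GaussianMixtureModel<S: CovOption> {
    cov_option: S,
    /// None means no regularization was requested.
    reg_const: Option<f64>,
}

impl<S: CovOption> GaussianMixtureModel<S> {
    fn regularize(&self, mut cov: Matrix<f64>) -> Matrix<f64> {
        // Explicit about the default when unwrapping.
        let eps = self.reg_const.unwrap_or(0.0);
        for i in 0..cov.rows() {
            cov[[i, i]] += eps;
        }
        cov
    }
}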

My guess is that this will lead to better optimizations, but who knows.

Why reg_eps? I think it would be clearer to use reg_covar (which sklearn uses) or reg_const (which I think is more descriptive). Epsilon is usually used for tolerance, no?

@andrewcsmith
Contributor Author

Check out the latest commits to #155 (566c3ed) to see the design I came up with. I worked on all this at once, more or less in parallel, so this and #152 pretty much depend on each other at this point.
