@tgaloppo
Contributor

@tgaloppo tgaloppo commented Jan 1, 2015

MultivariateGaussian was calling both pinv() and det() on the covariance matrix, effectively performing two matrix decompositions. Both values are now computed from a single singular value decomposition. The pseudo-inverse and the pseudo-determinant are used to guard against singular matrices.
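
As a rough illustration of the idea in this description, here is a minimal Python sketch that gets both the pseudo-inverse and the pseudo-determinant out of one decomposition. It is not the Breeze or MLlib code: it is limited to symmetric 2x2 matrices so that it needs only the standard library, the names `eig_sym_2x2` and `pinv_and_pdet` are hypothetical, and the tolerance rule is an assumption. For a symmetric positive semi-definite matrix the eigendecomposition used here coincides with the SVD.

```python
import math
import sys


def eig_sym_2x2(sigma):
    """Eigendecomposition of a symmetric 2x2 matrix [[a, b], [b, c]].

    Returns (d, u) with eigenvalues d (largest first) and u holding the
    matching unit eigenvectors as columns, so sigma = u * diag(d) * u^T.
    """
    a, b = sigma[0]
    c = sigma[1][1]
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    theta = 0.5 * math.atan2(2.0 * b, a - c)  # angle of the leading eigenvector
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [mean + radius, mean - radius], [[cos_t, -sin_t], [sin_t, cos_t]]


def pinv_and_pdet(sigma, eps=sys.float_info.epsilon):
    """Pseudo-inverse and pseudo-determinant from a single decomposition."""
    d, u = eig_sym_2x2(sigma)
    n = len(d)
    # Assumed tolerance rule: values at or below eps * n * max(d) count as zero.
    tol = eps * n * max(d)
    kept = [v for v in d if v > tol]
    if not kept:
        raise ValueError("covariance matrix has no non-zero singular values")
    pdet = math.prod(kept)  # product of the non-zero singular values
    inv_d = [1.0 / v if v > tol else 0.0 for v in d]
    # pinv = u * diag(inv_d) * u^T, written out entry by entry
    pinv = [[sum(u[i][k] * inv_d[k] * u[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]
    return pinv, pdet


# Rank-1 covariance: det() is 0 and an ordinary inverse does not exist.
pinv, pdet = pinv_and_pdet([[1.0, 1.0], [1.0, 1.0]])
print(pdet)
print(pinv)
```

The zero singular value is simply skipped in both results, which is what lets a singular covariance matrix through without a division by zero.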

@SparkQA

SparkQA commented Jan 1, 2015

Test build #24987 has started for PR 3871 at commit fd9784c.

  • This patch merges cleanly.

@tgaloppo changed the title from "SPARK-5017 - Use SVD to compute determinant and inverse of covariance matrix" to "SPARK-5017 [MLlib] - Use SVD to compute determinant and inverse of covariance matrix" on Jan 1, 2015
@SparkQA

SparkQA commented Jan 1, 2015

Test build #24987 has finished for PR 3871 at commit fd9784c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24987/
Test PASSed.

@jkbradley
Member

@tgaloppo Could you please add a description? It can be based off of the JIRA, just enough to cover the main points of the PR. Thanks!

… matrix.

Code was calling both det() and pinv(), effectively performing two matrix decompositions.
Furthermore, Breeze pinv() currently fails for singular matrices.
Member

Perhaps you could add a note here about how this behaves when sigma is singular, plus a reference like [http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Degenerate_case]

The doc could be a short version of what you have below for calculateCovarianceConstants

@jkbradley
Member

@tgaloppo @mengxr What are your thoughts about doing the computation in log space as much as possible, and then exponentiating at the end? I'm mainly thinking about numerical stability, but I could imagine wanting to provide pdf() and logpdf() methods eventually.
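
To make the numerical-stability concern concrete, here is a small Python sketch, assuming a full-rank diagonal covariance so that no decomposition is needed. The function name `log_pdf_diag` and the example dimensions are made up for illustration; this is not the MLlib implementation or a proposed API.

```python
import math


def log_pdf_diag(x, mu, variances):
    """Log-density of N(mu, diag(variances)); every variance must be > 0."""
    n = len(x)
    log_det = sum(math.log(v) for v in variances)  # log of the determinant
    quad = sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, variances))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


n = 400
mu = [0.0] * n
variances = [1e-2] * n

# The determinant itself underflows to 0.0 in double precision ...
print(math.prod(variances))
# ... while the log-density at the mean stays finite.
print(log_pdf_diag(mu, mu, variances))
```

Summing logarithms keeps every intermediate value in a comfortable range, and a pdf() can still be obtained by exponentiating at the end when the result is representable.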

@jkbradley
Member

@tgaloppo The logic looks good; my comments are basically about clarity (except for the log space question). Thanks for the PR!

@jkbradley
Member

One more request: Could you please add a unit test with a singular matrix? Thank you! Perhaps in a new suite for MultivariateGaussian

@SparkQA

SparkQA commented Jan 2, 2015

Test build #24998 has started for PR 3871 at commit b4415ea.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 2, 2015

Test build #24998 has finished for PR 3871 at commit b4415ea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24998/
Test PASSed.

@tgaloppo
Contributor Author

tgaloppo commented Jan 3, 2015

@jkbradley I think performing the pdf calculation in log-space (and providing a logpdf() method) is a good idea. Perhaps we can make this part of transitioning MultivariateGaussian to public scope?

…ovariance matrix. Previous code called both pinv() and det(), effectively performing two matrix decompositions.

Additionally, the pinv() implementation in Breeze is known to fail for singular matrices.
@SparkQA

SparkQA commented Jan 3, 2015

Test build #25001 has started for PR 3871 at commit d448137.

  • This patch merges cleanly.

Member

"eigenvalues" --> "singular values" (here and in the next comment on line 79)

@SparkQA

SparkQA commented Jan 3, 2015

Test build #25001 has finished for PR 3871 at commit d448137.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 3, 2015

Test build #25006 has finished for PR 3871 at commit dc3d0f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25006/
Test PASSed.

Member

Good catch with the other var; could you please fix this and the test above too? var -> val

Contributor Author

@jkbradley No problem; fixed.

@jkbradley
Member

@tgaloppo thanks for verifying about the test pdf values

@SparkQA

SparkQA commented Jan 3, 2015

Test build #25016 has started for PR 3871 at commit 629d9d0.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 3, 2015

Test build #25016 has finished for PR 3871 at commit 629d9d0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25016/
Test PASSed.

Member

For link, use double-brackets:

(See [[http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Degenerate_case]])

@jkbradley
Member

@tgaloppo 2 more small comments, but after those, I believe this will be ready. Thanks!

Fixed comment in MultivariateGaussian
@tgaloppo
Contributor Author

tgaloppo commented Jan 4, 2015

@jkbradley Good call on the test suite; I have added some non-center points to the tests. I also added the brackets to the in-comment link.

@SparkQA

SparkQA commented Jan 4, 2015

Test build #25038 has started for PR 3871 at commit a5b8bc5.

  • This patch merges cleanly.

@jkbradley
Member

@tgaloppo Thanks for the updates! LGTM after tests pass

CC: @mengxr

@SparkQA

SparkQA commented Jan 4, 2015

Test build #25038 has finished for PR 3871 at commit a5b8bc5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25038/
Test PASSed.

Contributor

We can construct BDV and BDM directly.

MultivariateGaussianSuite - Create Breeze vectors and matrices directly instead of through MLlib vectors/matrices.
@tgaloppo
Contributor Author

tgaloppo commented Jan 6, 2015

@mengxr Changes made.

@SparkQA

SparkQA commented Jan 6, 2015

Test build #25069 has started for PR 3871 at commit 383b5b3.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 6, 2015

Test build #25069 has finished for PR 3871 at commit 383b5b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25069/
Test PASSed.

Contributor Author

U D^-1 * U.t => U D^(-1/2) D^(-1/2) U.t => (U D^(-1/2)) (D^(-1/2) U.t)
...both U and D are symmetric, so...
(U D^(-1/2)) = (U.t (D^(-1/2)).t) => (D^(-1/2) U).t
and
(D^(-1/2) U.t) = (D^(-1/2) U)
thus
U D^-1 U.t => (D^(-1/2) U).t (D^(-1/2) U)
... bringing in the delta we get
delta.t (D^(-1/2) U).t (D^(-1/2) U) delta
=> ((D^(-1/2) U) delta).t (D^(-1/2) U) delta = norm(D^(-1/2) U delta)^2
as indicated by @mengxr

(phew, hope I did that OK! :) )

Contributor Author

@jkbradley Hmm. I am going to punt on this one. Perhaps @mengxr can point out my error. FWIW - Changing the code to U * D^(-1/2) causes the unit tests to fail.

Contributor Author

I agree; I see no reason why U should always be symmetric (as you have demonstrated). Before I roll back the change, I just want to make sure that I did not misinterpret @mengxr , and/or that we are not missing something that he sees.

Contributor Author

(u, d) is the eigendecomposition of sigma, so sigma = u * diag(d) * u^-1 ... but we have a special case since covariance matrices are always symmetric and positive semi-definite, in which case u * u.t = I, making it equivalent to the singular value decomposition... so sigma = u * diag(d) * u.t ... so in svd terms, v.t = u.t, then the inverse is v * inv(diag(d)) * u.t = u * inv(diag(d)) * u.t ...

Have I lost my bearings?

Member

Ugh, no, I have. I was confused about which were inverses, and what you wrote looks perfectly fine. Sorry for the trouble! I'll remove the comments.
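
For readers following the exchange above, the quadratic form can be written out as below. This is a sketch under stated assumptions: the columns of U are taken to be the orthonormal eigenvectors of the covariance matrix, and the symbols Σ⁺, D⁺ and δ = x − μ are notation introduced here rather than names from the code.

```latex
\begin{align*}
\Sigma &= U D U^{\top}, \qquad U^{\top} U = I, \qquad
  D = \operatorname{diag}(d_1, \dots, d_n), \quad d_i \ge 0 \\
\Sigma^{+} &= U D^{+} U^{\top}, \qquad
  (D^{+})_{ii} = \begin{cases} 1/d_i & \text{if } d_i > 0 \\ 0 & \text{if } d_i = 0 \end{cases} \\
\delta^{\top} \Sigma^{+} \delta
  &= \delta^{\top} U (D^{+})^{1/2} (D^{+})^{1/2} U^{\top} \delta \\
  &= \bigl( (D^{+})^{1/2} U^{\top} \delta \bigr)^{\top} \bigl( (D^{+})^{1/2} U^{\top} \delta \bigr)
   = \bigl\lVert (D^{+})^{1/2} U^{\top} \delta \bigr\rVert^{2}
\end{align*}
```

Only the orthogonality of U and the symmetry of the diagonal factor are used, so U itself does not have to be symmetric. If a decomposition routine returned the eigenvectors as rows instead of columns, U and its transpose would swap roles in the last line.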

@jkbradley
Member

@tgaloppo Now that my confusion is over... LGTM. Thanks very much!
CC: @mengxr

@mengxr
Contributor

mengxr commented Jan 6, 2015

Merged into master. Thanks!

@asfgit asfgit closed this in 4108e5f Jan 6, 2015