normalize in log scale #25
base: develop
Conversation
calculate normalization directly in log scale to avoid overflow
log scale normalization
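The idea behind the PR can be illustrated with a minimal sketch (the function name `vmf_log_normalize` and the exact layout are my assumptions for illustration, not necessarily the PR's code). The von Mises-Fisher normalization constant C_p(kappa) = kappa^(p/2-1) / ((2*pi)^(p/2) * I_{p/2-1}(kappa)) overflows for large kappa because the Bessel function I_v(kappa) grows like exp(kappa); computing its logarithm via the exponentially scaled Bessel function `scipy.special.ive` sidesteps the overflow.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled Bessel: ive(v, k) == iv(v, k) * exp(-k)


def vmf_log_normalize(kappa, dim):
    """Hypothetical sketch: log of the vMF normalization constant C_p(kappa),
    computed entirely in log space so large kappa does not overflow."""
    v = dim / 2.0 - 1.0
    # log I_v(kappa) = log(ive(v, kappa)) + kappa, since ive rescales iv by exp(-kappa)
    log_bessel = np.log(ive(v, kappa)) + kappa
    return v * np.log(kappa) - (dim / 2.0) * np.log(2 * np.pi) - log_bessel
```

For inputs like `kappa=1000, dim=300`, evaluating `scipy.special.iv` directly returns `inf`, while this log-space version stays finite.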
@@ -884,7 +906,7 @@ def transform(self, X, y=None):
        if self.normalize:
            X = normalize(X)

-       check_is_fitted(self)
+       check_is_fitted(self, "cluster_centers_")
Could you remove these changes? `check_is_fitted` no longer allows a specified parameter in sklearn 0.22. Alternatively, make sure you are working off of the most recent develop commit (I made this change a few days ago).
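For reference, a minimal standalone sketch of the 0.22 behavior (this uses `KMeans` as a stand-in estimator, not this package's code): in `scikit-learn>=0.22`, `check_is_fitted(estimator)` infers fitted state from the estimator's trailing-underscore attributes, so the explicit attribute-name argument is no longer needed.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


def is_fitted(estimator):
    """Return True if check_is_fitted passes, False if it raises NotFittedError."""
    try:
        check_is_fitted(estimator)  # sklearn >= 0.22: no attribute name required
        return True
    except NotFittedError:
        return False


km = KMeans(n_clusters=2, n_init=10)
print(is_fitted(km))  # False: nothing fitted yet
km.fit(np.array([[0.0], [1.0], [10.0], [11.0]]))
print(is_fitted(km))  # True: cluster_centers_ now exists
```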
Yes, I'll pull the latest changes.
I updated sklearn to 0.22; now one of the tests fails (`test_estimator_spherical_k_means`). I don't think it's related to the changes I added to `test_vmf_log_detect_breakage`, as you suggested.
You'll need 0.22 and the latest develop branch (which updates things for 0.22)
-    kappa = 1.0 * np.abs(kappa)
-    alpha = 1.0 * alpha
-    beta = 1.0 * np.abs(beta)
+    kappa = 1. * np.abs(kappa)
+    alpha = 1. * alpha
+    beta = 1. * np.abs(beta)
Curious about the change in style for each of these floats? Since this PR is mostly focused on adding the log normalization factor, can we also change these back so the diff reflects just the salient changes?
I noticed these changes when I ran `git diff`, and I'm not sure how they happened; I definitely don't remember making them. Besides being unrelated to the log normalization, I don't change the style in other people's code.
Once I pulled the changes from your latest develop branch, these style changes were reverted back to including the `.0` in floats. I also changed it in the new log-scale method.
Sorry, I probably messed up my local fork somehow. Let me check and I'll update soon.
See my checklist in the main PR comment area; if you run `black` on the file when you are done, it should take care of everything for you. The final diffs for this file should be pretty minimal (just the new function, imports, and the change at the call site).
Yes, I already fixed all the style issues, but adding a test method will take some time (I'm pretty busy at the moment, and this is side work), so I'll add the test method in a few days.
Btw, in my fork, does it matter for the PR whether I add the changes to the `develop` branch or to another branch of mine?
I don't think the branch matters (more of a decision on your end) since they are separate develop branches. Your branch can be named anything like `feature/normalize_log_scale`, or leaving it as `develop` is fine if that's simpler for you.
However, I don't see the changes reflected in the PR, so you'll need to figure out how to pull them into your existing develop branch or open a new PR.
Looking good, thanks so much for helping to make this package better!
To run tests: from the repo directory, run `python3 -m pytest .`
I left some changes that need to be made before I can merge this:

- Adding a test that tests the diff between `_vmf_normalize_log` and the old `_vmf_log` for small-enough values would be good (you can crib from `test_vmf_log_dense`).
- Remove the second parameter from `check_is_fitted`, as this is no longer supported (alt: make sure you have the latest develop and are working off of `scikit-learn>=0.22`).
- black-ify the file. To do it: install `black` from `dev-requirements.txt`, then run `black von_mises_fisher_mixture.py`; this will autoformat the file for consistency. This works in python3 only (see: https://github.com/psf/black).
- Revert the change in float style back to `x.0`; I think `black` will do this for you though.
- The test `test_vmf_log_detect_breakage` in `tests/test_von_mises...` needs to be changed, since the `_vmf_log` function scales much better now (the last line should change to `assert_array_equal(breakage_points, [410, None, None, None, None])`).
Allow larger values of `kappa` and `mu` without overflow by working in log scale in `_vmf_normalize`.