Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

normalize in log scale #25

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from
Open

Conversation

itamar-dw
Copy link

allow larger values of kappa and mu without overflow by working in log scale in _vmf_normalize

calculate normalization directly in log scale to avoid overflow
@@ -884,7 +906,7 @@ def transform(self, X, y=None):
if self.normalize:
X = normalize(X)

check_is_fitted(self)
check_is_fitted(self, "cluster_centers_")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you remove these changes? check_is_fitted no longer allows for a specified parameter in sklearn 0.22.

Alternatively, make sure you are working off of the most recent develop commit (I made this change a few days ago).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I'll pull the latest changes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated sklear to 0.22, now one of the tests fails (test_estimator_spherical_k_means)
I don't think it's related to the changes I added to test_vmf_log_detect_breakage as you suggested

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need 0.22 and the latest develop branch (which updates things for 0.22)

kappa = 1.0 * np.abs(kappa)
alpha = 1.0 * alpha
beta = 1.0 * np.abs(beta)
kappa = 1. * np.abs(kappa)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious about the change in style for each of these floats?

Since this PR is mostly focused on adding the log normalization factor, can we also change these back so the diff can reflect just the salient changes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed these changes when I git diffed, and I am not sure how they happened, I definitely don't remember making such changes.. besides being unrelated to the log normalization, I don't change style in other peoples code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once I pulled the changes from your latest develop branch, these style changes were back to include the ".0" in floats. I also changed it in the new log-scale method.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I probably messed up my local fork somehow. let me check and I'll update soon

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my check list in the main PR comment area, if you run black on the file when you are done, it should take care of everything for you. The final diffs for this file should be pretty minimal (just the new function, imports, and change in call site).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I already fixed all style issues, but adding a test method will take some time (I'm pretty busy as the moment for side-work). So in a few days I'll add the test method.
Btw in my fork, does it matter if I add the changes to develop branch or another (mine) branch? for the PR that is

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the branch matters (more of a decision on your end) since they are separate develop branches. Your branch can be named anything like feature/normalize_log_scale or leaving as develop is fine, if thats simpler for you.

However, I don't see the changes reflect in the PR so you'll need to figure out how to pull it into your existing develop branch or do a new PR.

Copy link
Owner

@jasonlaska jasonlaska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, thanks so much for helping to make this package better!

To run tests: from the repo directory run python3 -m pytest .

I left some changes that need to be made before I can merge this:

  • Adding a test that tests the diff between _vmf_normalize_log and the old _vmf_log for small-enough values would be good (you can crib from test_vmf_log_dense)

  • remove second parameter from check_is_fitted as this is no longer supported (alt: make sure you have the latest develop and are working off of scikit-learn>=0.22)

  • black-ify the file-- to do it: install black from dev-requirements.txt, then run black von_mises_fisher_mixture.py, this will autoformat the file for consistency; this will work in python3 only. (see: https://github.com/psf/black)

  • revert change in float style back to x.0; I think though that black will do this for you though

  • The test test_vmf_log_detect_breakage in tests/test_von_mises... needs to be changed since the _vmf_log function scales much better now (last line should change to assert_array_equal(breakage_points,[410, None, None, None, None]))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants