normalize in log scale #25
base: develop
Conversation
calculate normalization directly in log scale to avoid overflow
log scale normalization
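The idea behind the PR can be illustrated with a minimal sketch (the function name `vmf_log_normalize` and the exact layout are my assumptions for illustration, not necessarily the PR's code). The von Mises-Fisher normalization constant C_p(kappa) = kappa^(p/2-1) / ((2*pi)^(p/2) * I_{p/2-1}(kappa)) overflows for large kappa because the Bessel function I_v(kappa) grows like exp(kappa); computing its logarithm via the exponentially scaled Bessel function `scipy.special.ive` sidesteps the overflow.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled Bessel: ive(v, k) == iv(v, k) * exp(-k)


def vmf_log_normalize(kappa, dim):
    """Hypothetical sketch: log of the vMF normalization constant C_p(kappa),
    computed entirely in log space so large kappa does not overflow."""
    v = dim / 2.0 - 1.0
    # log I_v(kappa) = log(ive(v, kappa)) + kappa, since ive rescales iv by exp(-kappa)
    log_bessel = np.log(ive(v, kappa)) + kappa
    return v * np.log(kappa) - (dim / 2.0) * np.log(2 * np.pi) - log_bessel
```

For inputs like `kappa=1000, dim=300`, evaluating `scipy.special.iv` directly returns `inf`, while this log-space version stays finite.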
@@ -884,7 +906,7 @@ def transform(self, X, y=None):
        if self.normalize:
            X = normalize(X)

-       check_is_fitted(self)
+       check_is_fitted(self, "cluster_centers_")
Could you remove these changes? `check_is_fitted` no longer allows a specified parameter in sklearn 0.22. Alternatively, make sure you are working off of the most recent develop commit (I made this change a few days ago).
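For reference, a minimal standalone sketch of the 0.22 behavior (this uses `KMeans` as a stand-in estimator, not this package's code): in `scikit-learn>=0.22`, `check_is_fitted(estimator)` infers fitted state from the estimator's trailing-underscore attributes, so the explicit attribute-name argument is no longer needed.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


def is_fitted(estimator):
    """Return True if check_is_fitted passes, False if it raises NotFittedError."""
    try:
        check_is_fitted(estimator)  # sklearn >= 0.22: no attribute name required
        return True
    except NotFittedError:
        return False


km = KMeans(n_clusters=2, n_init=10)
print(is_fitted(km))  # False: nothing fitted yet
km.fit(np.array([[0.0], [1.0], [10.0], [11.0]]))
print(is_fitted(km))  # True: cluster_centers_ now exists
```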
Yes, I'll pull the latest changes.
I updated sklearn to 0.22; now one of the tests fails (`test_estimator_spherical_k_means`). I don't think it's related to the changes I added to `test_vmf_log_detect_breakage`, as you suggested.
You'll need 0.22 and the latest develop branch (which updates things for 0.22)
-    kappa = 1.0 * np.abs(kappa)
-    alpha = 1.0 * alpha
-    beta = 1.0 * np.abs(beta)
+    kappa = 1. * np.abs(kappa)
+    alpha = 1. * alpha
+    beta = 1. * np.abs(beta)
Curious about the change in style for each of these floats? Since this PR is mostly focused on adding the log normalization factor, can we also change these back so the diff reflects just the salient changes?
I noticed these changes when I ran `git diff`, and I'm not sure how they happened; I definitely don't remember making them. Besides being unrelated to the log normalization, I don't change the style in other people's code.
Once I pulled the changes from your latest develop branch, these style changes were reverted back to including the `.0` in floats. I also changed it in the new log-scale method.
Sorry, I probably messed up my local fork somehow. Let me check and I'll update soon.
See my checklist in the main PR comment area; if you run `black` on the file when you are done, it should take care of everything for you. The final diffs for this file should be pretty minimal (just the new function, imports, and the change at the call site).
Yes, I already fixed all the style issues, but adding a test method will take some time (I'm pretty busy at the moment, and this is side work), so I'll add the test method in a few days.
Btw, in my fork, does it matter for the PR whether I add the changes to the `develop` branch or to another branch of mine?
I don't think the branch matters (more of a decision on your end) since they are separate develop branches. Your branch can be named anything like `feature/normalize_log_scale`, or leaving it as `develop` is fine if that's simpler for you.
However, I don't see the changes reflected in the PR, so you'll need to figure out how to pull them into your existing develop branch or open a new PR.
Looking good, thanks so much for helping to make this package better!
To run tests: from the repo directory, run `python3 -m pytest .`
I left some changes that need to be made before I can merge this:

- Adding a test that tests the diff between `_vmf_normalize_log` and the old `_vmf_log` for small-enough values would be good (you can crib from `test_vmf_log_dense`).
- Remove the second parameter from `check_is_fitted`, as this is no longer supported (alt: make sure you have the latest develop and are working off of `scikit-learn>=0.22`).
- black-ify the file. To do it: install `black` from `dev-requirements.txt`, then run `black von_mises_fisher_mixture.py`; this will autoformat the file for consistency. This works in python3 only (see: https://github.com/psf/black).
- Revert the change in float style back to `x.0`; I think `black` will do this for you though.
- The test `test_vmf_log_detect_breakage` in `tests/test_von_mises...` needs to be changed, since the `_vmf_log` function scales much better now (the last line should change to `assert_array_equal(breakage_points, [410, None, None, None, None])`).
Allow larger values of `kappa` and `mu` without overflow by working in log scale in `_vmf_normalize`.