ENH Add Array API compatibility to MinMaxScaler #26243
This looks fairly straightforward. We'll need tests to make sure PyTorch works on the CI and gives correct results.
For completeness, `inverse_transform` can also be updated with `array_api`.
Done and done. For the tests I more or less took the tests that exist for |
I am uneasy about supporting `xp.float16`. Adding a `float16` property to `_ArrayAPIWrapper` feels like we are "extending the Array API spec".
My preference is to be strict and only allow `float16` when the arrays are `ndarray`. This means we do not support `float16` for `numpy.array_api` Arrays, `torch`, `cupy.array_api`, etc.
Fine for me. Though as far as I can see |
Okay, I am mostly on board with supporting `float16`.
Besides the following and @thomasjpfan's comment on `isneginf`, LGTM.
Not sure how to efficiently cope with the lack of a generic `isneginf` though.
Beyond making `atol` stricter in the tests and the following suggestions to improve comments / docstrings, LGTM!
sklearn/utils/_array_api.py (outdated)
@@ -357,6 +368,35 @@ def _expit(X):
    return 1.0 / (1.0 + xp.exp(-X))


def _isneginf(X):
TODO for Tim: open issue in Array API about asymmetry between `isinf` and `isneginf`
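For context, a negative-infinity check can be built from operations the Array API standard does provide (`isinf` plus a comparison). The sketch below is only an illustration: the name `isneginf_sketch` is hypothetical, NumPy stands in for the `xp` namespace, and this is not necessarily how the `_isneginf` helper in this PR is written.

```python
import numpy as np


def isneginf_sketch(X, xp=np):
    """Return a boolean mask that is True where X is negative infinity.

    Hypothetical helper: combines `isinf` with a sign check, both of which
    are available in the Array API standard.
    """
    return xp.isinf(X) & (X < 0)


X = np.asarray([-np.inf, -1.0, 0.0, np.inf, np.nan])
mask = isneginf_sketch(X)
```

NaN compares as false against zero, so NaN entries are never reported as negative infinity.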
I think this PR needs to be updated to leverage the new common test recently merged in |
Force-pushed from f28eab5 to 8e8e965.
I think this is ready to go now. I'd like to add more tests a la:

@parametrize_with_checks(
    [
        MinMaxScaler()
    ],
    check_yielder=_yield_array_api_checks,
)
def test_array_api_compliance(check, estimator, request):
    check(estimator, check_values=True)

but this uses things from #26315. Either we wait to merge this PR until #26315 has landed, or we make another PR later to add those tests. I'm fine with either. Maybe waiting for #26315 is cleaner/requires less bookkeeping work.
I just reverted the change made to Note: the order of the decorators matters for the name of the tests. |
Force-pushed from 547f487 to 2d690a7.
Thanks for the update. Still LGTM.
@thomasjpfan what do you think?
Can you add the estimator to:
scikit-learn/doc/modules/array_api.rst
Lines 88 to 89 in 2b0eef8
Estimators with support for `Array API`-compatible inputs
=========================================================
I ran this on my local machine with CuPy + PyTorch and the new tests pass.
When using the axis keyword it can happen that a slice contains only NaNs. This change corrects the logic that restores the NaNs at the end. Added tests for 2D inputs.
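The idea behind this commit can be sketched roughly as follows. This is a simplified illustration, not the code from the PR: the name `nanmin_sketch` is hypothetical and NumPy stands in for the `xp` namespace. NaNs are replaced by `+inf` before taking the minimum, and NaN is restored only for slices that consisted entirely of NaNs, which is tracked with the NaN mask itself rather than by looking for `inf` in the result.

```python
import numpy as np


def nanmin_sketch(X, axis=None, xp=np):
    """Minimum ignoring NaNs; all-NaN slices yield NaN (hypothetical helper)."""
    nan_mask = xp.isnan(X)
    # Replace NaNs with +inf so they can never win the minimum.
    filled = xp.where(nan_mask, xp.asarray(xp.inf, dtype=X.dtype), X)
    result = xp.min(filled, axis=axis)
    # A slice made only of NaNs now holds +inf; put NaN back there.
    # Using the reduced mask keeps genuine +inf minima untouched.
    all_nan = xp.all(nan_mask, axis=axis)
    return xp.where(all_nan, xp.asarray(xp.nan, dtype=X.dtype), result)


# Column 0: regular values, column 1: only NaNs, column 2: only +inf.
X = np.asarray([[1.0, np.nan, np.inf], [3.0, np.nan, np.inf]])
col_min = nanmin_sketch(X, axis=0)
```

In this 2D example the all-NaN column comes back as NaN, while the column that really contains only `+inf` keeps `+inf` as its minimum.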
Force-pushed from c9bbd4c to beb130a.
Minor comment, otherwise LGTM
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Reference Issues/PRs
Towards #26024
What does this implement/fix? Explain your changes.
This enables `MinMaxScaler` to work with Array API compatible arrays. Most of the changes are replacing `np` with `xp` (which represents the namespace the array the user passed in belongs to). Had to implement some helpers like `nanmin` and `nanmax`.
Any other comments?
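To illustrate the `np` to `xp` substitution described in this PR, here is a minimal sketch of min-max scaling written against a generic namespace. The function names are hypothetical, NumPy is used as the default `xp`, and NaN handling is left out (the PR relies on `nanmin`/`nanmax` helpers for that), so this is not the estimator's actual implementation.

```python
import numpy as np


def fit_min_max(X, feature_range=(0.0, 1.0), xp=np):
    """Compute per-feature scale and offset (hypothetical, simplified)."""
    data_min = xp.min(X, axis=0)
    data_max = xp.max(X, axis=0)
    data_range = data_max - data_min
    # Avoid dividing by zero for constant features.
    data_range = xp.where(data_range == 0, xp.ones_like(data_range), data_range)
    scale = (feature_range[1] - feature_range[0]) / data_range
    offset = feature_range[0] - data_min * scale
    return scale, offset


def transform(X, scale, offset):
    """Map each feature into the target range."""
    return X * scale + offset


def inverse_transform(X_scaled, scale, offset):
    """Undo `transform`."""
    return (X_scaled - offset) / scale


X = np.asarray([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scale, offset = fit_min_max(X)
X_scaled = transform(X, scale, offset)
X_back = inverse_transform(X_scaled, scale, offset)
```

Because every operation goes through `xp` or plain array arithmetic, the same functions should in principle work with any namespace that implements `min`, `max`, `where`, and `ones_like`.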