Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PowerTransformer, QuantileTransformer and KernelCenterer #4755

Merged

Conversation

viclafargue
Copy link
Contributor

Closes #4622

@viclafargue viclafargue requested a review from a team as a code owner May 24, 2022 14:25
@github-actions github-actions bot added the Cython / Python Cython or Python issue label May 24, 2022
@viclafargue viclafargue changed the base branch from branch-22.06 to branch-22.08 May 24, 2022 14:26
@viclafargue viclafargue requested review from a team as code owners May 24, 2022 14:26
@github-actions github-actions bot added CUDA/C++ gpuCI gpuCI issue labels May 30, 2022
@viclafargue viclafargue changed the title PowerTransformer and QuantileTransformer PowerTransformer, QuantileTransformer and KernelCenterer May 30, 2022
@ajschmidt8
Copy link
Member

Can you rebase or merge the latest changes? It seems that we're getting tagged to review code changes that were merged in previous PRs.

@viclafargue
Copy link
Contributor Author

viclafargue commented Jun 2, 2022

Can you rebase or merge the latest changes? It seems that we're getting tagged to review code changes that were merged in previous PRs.

Sorry for that. branch-22.08 appear to be unsynced. I think that it is what is causing the issue. I think it's on its way in #4751.

@ajschmidt8
Copy link
Member

@viclafargue, no problem. I've just resolved the conflicts in that PR and merged it. The branches should be in sync now.

@github-actions github-actions bot removed CUDA/C++ gpuCI gpuCI issue labels Jun 3, 2022
@viclafargue
Copy link
Contributor Author

rerun tests

@ajschmidt8 ajschmidt8 removed the request for review from a team June 23, 2022 18:20
@beckernick
Copy link
Member

This will also close #2035 , right?

@viclafargue
Copy link
Contributor Author

viclafargue commented Jun 29, 2022

This will also close #2035 , right?

Thanks for noticing me of this issue. The version of the QuantileTransformer in this PR does operations in a similar fashion to sklearn but simply uses cuPy instead for acceleration. It looks like the issue you are pointing to is about getting optimal performance through the use of C++ code that has already been written. I think that we should let this issue opene and keep the QuantileTransformer in this PR as temporary solution that can be replaced in the longer term.

@viclafargue viclafargue force-pushed the power-and-quantile-transformers branch from a574900 to 27dab01 Compare July 22, 2022 09:38
Copy link
Contributor

@lowener lowener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, can you add them to docs/source/api.rst

python/cuml/_thirdparty/sklearn/preprocessing/_data.py Outdated Show resolved Hide resolved
python/cuml/tests/test_preprocessing.py Outdated Show resolved Hide resolved
@lowener lowener added 3 - Ready for Review Ready for review by team non-breaking Non-breaking change labels Jul 28, 2022
@lowener lowener added the feature request New feature or request label Jul 29, 2022
Copy link
Contributor

@lowener lowener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

@dantegd dantegd changed the base branch from branch-22.08 to branch-22.10 August 30, 2022 15:05
Copy link
Member

@dantegd dantegd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codeowner approval

self.quantiles_.append(
cpu_np.nanpercentile(np.asnumpy(col), references)
)
self.quantiles_ = cpu_np.transpose(self.quantiles_)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we doing the transpose with NumPy and not CuPy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.quantiles_ data is stored on host, the result of the transpose operation is then again used on host by numpy.maximum.accumulate. I considered that the transposition on device wasn't worth the transfer. Which might arguably be wrong on very large datasets... Do you think we should be using CuPy there knowing this information?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viclafargue it might be good to do benchmarking later to be sure when it could be a bottleneck. In the meantime, I’ll go ahead and merge this PR

@codecov-commenter
Copy link

Codecov Report

Base: 78.02% // Head: 79.68% // Increases project coverage by +1.65% 🎉

Coverage data is based on head (4753ec4) compared to base (7a0ab85).
Patch coverage: 93.79% of modified lines in pull request are covered.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.10    #4755      +/-   ##
================================================
+ Coverage         78.02%   79.68%   +1.65%     
================================================
  Files               180      180              
  Lines             11385    11457      +72     
================================================
+ Hits               8883     9129     +246     
+ Misses             2502     2328     -174     
Flag Coverage Δ
dask 46.24% <39.31%> (+0.02%) ⬆️
non-dask 68.99% <93.79%> (+1.72%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ython/cuml/_thirdparty/sklearn/utils/validation.py 72.54% <ø> (+5.88%) ⬆️
python/cuml/preprocessing/__init__.py 100.00% <ø> (ø)
python/cuml/common/array.py 95.10% <85.10%> (-2.88%) ⬇️
...on/cuml/_thirdparty/sklearn/preprocessing/_data.py 90.01% <96.92%> (+21.09%) ⬆️
...cuml/_thirdparty/sklearn/preprocessing/__init__.py 100.00% <100.00%> (ø)
...cuml/_thirdparty/sklearn/utils/skl_dependencies.py 57.64% <100.00%> (+0.50%) ⬆️
python/cuml/cluster/__init__.py 100.00% <100.00%> (ø)
python/cuml/metrics/__init__.py 100.00% <100.00%> (ø)
python/cuml/testing/test_preproc_utils.py 87.19% <100.00%> (+1.77%) ⬆️
python/cuml/thirdparty_adapters/adapters.py 91.54% <100.00%> (+0.05%) ⬆️
... and 2 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@dantegd
Copy link
Member

dantegd commented Sep 11, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 74372f3 into rapidsai:branch-22.10 Sep 11, 2022
jakirkham pushed a commit to jakirkham/cuml that referenced this pull request Feb 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team Cython / Python Cython or Python issue feature request New feature or request non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Finish porting sklearn preprocessing- PowerTransformer and QuantileTransformer
6 participants