[FEA] Add "median" to TargetEncoder #4722
Codecov Report: Base 78.02% // Head 78.07% // Increases project coverage by +0.04%.
Additional details and impacted files:
@@ Coverage Diff @@
## branch-22.10 #4722 +/- ##
================================================
+ Coverage 78.02% 78.07% +0.04%
================================================
Files 180 180
Lines 11385 11442 +57
================================================
+ Hits 8883 8933 +50
- Misses 2502 2509 +7
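The percentages in the table follow from the hit and line counts. The short check below assumes coverage is hits divided by total lines, which holds here because the table lists no partially covered lines (hits + misses = lines in both columns).

```python
# Recompute the Codecov percentages from the raw counts in the table.
# Assumption: no partial lines, so coverage = hits / lines.
base_hits, base_lines = 8883, 11385
head_hits, head_lines = 8933, 11442

base_cov = 100 * base_hits / base_lines
head_cov = 100 * head_hits / head_lines

print(f"base {base_cov:.2f}%  head {head_cov:.2f}%")            # base 78.02%  head 78.07%
print("misses:", base_lines - base_hits, head_lines - head_hits)  # misses: 2502 2509
```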
@@ -233,7 +241,7 @@ def _fit_transform(self, x, y, fold_ids):
 self.n_folds = min(self.n_folds, len(train))
 train[self.fold_col] = self._make_fold_column(len(train), fold_ids)
 
-self.y_stat_val = eval(f'train[self.y_col].{self.stat}()')
+self.y_stat_val = get_stat_func(self.stat)(train[self.y_col])
@dantegd what do you think of the change here? Thank you.
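The diff replaces an `eval` over an f-string with a call to `get_stat_func`. That helper's body is not shown in this hunk, so what follows is only a minimal stdlib sketch of the idea under stated assumptions: a fixed whitelist maps a statistic name to a callable and rejects anything else, so a caller-supplied `stat` string is never executed as code. The whitelist contents and the use of plain Python sequences instead of cuDF Series are assumptions for illustration.

```python
import statistics
from typing import Callable, Dict, Sequence

# Hypothetical whitelist; not cuML's actual get_stat_func.
_STAT_FUNCS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": statistics.mean,
    "var": statistics.variance,  # sample variance, like pandas' default ddof=1
    "median": statistics.median,
}


def get_stat_func(stat: str) -> Callable[[Sequence[float]], float]:
    """Return the callable registered for `stat`, rejecting unknown names.

    Unlike eval(f'train[col].{stat}()'), a dictionary lookup never runs
    an arbitrary string as Python code.
    """
    try:
        return _STAT_FUNCS[stat]
    except KeyError:
        raise ValueError(
            f"unsupported stat {stat!r}; expected one of {sorted(_STAT_FUNCS)}"
        ) from None


y = [1.0, 2.0, 2.0, 10.0]
print(get_stat_func("median")(y))  # 2.0
print(get_stat_func("mean")(y))    # 3.75
```

Adding another statistic then means registering one more entry in the table rather than trusting whatever method name the string happens to spell.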
@gpucibot merge
This PR enables `TargetEncoder` to encode the `median` of the target column, grouped by one or more categorical columns. The `for`-loop logic used in this PR is slower than the existing optimized path for `mean` and `var`, but it can easily be reused for additional statistic functions.

Authors:
- Jiwei Liu (https://github.com/daxiongshu)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4722
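To illustrate why a per-group loop generalizes to any statistic, here is a minimal pure-Python sketch. It assumes out-of-fold encoding driven by per-row fold ids (echoing the `fold_ids` argument of `_fit_transform` in the diff) and a fallback to the statistic over the whole target for categories unseen in the training folds; `target_encode_oof` is a hypothetical name, and the sketch does not reflect cuML's GPU implementation. For multiple categorical columns, pass tuple keys such as `list(zip(col_a, col_b))`.

```python
import statistics
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence


def target_encode_oof(
    keys: Sequence[Hashable],
    y: Sequence[float],
    fold_ids: Sequence[int],
    stat: Callable[[Sequence[float]], float] = statistics.median,
) -> List[float]:
    """Hypothetical out-of-fold target encoder for an arbitrary statistic.

    Each row gets `stat` of the targets that share its key but lie in other
    folds; keys absent from those folds fall back to `stat` over all of y.
    """
    global_val = stat(y)
    encoded = [global_val] * len(y)
    for fold in sorted(set(fold_ids)):
        # Group the training targets (rows outside this fold) by key.
        groups = defaultdict(list)
        for k, t, f in zip(keys, y, fold_ids):
            if f != fold:
                groups[k].append(t)
        # One stat() call per group, so any callable works (median, mean, ...).
        group_stat = {k: stat(v) for k, v in groups.items()}
        # Encode the held-out rows of this fold.
        for i, (k, f) in enumerate(zip(keys, fold_ids)):
            if f == fold:
                encoded[i] = group_stat.get(k, global_val)
    return encoded


keys = ["a", "a", "a", "b", "b", "b"]
y = [1.0, 5.0, 3.0, 10.0, 20.0, 30.0]
folds = [0, 1, 0, 1, 0, 1]
print(target_encode_oof(keys, y, folds))  # [5.0, 2.0, 5.0, 20.0, 20.0, 20.0]
```

Passing `stat=statistics.mean` reuses the same loop unchanged, which is the trade-off the description names: simpler and more general, at the cost of the vectorized speed available for `mean` and `var`.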