Improvements in Quick-start for Ranking #1014
--target_encoding_targets is, all categorical
features will be used.
--target_encoding_targets
Columns (comma-sep) with target columns that will be
Won't giving multiple targets create an issue? You were facing issues with that; was it fixed? Also, what about the issue where the test set needs the target column?
No, I split the targets and create one TargetEncoding op for each of them to avoid the issue.
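To illustrate the idea of one encoding pass per target, here is a minimal, standalone sketch of out-of-fold (k-fold) mean target encoding with smoothing, written in plain Python rather than with the `TargetEncoding` op used in the PR. The round-robin fold assignment, the smoothing formula and the `TE_<feature>_<target>` output names are assumptions for illustration. The example mixes a binary target (`click`) with a float target (`watch_time`).

```python
from collections import defaultdict


def target_encode(rows, feature, target, kfold=5, smoothing=20.0):
    """Out-of-fold, smoothed mean target encoding of one feature for one target.

    `rows` is a list of dicts. Fold assignment is round-robin by row index
    (an assumption for this sketch). Returns encoded values aligned with `rows`.
    """
    encoded = [0.0] * len(rows)
    for fold in range(kfold):
        # Statistics come only from rows outside the current fold,
        # so a row's own target never leaks into its encoding.
        sums, counts = defaultdict(float), defaultdict(int)
        total, count = 0.0, 0
        for i, row in enumerate(rows):
            if i % kfold != fold:
                value = float(row[target])  # works for int and float targets
                sums[row[feature]] += value
                counts[row[feature]] += 1
                total += value
                count += 1
        prior = total / count if count else 0.0
        for i, row in enumerate(rows):
            if i % kfold == fold:
                denom = counts.get(row[feature], 0) + smoothing
                # Blend the category mean with the global prior; rare
                # categories are pulled towards the prior.
                if denom:
                    encoded[i] = (sums.get(row[feature], 0.0) + smoothing * prior) / denom
                else:
                    encoded[i] = prior
    return encoded


def encode_all_targets(rows, features, targets, kfold=5, smoothing=20.0):
    # One independent encoding pass per (feature, target) pair instead of a
    # single pass over multiple targets.
    return {
        f"TE_{feature}_{target}": target_encode(rows, feature, target, kfold, smoothing)
        for target in targets
        for feature in features
    }


rows = [
    {"item": "a", "click": 1, "watch_time": 3.5},
    {"item": "a", "click": 0, "watch_time": 1.0},
    {"item": "b", "click": 1, "watch_time": 7.25},
    {"item": "b", "click": 1, "watch_time": 2.0},
]
encoded = encode_all_targets(rows, ["item"], ["click", "watch_time"], kfold=2, smoothing=1.0)
print(encoded["TE_item_click"])  # [0.25, 1.0, 0.75, 1.0]
```

The tiny `kfold` and `smoothing` values only keep the numbers easy to check by hand; real data would call for tuning both.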
@@ -0,0 +1,23 @@
13-06-2023 12:45:41:756 [pid=1014 tid=1014] ERROR cufio-drv:716 nvidia-fs.ko driver not loaded
13-06-2023 12:45:52:861 [pid=1156 tid=1156] ERROR cufio-drv:716 nvidia-fs.ko driver not loaded
13-06-2023 12:57:13:36 [pid=1737 tid=1737] ERROR cufio-drv:716 nvidia-fs.ko driver not loaded
I guess you might want to remove the cufile.log file.
        args.target_encoding_features = args.categorical_features
    if not args.target_encoding_targets:
        args.target_encoding_targets = (
            args.binary_classif_targets + args.regression_targets
Did you check whether target encoding works properly when a target column is a float (not an int)?
I'm going to add the integration tests in another PR and check for those cases.
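The defaulting logic in the review excerpt above (fall back to all categorical features, and to all binary classification plus regression targets) can be sketched with `argparse` as follows. This is a hypothetical parser, not the one in preprocessing.py: the flag names `--categorical_features`, `--binary_classif_targets` and `--regression_targets` are inferred from the attribute names in the excerpt, the comma-separated parsing follows the "(comma-sep)" help text, and the defaults for `--target_encoding_kfold` and `--target_encoding_smoothing` are placeholders.

```python
import argparse


def comma_list(value):
    # Parse "a,b,c" into ["a", "b", "c"], dropping empty items.
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser():
    # Hypothetical parser: only the --target_encoding_* flag names come from
    # the PR; the other flag names and all default values are assumptions.
    parser = argparse.ArgumentParser()
    parser.add_argument("--categorical_features", type=comma_list, default=[])
    parser.add_argument("--binary_classif_targets", type=comma_list, default=[])
    parser.add_argument("--regression_targets", type=comma_list, default=[])
    parser.add_argument("--target_encoding_features", type=comma_list, default=[])
    parser.add_argument("--target_encoding_targets", type=comma_list, default=[])
    parser.add_argument("--target_encoding_kfold", type=int, default=5)
    parser.add_argument("--target_encoding_smoothing", type=float, default=20.0)
    return parser


def apply_target_encoding_defaults(args):
    # Fall back to all categorical features and to all targets when the
    # target encoding features/targets are not given explicitly.
    if not args.target_encoding_features:
        args.target_encoding_features = args.categorical_features
    if not args.target_encoding_targets:
        args.target_encoding_targets = (
            args.binary_classif_targets + args.regression_targets
        )
    return args


args = apply_target_encoding_defaults(
    build_parser().parse_args(
        [
            "--categorical_features", "user_id,item_id",
            "--binary_classif_targets", "click",
            "--regression_targets", "watch_time",
        ]
    )
)
print(args.target_encoding_features)  # ['user_id', 'item_id']
print(args.target_encoding_targets)  # ['click', 'watch_time']
```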
@@ -263,11 +301,36 @@ def generate_nvt_workflow_targets(self, client=None):
[Tags.REGRESSION, Tags.TARGET, Tags.BINARY]
Why is it tagged as Binary as well?
Good catch. Just removed it.
        eval_dataset_preproc.to_parquet(
            output_eval_dataset_path,
            output_files=args.output_num_partitions,
        )

    if args.predict_data_path:
        # Adding to predict set dummy target columns that are
This does not read well. Maybe rephrase it as "Adding dummy target column(s) to the test set to perform the target encoding op while this issue ..."
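As context for this comment: a target encoding step that expects the target columns cannot run on an unlabeled predict set, so placeholder target columns are added first. A minimal sketch of that idea with plain Python dicts follows; the helper name `add_dummy_targets` and the fill value `0` are hypothetical, and the placeholder values are not meaningful labels.

```python
def add_dummy_targets(rows, targets, fill_value=0):
    """Return copies of `rows` with every missing target column set to `fill_value`.

    Hypothetical helper: the placeholder values only satisfy a step that
    requires the target columns to exist; they must not be used as labels.
    """
    prepared = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is left untouched
        for target in targets:
            new_row.setdefault(target, fill_value)
        prepared.append(new_row)
    return prepared


predict_rows = [{"user_id": 7, "item_id": 3}, {"user_id": 8, "item_id": 5}]
prepared = add_dummy_targets(predict_rows, ["click", "watch_time"])
print(prepared[0])  # {'user_id': 7, 'item_id': 3, 'click': 0, 'watch_time': 0}
```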
@gabrielspmoreira I approved it in case you want to merge once you push your final changes.
* Adding target encoding features support to quick-start preprocessing
* Converting the quick-start for ranking to a Python module (so that its classes can be tested). Added target encoding args to preprocessing.py. Added args to keep or filter columns in ranking.py. Documentation was updated.
* Fixed bug when casting the columns (it was shuffling the columns in some cases). Adjusted the command line examples
* Small fix and comment adjustment
This PR adds some improvements to the Quick-start for ranking scripts and documentation:

* In preprocessing.py: added target encoding support, with the args --target_encoding_features, --target_encoding_targets, --target_encoding_kfold and --target_encoding_smoothing.
* In ranking.py: added args to keep (--keep_columns) or remove (--ignore_columns) columns at dataloading / training / evaluation.

This PR also converts those scripts to Python modules, to make it easier to import/extend their classes and to test them.
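The `--keep_columns` and `--ignore_columns` args in ranking.py could be applied to a list of column names roughly as below. This is a sketch under assumed semantics (keep list applied first, then ignore list, original order preserved), not the actual ranking.py code, and `select_columns` is a hypothetical name.

```python
def select_columns(columns, keep_columns=None, ignore_columns=None):
    """Filter column names by an optional keep list, then an optional ignore list.

    Assumed semantics: an empty or missing keep list keeps everything.
    The original column order is preserved.
    """
    keep = set(keep_columns or [])
    ignore = set(ignore_columns or [])
    return [c for c in columns if (not keep or c in keep) and c not in ignore]


columns = ["user_id", "item_id", "user_age", "click", "watch_time"]
print(select_columns(columns, keep_columns=["click", "user_id", "item_id"]))
# ['user_id', 'item_id', 'click']
print(select_columns(columns, ignore_columns=["user_age"]))
# ['user_id', 'item_id', 'click', 'watch_time']
```

Iterating over `columns` rather than over a set keeps the output order deterministic.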
So now, instead of being run like

    python preprocessing.py --args ...

they need to be run as Python modules, e.g.

    cd /Merlin/examples/
    python -m quick_start.scripts.preproc.preprocessing --args ...