-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support CPU object for train_test_split
#5873
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall things look great! Love the code cleaning included in the PR as well! Just had one question
|
||
cuda = gpu_only_import_from("numba", "cuda") | ||
|
||
test_array_input_types = ["numba", "cupy"] | ||
test_array_input_types = ["numba", "cupy", "cudf", "pandas"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks great! But the in the tests we're only testing for dataframes, should we add tests for series as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Existing tests have hardcoded 2-d array inputs that are can't be naturally cast to Series. I parametrize them with 1-d array and a fixture to convert to df / series based on data shape.
/merge |
This PR adds support to CPU objects for
train_test_split
, leveraging theinput conversion tools defined in
input_utils.py
. This PR also addsoutput_to_df_obj_like
API that converts CumlArray back to a series/dataframe,matching metadata from input.
In the meantime, this PR reimplements majority of
train_test_split
bycentralizing indices compute and gather. This reduces the number of kernels
launched, especially in the cases where stratify keys are provided.
Closes #5619