[Datasets] Split test_dataset.py completely into test_dataset_{consumption,map,all_to_all}.py #33101
@@ -0,0 +1,145 @@
import numpy as np
Instead of integration, could we call this test_dataset_ecosystem?
@ericl - no strong preference, renamed.
Signed-off-by: Cheng Su <scnju13@gmail.com>
Manually verified
@@ -0,0 +1,920 @@
import itertools
Maybe add a file-level string to explain what this file covers (as the PR description says)?
Yes, makes sense. I am thinking of adding a README in the test directory to explain each file. I want to unbreak CI as soon as possible and avoid merge conflicts with other PRs. Can we do it as a follow-up?
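For illustration only, here is a minimal sketch of how the file-level strings suggested above might be generated from the per-file coverage notes in the PR description. The names FILE_COVERAGE and module_docstring are hypothetical and not part of Ray or this PR; the wording of the generated text is an assumption as well.

```python
# Hypothetical sketch: coverage notes copied from the PR description.
# FILE_COVERAGE and module_docstring are illustrative names, not Ray APIs.
FILE_COVERAGE = {
    "test_dataset_consumption.py": "consumption APIs and all misc tests",
    "test_dataset_map.py": "map-like transformation APIs",
    "test_dataset_all_to_all.py": "all-to-all transformation APIs",
    "test_dataset_ecosystem.py": "from/to_modin/dask APIs",
}


def module_docstring(filename: str) -> str:
    """Render a file-level docstring body for one of the split test files."""
    coverage = FILE_COVERAGE[filename]
    return (
        f"Tests for {coverage}.\n\n"
        "Split out of test_dataset.py to stay under the test timeout."
    )


if __name__ == "__main__":
    # Print the suggested docstring for each of the four files.
    for name in FILE_COVERAGE:
        print(f"{name}:\n{module_docstring(name)}\n")
```

The same mapping could also feed a README in the test directory, which is the alternative raised in the reply.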
LGTM! test_dataset_consumption.py is a bit of a misnomer since it contains a lot of non-consumption tests (e.g. from_items, range_table, global aggregations, etc.), but we can work on further splitting that out as a follow-up.
Yes, agreed. We shall split it further and set up a better directory structure for our tests.
…ption,map,all_to_all}.py (ray-project#33101)
End-to-end runtime of test_dataset.py is around test timeout threshold (15 minutes for bazel large test). This PR is to split test_dataset.py completely into 4 separate test files:
- test_dataset_consumption.py: the tests for consumption APIs and all misc tests
- test_dataset_map.py: the tests for map-like transformation APIs
- test_dataset_all_to_all.py: the tests for all-to-all transformation APIs
- test_dataset_ecosystem.py: the tests for from/to_modin/dask APIs
Signed-off-by: Cheng Su <scnju13@gmail.com>
Signed-off-by: Jack He <jackhe2345@gmail.com>
Why are these changes needed?
End-to-end runtime of test_dataset.py is around the test timeout threshold (15 minutes for a bazel large test). This PR splits test_dataset.py completely into 4 separate test files:
- test_dataset_consumption.py: the tests for consumption APIs and all misc tests
- test_dataset_map.py: the tests for map-like transformation APIs
- test_dataset_all_to_all.py: the tests for all-to-all transformation APIs
- test_dataset_ecosystem.py: the tests for from/to_modin/dask APIs
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.