[Datasets] Split test_dataset.py completely into test_dataset_{consumption,map,all_to_all}.py #33101

c21 · 2023-03-07T06:34:23Z

Why are these changes needed?

The end-to-end runtime of test_dataset.py is close to the test timeout threshold (15 minutes for a Bazel large test). This PR splits test_dataset.py completely into 4 separate test files:

test_dataset_consumption.py: the tests for consumption APIs and all misc tests
test_dataset_map.py: the tests for map-like transformation APIs
test_dataset_all_to_all.py: the tests for all-to-all transformation APIs
test_dataset_ecosystem.py: the tests for from/to_modin/dask APIs

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

ericl · 2023-03-07T07:05:24Z

python/ray/data/tests/test_dataset_integration.py

@@ -0,0 +1,145 @@
+import numpy as np


Instead of integration, could we call this test_dataset_ecosystem?

@ericl - no strong preference, renamed.

c21 · 2023-03-07T08:36:02Z

Manually verified that test_dataset.py had 140 unit tests and that no test is missing after this PR.

jianoaix · 2023-03-07T16:59:39Z

python/ray/data/tests/test_dataset_map.py

@@ -0,0 +1,920 @@
+import itertools


Maybe add a file-level docstring to explain what this is covering (as the PR description says)?

Yes, makes sense. I am thinking of adding a README in the test directory to explain each file. I want to unbreak CI as soon as possible and avoid merge conflicts with other PRs. Can we do it as a follow-up?

clarkzinzow

LGTM! test_dataset_consumption.py is a bit of a misnomer since it contains a lot of non-consumption tests (e.g. from_items, range_table, global aggregations, etc.), but we can work on further splitting that out as a follow-up.

c21 · 2023-03-07T18:45:22Z

test_dataset_consumption.py is a bit of a misnomer since it contains a lot of non-consumption tests (e.g. from_items, range_table, global aggregations, etc.), but we can work on further splitting that out as a follow-up.

Yes, agreed. We should split it further and set up a better directory structure for our tests.

c21 requested review from ericl, scv119, clarkzinzow, jjyao and jianoaix as code owners March 7, 2023 06:34

ericl reviewed Mar 7, 2023

ericl approved these changes Mar 7, 2023

ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Mar 7, 2023

ericl self-assigned this Mar 7, 2023

c21 added 6 commits March 7, 2023 00:21

Disable test in test_dataset.py

ff7756b

Signed-off-by: Cheng Su <scnju13@gmail.com>

Address lint

9a810f9

Signed-off-by: Cheng Su <scnju13@gmail.com>

Try to fix lint

2fee000

Signed-off-by: Cheng Su <scnju13@gmail.com>

Split test_dataset.py completely into test_dataset_{consumption, map, all_to_all}.py

1a372a0

Signed-off-by: Cheng Su <scnju13@gmail.com>

Try to fix lint

d69caee

Signed-off-by: Cheng Su <scnju13@gmail.com>

Address comments and fix test failure

e098c6e

Signed-off-by: Cheng Su <scnju13@gmail.com>

c21 force-pushed the split-test branch from 31980d2 to e098c6e Compare March 7, 2023 08:21

c21 changed the title ~~[WIP][Datasets] Split test_dataset.py completely into test_dataset_{consumption,map,all_to_all}.py~~ [Datasets] Split test_dataset.py completely into test_dataset_{consumption,map,all_to_all}.py Mar 7, 2023

c21 assigned clarkzinzow and jianoaix Mar 7, 2023

c21 added tests-ok The tagger certifies test failures are unrelated and assumes personal liability. and removed @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. labels Mar 7, 2023

jianoaix approved these changes Mar 7, 2023

clarkzinzow approved these changes Mar 7, 2023

clarkzinzow merged commit 340f7b2 into ray-project:master Mar 7, 2023

c21 deleted the split-test branch March 7, 2023 18:45

This was referenced Mar 7, 2023

[attempt 2] Eager import pandas in ray data for python >= 3.7 #33103

Merged

[CI] linux://python/ray/data:tests/test_dataset is failing/flaky on master. #32067

Closed
