Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Datasets] Split test_dataset.py completely into test_dataset_{consumption,map,all_to_all}.py #33101

Merged
merged 6 commits into from
Mar 7, 2023

Conversation

c21
Copy link
Contributor

@c21 c21 commented Mar 7, 2023

Why are these changes needed?

End-to-end runtime of test_dataset.py is around test timeout threshold (15 minutes for bazel large test). This PR is to split test_dataset.py completely into 4 separate test files:

  • test_dataset_consumption.py: the tests for consumption APIs and all misc tests
  • test_dataset_map.py: the tests for map-like transformation APIs
  • test_dataset_all_to_all.py: the tests for all-to-all transformation APIs
  • test_dataset_ecosystem.py: the tests for from/to_modin/dask APIs

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@@ -0,0 +1,145 @@
import numpy as np
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of integration, could we call this test_dataset_ecosystem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericl - no strong preference, renamed.

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Mar 7, 2023
@ericl ericl self-assigned this Mar 7, 2023
c21 added 6 commits March 7, 2023 00:21
Signed-off-by: Cheng Su <scnju13@gmail.com>
Signed-off-by: Cheng Su <scnju13@gmail.com>
Signed-off-by: Cheng Su <scnju13@gmail.com>
… all_to_all}.py

Signed-off-by: Cheng Su <scnju13@gmail.com>
Signed-off-by: Cheng Su <scnju13@gmail.com>
Signed-off-by: Cheng Su <scnju13@gmail.com>
@c21 c21 changed the title [WIP][Datasets] Split test_dataset.py completely into test_dataset_{consumption,map,all_to_all}.py [Datasets] Split test_dataset.py completely into test_dataset_{consumption,map,all_to_all}.py Mar 7, 2023
@c21
Copy link
Contributor Author

c21 commented Mar 7, 2023

Manually verified test_dataset.py has 140 unit tests, and after this PR, no test is missing.

@c21 c21 added tests-ok The tagger certifies test failures are unrelated and assumes personal liability. and removed @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. labels Mar 7, 2023
@@ -0,0 +1,920 @@
import itertools
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a file level string to explain what this is covering (as what PR description says)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes make sense. I am thinking of adding a readme in test directory, to explain each file. I want to unbreak CI as soon as possible, and avoid merge conflict with other PR. Can we do it as followup?

Copy link
Contributor

@clarkzinzow clarkzinzow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! test_dataset_consumption.py is a bit of a misnomer since it contains a lot of non-consumption tests (e.g. from_items, range_table, global aggregations, etc.), but we work on further splitting that out as a follow-up.

@clarkzinzow clarkzinzow merged commit 340f7b2 into ray-project:master Mar 7, 2023
@c21
Copy link
Contributor Author

c21 commented Mar 7, 2023

test_dataset_consumption.py is a bit of a misnomer since it contains a lot of non-consumption tests (e.g. from_items, range_table, global aggregations, etc.), but we work on further splitting that out as a follow-up.

Yes agreed. We shall further split it, and have a better directory structure for our test.

@c21 c21 deleted the split-test branch March 7, 2023 18:45
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request Mar 21, 2023
…ption,map,all_to_all}.py (ray-project#33101)

End-to-end runtime of test_dataset.py is around test timeout threshold (15 minutes for bazel large test). This PR is to split test_dataset.py completely into 4 separate test files:

test_dataset_consumption.py: the tests for consumption APIs and all misc tests
test_dataset_map.py: the tests for map-like transformation APIs
test_dataset_all_to_all.py: the tests for all-to-all transformation APIs
test_dataset_ecosystem.py: the tests for from/to_modin/dask APIs

Signed-off-by: Cheng Su <scnju13@gmail.com>
Signed-off-by: Jack He <jackhe2345@gmail.com>
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
…ption,map,all_to_all}.py (ray-project#33101)

End-to-end runtime of test_dataset.py is around test timeout threshold (15 minutes for bazel large test). This PR is to split test_dataset.py completely into 4 separate test files:

test_dataset_consumption.py: the tests for consumption APIs and all misc tests
test_dataset_map.py: the tests for map-like transformation APIs
test_dataset_all_to_all.py: the tests for all-to-all transformation APIs
test_dataset_ecosystem.py: the tests for from/to_modin/dask APIs

Signed-off-by: Cheng Su <scnju13@gmail.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
peytondmurray pushed a commit to peytondmurray/ray that referenced this pull request Mar 22, 2023
…ption,map,all_to_all}.py (ray-project#33101)

End-to-end runtime of test_dataset.py is around test timeout threshold (15 minutes for bazel large test). This PR is to split test_dataset.py completely into 4 separate test files:

test_dataset_consumption.py: the tests for consumption APIs and all misc tests
test_dataset_map.py: the tests for map-like transformation APIs
test_dataset_all_to_all.py: the tests for all-to-all transformation APIs
test_dataset_ecosystem.py: the tests for from/to_modin/dask APIs

Signed-off-by: Cheng Su <scnju13@gmail.com>
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
…ption,map,all_to_all}.py (ray-project#33101)

End-to-end runtime of test_dataset.py is around test timeout threshold (15 minutes for bazel large test). This PR is to split test_dataset.py completely into 4 separate test files:

test_dataset_consumption.py: the tests for consumption APIs and all misc tests
test_dataset_map.py: the tests for map-like transformation APIs
test_dataset_all_to_all.py: the tests for all-to-all transformation APIs
test_dataset_ecosystem.py: the tests for from/to_modin/dask APIs

Signed-off-by: Cheng Su <scnju13@gmail.com>
Signed-off-by: elliottower <elliot@elliottower.com>
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
…ption,map,all_to_all}.py (ray-project#33101)

End-to-end runtime of test_dataset.py is around test timeout threshold (15 minutes for bazel large test). This PR is to split test_dataset.py completely into 4 separate test files:

test_dataset_consumption.py: the tests for consumption APIs and all misc tests
test_dataset_map.py: the tests for map-like transformation APIs
test_dataset_all_to_all.py: the tests for all-to-all transformation APIs
test_dataset_ecosystem.py: the tests for from/to_modin/dask APIs

Signed-off-by: Cheng Su <scnju13@gmail.com>
Signed-off-by: Jack He <jackhe2345@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants