Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dataset] Exclude breaking test case in read_parquet_benchmark_single_node release test #31904

Merged
merged 12 commits into from
Jan 28, 2023

Conversation

scottjlee
Copy link
Contributor

@scottjlee scottjlee commented Jan 24, 2023

Signed-off-by: Scott Lee sjl@anyscale.com

Why are these changes needed?

The release test read_parquet_benchmark_single_node fails, due to using Python 3.7 and not having the pickle5 package installed. A similar issue is discussed in #26225. We found that the test failure is contained to the portion which tests a Dataset with a filter expression (the error is related to pickling with this filter expression).

Therefore, we will temporarily disable this portion of the test, while keeping the rest of the release test (which I verified passes on the same cluster). We can come back to this in the future and fix the case with filter. Example of release test successfully running with the filter case removed.

Related issue number

Closes #31888

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Scott Lee <sjl@anyscale.com>
@amogkam
Copy link
Contributor

amogkam commented Jan 25, 2023

should this be added as a ray[data] dependency for python<=3.7?

@cadedaniel
Copy link
Member

Can we get a review of this? This is a release blocker. cc @clarkzinzow @c21 @amogkam @jianoaix

@cadedaniel
Copy link
Member

@scottjlee can you try running this manually (i.e. fire off a release test before merging) to confirm that it fixes the issue?

@amogkam
Copy link
Contributor

amogkam commented Jan 25, 2023

If this is a problem that users will run into, then we should update setup.py and add pickle5 as a dependency for the ray[data] install with python<=3.7 instead of just adding it to the app config

@jianoaix
Copy link
Contributor

Yeah, looks like the test actually exposed a prod issue for users that we need to fix.

Signed-off-by: Scott Lee <sjl@anyscale.com>
@scottjlee
Copy link
Contributor Author

I'm waiting on getting access to AWS roles/credentials to run release tests locally. Is there another way to kickoff release tests without merging? @cadedaniel

In the meantime, @amogkam -- are the changes I made in setup.py what you had in mind?

Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
@scottjlee scottjlee changed the title Add pickle5 to Dataset app_config.yaml to fix Data release test on Python 3.7 [Dataset] Add pickle5 to dependencies for Python < 3.7 to fix release tests Jan 26, 2023
@scottjlee
Copy link
Contributor Author

Looks like the failing release test passes now: https://buildkite.com/ray-project/release-tests-pr/builds/26524

python/setup.py Outdated

data_extras = [numpy_dep, pandas_dep, pyarrow_dep, "fsspec"]
# Need pickle5 package to override default pickle package for Python < 3.7
if sys.version_info < (3, 7):
Copy link
Contributor

@clarkzinzow clarkzinzow Jan 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this should be < (3, 8) right?

Suggested change
if sys.version_info < (3, 7):
if sys.version_info < (3, 8):

Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
@scottjlee
Copy link
Contributor Author

We ran the failing release test on a cluster with the same environment (notably Python 3.7), and determined that the failure is contained to the portion which tests a Dataset with a filter expression (the error is related to pickling with this filter expression).

Therefore, we will temporarily disable this portion of the test, while keeping the rest of the release test (which I verified passes on the same cluster). We can come back to this in the future and fix the case with filter, but should not be a release blocker now.

Copy link
Contributor

@jianoaix jianoaix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @scottjlee!

@scottjlee
Copy link
Contributor Author

scottjlee commented Jan 27, 2023

Once the release test run passes, this should be good to merge!
https://buildkite.com/ray-project/release-tests-pr/builds/26708#0185f4d3-2a7f-41a0-88ff-e0596d85433d

@amogkam
Copy link
Contributor

amogkam commented Jan 27, 2023

Thanks @scottjlee! can we update the title and the pr description as well

@scottjlee scottjlee changed the title [Dataset] Add pickle5 to dependencies for Python < 3.7 to fix release tests [Dataset] Exclude breaking test case in read_parquet_benchmark_single_node release test Jan 27, 2023
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
@amogkam amogkam merged commit 675c6a0 into ray-project:master Jan 28, 2023
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
…e_node` release test (ray-project#31904)

The release test read_parquet_benchmark_single_node fails, due to using Python 3.7 and not having the pickle5 package installed. A similar issue is discussed in ray-project#26225. We found that the test failure is contained to the portion which tests a Dataset with a filter expression (the error is related to pickling with this filter expression).

Therefore, we will temporarily disable this portion of the test, while keeping the rest of the release test (which I verified passes on the same cluster). We can come back to this in the future and fix the case with filter. Example of release test successfully running with the filter case removed.

Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Ray 2.3 release] [data] read_parquet_benchmark_single_node release test failure
6 participants