[train] Fix train_multinode_persistence release test #39563

Merged (2 commits) on Sep 12, 2023

@justinvyu (Contributor) commented Sep 11, 2023

Why are these changes needed?

This PR switches the release test from fsspec to the S3 and GS CLIs for downloading and deleting the storage directory.

Use case: I want to download the contents of storage_path from cloud storage in order to assert that they are correct. I also want to delete the directory so that consecutive runs don't overlap with each other.

This is why we use CLIs instead of the filesystem implementations:

  1. A pyarrow-wrapped version of s3fs errors on download. Fix attempt 1: use the pyarrow default filesystem to download.
  2. The pyarrow default GCS filesystem doesn't download all files. Fix attempt 2: use the unwrapped fsspec filesystem implementations to delete/download files.
  3. The unwrapped fsspec implementation for S3 does not delete nested directories properly. This PR's fix: use the CLIs instead of pyarrow or fsspec. Note that this is not really a concern for users on the "default" cloud path, where they don't specify a custom fsspec filesystem.
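
For illustration, here is a minimal sketch of what shelling out to the CLIs could look like. It is not the code from this PR: the helper names (build_download_cmd, build_delete_cmd, run) are hypothetical, and it assumes the `aws` and `gsutil` command-line tools are installed and authenticated on the machine running the test.

```python
import subprocess
from typing import List
from urllib.parse import urlparse


def build_download_cmd(storage_path: str, local_dir: str) -> List[str]:
    """Build a recursive-download command for an s3:// or gs:// URI (hypothetical helper)."""
    scheme = urlparse(storage_path).scheme
    if scheme == "s3":
        # `aws s3 cp --recursive` copies every object under the prefix.
        return ["aws", "s3", "cp", storage_path, local_dir, "--recursive"]
    if scheme == "gs":
        # `-m` parallelizes the copy, `-r` recurses into the directory.
        return ["gsutil", "-m", "cp", "-r", storage_path, local_dir]
    raise ValueError(f"Unsupported storage scheme: {scheme!r}")


def build_delete_cmd(storage_path: str) -> List[str]:
    """Build a recursive-delete command for an s3:// or gs:// URI (hypothetical helper)."""
    scheme = urlparse(storage_path).scheme
    if scheme == "s3":
        return ["aws", "s3", "rm", storage_path, "--recursive"]
    if scheme == "gs":
        return ["gsutil", "-m", "rm", "-r", storage_path]
    raise ValueError(f"Unsupported storage scheme: {scheme!r}")


def run(cmd: List[str]) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    subprocess.check_call(cmd)
```

In a test, one would call run(build_download_cmd(storage_path, local_dir)) before the assertions and run(build_delete_cmd(storage_path)) during cleanup. Note that the two tools may lay out the downloaded files differently under the destination directory, so the assertions should account for that.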

Related issue number

#39546

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@matthewdeng matthewdeng merged commit 57e298c into ray-project:master Sep 12, 2023
18 of 20 checks passed
@justinvyu justinvyu deleted the release_test_fix branch September 12, 2023 06:27
@zhe-thoughts zhe-thoughts linked an issue Sep 12, 2023 that may be closed by this pull request
justinvyu added a commit to justinvyu/ray that referenced this pull request Sep 12, 2023
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
GeneDer pushed a commit that referenced this pull request Sep 12, 2023
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Sep 12, 2023
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Victor <vctr.y.m@example.com>
Development

Successfully merging this pull request may close these issues.

Release Test train_multinode_persistence.aws failure
2 participants