Make Ray wheels work on GH200 #40816

Merged
merged 6 commits into from
Nov 17, 2023

pcmoritz
Contributor

Why are these changes needed?

This makes sure the Ray wheels work on NVIDIA's GH200 (Grace Hopper) architecture.

Fixes #40815

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

# for why we are setting "--with-lg-page" on non x86 hardware here.
configure_options = ["--disable-static", "--enable-prof"] +
select({
"@platforms//cpu:x86_64": [],
Contributor

Don't we need to set --with-lg-page=12?

Contributor Author

No, the existing behavior is fine: on x86 the page size is the same on all machines, and the build selects it at compile time.
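
For readers unfamiliar with the flag: jemalloc's --with-lg-page option takes the base-2 logarithm of the page size, so 12 corresponds to 4 KiB pages and 16 to 64 KiB pages. The snippet below is only a rough sketch for checking what a given machine reports; the helper names lg_page and system_page_size are made up for illustration and are not part of Ray or jemalloc.

```python
import os


def lg_page(page_size: int) -> int:
    """Return log2 of a power-of-two page size (the value --with-lg-page expects)."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError(f"page size must be a positive power of two, got {page_size}")
    return page_size.bit_length() - 1


def system_page_size() -> int:
    """Page size of the running kernel in bytes (POSIX systems only)."""
    return os.sysconf("SC_PAGE_SIZE")


if __name__ == "__main__":
    size = system_page_size()
    print(f"system page size: {size} bytes, lg-page: {lg_page(size)}")
```

On a typical x86_64 Linux machine this should report 4096 bytes, while some ARM kernels are configured with larger pages, which is the case the linked issue is about.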

@rkooo567
Contributor

rkooo567 commented Nov 1, 2023

Hmm, the test_advanced_5 failure is probably related.

@rkooo567
Contributor

cc @pcmoritz do you need help taking this issue btw? Or do you think you will have bandwidth to merge it?

@pcmoritz
Contributor Author

pcmoritz commented Nov 17, 2023

The memory pressure test is very flaky and almost certainly unrelated to this PR, since this PR doesn't change anything on x86_64.

@pcmoritz pcmoritz merged commit 0e2a523 into ray-project:master Nov 17, 2023
2 checks passed
@pcmoritz pcmoritz deleted the ray-gh200 branch November 17, 2023 02:01
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Nov 29, 2023
Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>

Successfully merging this pull request may close these issues.

[Core] "Failed to start GCS" / "<jemalloc> Unsupported system page size" when running on certain ARM hardware