
[core] Add opt-in flag for Windows and OSX clusters, update ray start output to match docs #31166

Merged: 23 commits merged into ray-project:master on Feb 9, 2023

@stephanie-wang (Contributor) commented Dec 16, 2022

Why are these changes needed?

This PR cleans up a few usability issues around Ray clusters:

  1. Clean up the ray start log output to match the new documentation on Ray clusters. Mainly, de-emphasize Ray Client and recommend Ray Jobs instead.
  2. Add an opt-in flag for enabling multi-node clusters on OSX and Windows. Previously, it was possible to start a multi-node cluster, but any Ray program would then fail mysteriously after connecting to it. Now, Ray warns the user if the opt-in flag is not set.
  3. Document multi-node support for OSX and Windows.
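A minimal sketch of the opt-in check in item 2, assuming the flag is read from the `RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER` environment variable and the platform is detected with `sys.platform`. The helper `multi_node_cluster_warning` is hypothetical, not Ray's actual code, and accepting both "1" and "true" is an assumption of this sketch:

```python
import os
import sys
from typing import Mapping, Optional

# Platforms where multi-node Ray clusters are unsupported ("darwin" is OSX).
UNSUPPORTED_PLATFORMS = ("darwin", "win32")
FLAG = "RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER"


def multi_node_cluster_warning(
    platform: str = sys.platform,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return a warning if the opt-in flag is required but not set.

    Hypothetical helper: "1" or "true" (any case) counts as opting in.
    """
    if platform not in UNSUPPORTED_PLATFORMS:
        return None  # Linux and other platforms need no opt-in.
    if env.get(FLAG, "").strip().lower() in ("1", "true"):
        return None  # The user has explicitly opted in.
    return (
        "Ray clusters are not supported on OSX and Windows. "
        "If you would like to proceed anyway, rerun with the "
        f"environment variable `{FLAG}=true`."
    )


if __name__ == "__main__":
    message = multi_node_cluster_warning()
    if message is not None:
        print(message)
```

Passing `platform` and `env` explicitly keeps the check easy to unit-test without touching the real process environment.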

ray start --head output before this PR:

Local node IP: 10.103.212.102

--------------------
Ray runtime started.
--------------------

Next steps
  To connect to this Ray runtime from another node, run
    ray start --address='10.103.212.102:6379'

  Alternatively, use the following Python code:
    import ray
    ray.init(address='auto')

  To connect to this Ray runtime from outside of the cluster, for example to
  connect to a remote cluster from your laptop directly, use the following
  Python code:
    import ray
    ray.init(address='ray://<head_node_ip_address>:10001')

  To see the status of the cluster, use
    ray status
  To monitor and debug Ray, view the dashboard at
    127.0.0.1:8265

  If connection fails, check your firewall settings and network configuration.

  To terminate the Ray runtime, run
    ray stop

After:

Next steps
  To add another node to this Ray cluster, run
    ray start --address='10.103.212.102:6379'
  
  To connect to this Ray cluster, run `ray.init()` as usual:
    import ray
    ray.init()
  
  To connect to this Ray instance from outside of the cluster, for example 
  when connecting to a remote cluster from your laptop, make sure the
  dashboard (127.0.0.1:8265) is accessible and use Ray jobs. For example:
    RAY_ADDRESS='http://<dashboard URL>' ray job submit --working-dir . -- python my_script.py

  See https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html
  for more information on connecting to the Ray cluster from a remote client.
  
  To see the status of the cluster, use
    ray status
  To monitor and debug Ray, view the dashboard at 
    127.0.0.1:8265
  
  If connection fails, check your firewall settings and network configuration.
  
  To terminate the Ray runtime, run
    ray stop

If on OSX or Windows and RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER is not set:

$ RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=false ray start --head
Local node IP: 127.0.0.1

--------------------
Ray runtime started.
--------------------

Next steps
  Ray clusters are not supported on OSX and Windows.
  If you would like to proceed anyway, restart Ray with:
    ray stop
    RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=true ray start
  
  `RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=true` must also be passed to any Ray clients.
  
  To terminate the Ray runtime, run
    ray stop

$ RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=false python -c "import ray; ray.init()"
2022-12-16 15:41:50,268 INFO worker.py:1356 -- Connecting to existing Ray cluster at address: 127.0.0.1:6379...
2022-12-16 15:43:12,541 WARNING worker.py:1359 -- Ray clusters are not supported on OSX and Windows. If you would like to proceed anyway, rerun with the environment variable `RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=true`.
2022-12-16 15:41:50,273 INFO worker.py:1545 -- Connected to Ray cluster. View the dashboard at http://127.0.0.1:8265 

Related issue number

Closes #30770.

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

… clusters

Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
@stephanie-wang stephanie-wang changed the title [core] Add opt-in flag for Windows and OSX clusters, update ray start output to match docs and de-emphasize Ray Client [core] Add opt-in flag for Windows and OSX clusters, update ray start output to match docs Dec 16, 2022
if dashboard_url:
    cli_logger.print(
        cf.bold(
            " RAY_ADDRESS='http://<dashboard URL>' ray job submit "
Contributor

Suggested change:
-  " RAY_ADDRESS='http://<dashboard URL>' ray job submit "
+  " RAY_ADDRESS='http://<dashboard URL>:8265' ray job submit "

Not sure if 8265 is technically/customarily part of the URL, but I could see a lot of users leaving it out if we don't include it here.

Contributor

Or, since we're inside the `if dashboard_url` block, could we use {dashboard_url} itself?

Contributor Author

Ah, thanks for the heads-up! I actually tried dashboard_url at first, but it defaults to localhost, which is confusing since this part of the message is about connecting from a remote client. The {dashboard_url} version of the message won't work out of the box in that scenario.

By the way, I can also include the actual dashboard port in this message.
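A minimal sketch of that last idea, assuming the dashboard port is known at the point where the message is printed. `format_job_submit_hint` is a hypothetical helper; the host stays a placeholder because, as noted above, the locally detected URL defaults to localhost and is not reachable from a remote client:

```python
def format_job_submit_hint(dashboard_port: int = 8265) -> str:
    """Build the remote job-submission hint for the `ray start --head` output.

    Hypothetical helper: only the port is filled in; the host remains a
    placeholder the user must replace with an address they can reach.
    """
    if not 0 < dashboard_port < 65536:
        raise ValueError(f"invalid dashboard port: {dashboard_port}")
    return (
        f"  RAY_ADDRESS='http://<dashboard URL>:{dashboard_port}' "
        "ray job submit --working-dir . -- python my_script.py"
    )


# Usage: print the hint with the default port and with a custom one.
print(format_job_submit_hint())
print(format_job_submit_hint(dashboard_port=8266))
```

Filling in the real port addresses the concern that users would otherwise drop `:8265`, without pretending to know the externally reachable host.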

Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
@pcmoritz (Contributor) commented Dec 17, 2022

Thanks for doing this ❤️

Just a small nit: at the moment we have an unholy mix of sometimes 1 being true and sometimes "true" being true for environment variables that are flags. It would be good to clean that up going forward (to stay backwards compatible, maybe the only way is to allow both 0 and false to mean false, and both 1 and true to mean true, for the variables that need cleanup).

At the moment, the 0/1 convention seems more common (see https://docs.ray.io/en/latest/tune/api_docs/env.html and the other variables in the ray_constants.py file). Should we standardize on it for new environment variables for now?

The 0/1 convention feels a little nicer since there is no need to decide between "True" and "true" (I also feel it is the more common convention, but I'm not sure about that).
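A minimal sketch of the backwards-compatible parsing suggested above, assuming flags are read from `os.environ`. `env_flag` is a hypothetical helper, not an existing Ray function, and rejecting unrecognized values instead of silently treating them as false is a design choice of this sketch:

```python
import os
from typing import Mapping

_TRUE_VALUES = {"1", "true"}
_FALSE_VALUES = {"0", "false"}


def env_flag(
    name: str,
    default: bool = False,
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Read a boolean environment flag, accepting both conventions.

    "1"/"true" mean True and "0"/"false" mean False, compared
    case-insensitively so "True" and "TRUE" also work. An unset or empty
    variable returns `default`; any other value raises ValueError.
    """
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f"{name}={raw!r} is not a valid flag; use 0/1 or true/false")
```

Lower-casing the value removes the "True" vs "true" ambiguity while old "true"-style settings keep working alongside new 0/1 ones.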

@ericl (Contributor) commented Dec 17, 2022

> Add an opt-in flag for enabling multi-node clusters for OSX and Windows

Is there a good reason to document this flag? It seems preferable to raise an exception and just say we do not support this.

@ayl0407 commented Dec 18, 2022

> > Add an opt-in flag for enabling multi-node clusters for OSX and Windows
>
> Is there a good reason to document this flag? It seems preferable to raise an exception and just say we do not support this.

People should be allowed to live dangerously (with a warning of course).

Also.. possibly someone could come along and help make this work for OSX / Windows at some point?

@stephanie-wang (Contributor Author) commented

> > > Add an opt-in flag for enabling multi-node clusters for OSX and Windows
> >
> > Is there a good reason to document this flag? It seems preferable to raise an exception and just say we do not support this.
>
> People should be allowed to live dangerously (with a warning of course).
>
> Also.. possibly someone could come along and help make this work for OSX / Windows at some point?

Yes, this is the reason. We've also had at least two users ask about this on discuss.ray.io, and it seems their only real blocker is #30770.

@stephanie-wang stephanie-wang added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Dec 19, 2022
@stephanie-wang stephanie-wang removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Dec 19, 2022
@ericl (Contributor) left a comment

After reading the messages again, I think there is some potential for confusion about what a cluster is. Could we clarify the message to say "Multi-node Ray clusters"?

Also:

  1. I don't think we should print any warning on ray.init(); this is spammy and probably not actionable if your cluster is already started.
  2. I think we should raise an error when trying to start a worker node on OSX/Windows without the flag set.
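A minimal sketch of suggestion 2, assuming the check runs when `ray start` is invoked without `--head` (that is, when a node joins an existing cluster). `check_can_start_worker_node` is hypothetical; the error wording and treating "1"/"true" as opted in are assumptions, not necessarily the merged behavior:

```python
import os
import sys
from typing import Mapping

FLAG = "RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER"


def check_can_start_worker_node(
    is_head: bool,
    platform: str = sys.platform,
    env: Mapping[str, str] = os.environ,
) -> None:
    """Raise if a non-head node is started on OSX/Windows without opting in.

    Hypothetical check. Head nodes are always allowed, so single-node
    clusters keep working; only adding worker nodes is blocked.
    """
    if is_head or platform not in ("darwin", "win32"):
        return
    if env.get(FLAG, "").strip().lower() in ("1", "true"):
        return
    raise RuntimeError(
        "Multi-node Ray clusters are not supported on OSX and Windows. "
        f"Set {FLAG}=true on every node and client to proceed anyway."
    )
```

Failing at worker start, rather than warning in `ray.init()`, surfaces the problem at the one point where the user can still act on it.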

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Dec 20, 2022
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
@jjyao (Collaborator) left a comment

Can't we override the node address so that the driver doesn't listen on localhost by default, similar to how we override the address with `ray start --address=xxx` on Mac and Windows?

@stephanie-wang stephanie-wang added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Feb 9, 2023
@stephanie-wang stephanie-wang merged commit 90f8511 into ray-project:master Feb 9, 2023
@stephanie-wang stephanie-wang deleted the windows-osx-support branch February 9, 2023 17:56
@cadedaniel (Member) commented

I think this breaks master #32389

scv119 added a commit that referenced this pull request Feb 10, 2023
…ray start` output to match docs (#31166)"

This reverts commit 90f8511.
stephanie-wang pushed a commit that referenced this pull request Feb 10, 2023
…ray start` output to match docs (#31166)" (#32403)

This reverts commit 90f8511.
@stephanie-wang stephanie-wang restored the windows-osx-support branch February 10, 2023 02:57
rkooo567 added a commit to rkooo567/ray that referenced this pull request Feb 10, 2023
rkooo567 added a commit to rkooo567/ray that referenced this pull request Feb 10, 2023
…update `ray start` output to match docs (ray-project#31166)""

This reverts commit 0566e84.
jjyao pushed a commit that referenced this pull request Feb 14, 2023
… output to match docs (#32409)

Un-revert #31166.

This PR cleans up a few usability issues around Ray clusters:

- Makes some cleanups to the ray start log output to match the new documentation on Ray clusters. Mainly, de-emphasize Ray Client and recommend jobs instead.
- Add an opt-in flag for enabling multi-node clusters for OSX and Windows. Previously, it was possible to start a multi-node cluster, but then any Ray programs would fail mysteriously after connecting to the cluster. Now, it will warn the user with an error message if the opt-in flag is not set.
- Document multi-node support for OSX and Windows.

Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
…t` output to match docs (ray-project#31166)

edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
…ray start` output to match docs (ray-project#31166)" (ray-project#32403)

This reverts commit 90f8511.

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
… output to match docs (ray-project#32409)

elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
… output to match docs (ray-project#32409)

Labels
@author-action-required: The PR author is responsible for the next step. Remove tag to send back to the reviewer.
tests-ok: The tagger certifies test failures are unrelated and assumes personal liability.
Development

Successfully merging this pull request may close these issues.

[core] Document Windows and OSX support for Ray clusters, support opt-in
9 participants