
[Core] ray.init() hangs using Python 3.10.15 on Linux #48625

Open
btakita opened this issue Nov 7, 2024 · 8 comments
Labels
bug Something that is supposed to be working; but isn't core Issues that should be addressed in Ray Core @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. P2 Important issue, but not time-critical

Comments

btakita commented Nov 7, 2024

What happened + What you expected to happen

  1. ray.init() hangs
  2. ray.init() should start & not hang

ray-logs.zip

Versions / Dependencies

Ray 2.38.0
Python 3.10.15
OS Linux 6.11.6-arch1-1 x86_64 unknown

Reproduction script

```python
import ray.util.queue
ray.init()
```
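A hang like this is easier to confirm, and to report with logs, when ray.init() cannot block the terminal. The following is a minimal sketch, not part of the original report, that runs the reproduction script in a fresh interpreter with a timeout. The helper name run_with_timeout and the 60-second threshold in the usage note are illustrative assumptions.

```python
import subprocess
import sys

# The reproduction script from this issue.
REPRO = "import ray.util.queue\nray.init()"


def run_with_timeout(code: str, timeout_s: float) -> str:
    """Run `code` in a fresh interpreter; return "ok", "error", or "hung"."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            # No pipes: leftover background processes cannot keep us waiting on EOF.
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, but processes the
        # child itself started (e.g. Ray's daemons) are not killed.
        return "hung"
    return "ok" if proc.returncode == 0 else "error"
```

For example, run_with_timeout(REPRO, timeout_s=60) returning "hung" would reproduce the report. Because only the child interpreter is killed on timeout, any Ray processes it started may remain, so ray stop --force (suggested later in this thread) is still needed afterwards.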

Issue Severity

High: It blocks me from completing my task.

btakita added the bug and triage labels on Nov 7, 2024
btakita (Author) commented Nov 7, 2024

I have also noticed some processes that I have not been able to kill, even with kill -9.

✗ ps aux | grep python
brian    3197578  0.0  0.1 1932304 105808 pts/10 D    06:52   0:00 python3.10 -m run start
brian    3197660  0.0  0.0      0     0 pts/10   Z    06:52   0:00 [python3.10] <defunct>
brian    3197661  0.0  0.0      0     0 pts/10   Z    06:52   0:00 [python3.10] <defunct>
brian    3202122  0.0  0.1 1932312 105704 pts/10 D    06:53   0:00 python3.10 -m run start
brian    3202205  0.0  0.0      0     0 pts/10   Z    06:53   0:00 [python3.10] <defunct>
brian    3202206  0.0  0.0      0     0 pts/10   Z    06:53   0:00 [python3.10] <defunct>
brian    3212473  0.0  0.1 1932304 105684 pts/10 D    06:55   0:00 python3.10 -m run start
brian    3212553  0.0  0.0      0     0 pts/10   Z    06:55   0:00 [python3.10] <defunct>
brian    3212554  0.0  0.0      0     0 pts/10   Z    06:55   0:00 [python3.10] <defunct>
brian    3218399  0.0  0.1 1932440 105832 pts/10 D    06:56   0:00 python3.10 -m run start
brian    3218804  0.0  0.0      0     0 pts/10   Z    06:56   0:00 [python3.10] <defunct>
brian    3218819  0.0  0.0      0     0 pts/10   Z    06:56   0:00 [python3.10] <defunct>
brian    3225056  0.0  0.1 1932368 106128 pts/10 D    06:58   0:01 python3.10 -m run start
brian    3225138  0.0  0.0      0     0 pts/10   Z    06:58   0:00 [python3.10] <defunct>
brian    3225139  0.0  0.0      0     0 pts/10   Z    06:58   0:00 [python3.10] <defunct>
brian    3324666  0.0  0.0 1883792 89044 pts/10  D    07:16   0:00 python3.10 -m run start
brian    3324750  0.0  0.0      0     0 pts/10   Z    07:16   0:00 [python3.10] <defunct>
brian    3324751  0.0  0.0      0     0 pts/10   Z    07:16   0:00 [python3.10] <defunct>
brian    3344760  0.0  0.0 1883712 88936 pts/10  D+   07:20   0:00 /home/brian/work/livekit/agent/.venv/bin/python -m ray_test
brian    3345025  0.0  0.0      0     0 pts/10   Z+   07:20   0:00 [python] <defunct>
brian    3345026  0.0  0.0      0     0 pts/10   Z+   07:20   0:00 [python] <defunct>
brian    3411983  0.0  0.0   6396  3704 pts/17   S+   07:32   0:00 grep --color=auto python
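The STAT column explains why kill -9 has no effect here. D means uninterruptible sleep: the process is blocked inside a kernel call (typically I/O) and usually does not respond to signals, SIGKILL included, until that call returns. Z means zombie: the process has already exited and remains only until its parent reaps it. Each pair of zombies appears right after a D-state python3.10 process, so plausibly those parents are stuck and cannot reap them, although this listing does not show parent PIDs. A minimal stdlib sketch (stuck_processes is a hypothetical helper) that picks these entries out of ps aux output:

```python
from typing import List, Tuple

# Process states from the STAT column of `ps aux` that kill -9 cannot clear.
STUCK_STATES = {
    "D": "uninterruptible sleep: signals wait until the blocking kernel call returns",
    "Z": "zombie: already exited; removed only when its parent reaps it",
}


def stuck_processes(ps_aux_output: str) -> List[Tuple[int, str, str]]:
    """Return (pid, state, command) for every D- or Z-state line of `ps aux` output."""
    stuck = []
    for line in ps_aux_output.splitlines():
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND; COMMAND may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header row and blank or malformed lines
        state = fields[7][0]  # first letter of STAT, e.g. "D" from "D+"
        if state in STUCK_STATES:
            stuck.append((int(fields[1]), state, fields[10]))
    return stuck
```

On a live system it can be fed subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout.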

jcotant1 added the core label on Nov 7, 2024
Superskyyy (Contributor) commented

Can you try running ray stop --force and then redoing ray.init()? Or start Ray from the command line (ray start --head) and see if that works.

btakita (Author) commented Nov 8, 2024

Thank you for the tip!
I was having power issues on my laptop, and my second M.2 drive shut down. I have consolidated my M.2 drives and reinstalled the OS, and my laptop seems more stable now. If I hit this problem independently of the power issues, I'll reopen this GitHub issue.

btakita closed this as completed on Nov 8, 2024
btakita (Author) commented Nov 8, 2024

A bit more context: the M.2 drive that shut down held the /home directory. All the other top-level directories, including /tmp, were on the M.2 drive that kept running.

I was running Btrfs when I had the issue and am now running ext4. I was also seeing video playback stutter (the buffer running out), which seems to have gone away with ext4 on the fresh install.

btakita (Author) commented Nov 9, 2024

I reinstalled my OS and no longer have the power issues, but ray.init() is hanging again, so I'm reopening. I didn't run ray stop --force this time, but next time I will.

It seems to happen after I stop the driver with a KeyboardInterrupt, start it again, and repeat this start/stop cycle a few times in a dev environment.
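If the hang is related to processes left behind by interrupted drivers (a hypothesis, not something established in this thread), one mitigation is to make the driver always call ray.shutdown() on the way out, including when Ctrl+C raises KeyboardInterrupt. The sketch below is kept generic so it runs without Ray: always_shutdown and _call_with_sigint_ignored are hypothetical helpers, and ignoring SIGINT during cleanup is an assumption aimed at the repeated Ctrl+C pattern described above.

```python
import contextlib
import signal
import threading
from typing import Callable, Iterator


def _call_with_sigint_ignored(fn: Callable[[], None]) -> None:
    """Call fn while ignoring Ctrl+C, so a second interrupt cannot cut cleanup short."""
    if threading.current_thread() is not threading.main_thread():
        fn()  # signal handlers can only be changed from the main thread
        return
    previous = signal.signal(signal.SIGINT, signal.SIG_IGN)
    try:
        fn()
    finally:
        if previous is not None:  # None: the old handler was not installed from Python
            signal.signal(signal.SIGINT, previous)


@contextlib.contextmanager
def always_shutdown(shutdown: Callable[[], None]) -> Iterator[None]:
    """Run `shutdown` when the block exits, including via KeyboardInterrupt."""
    try:
        yield
    finally:
        _call_with_sigint_ignored(shutdown)
```

In a driver, ray.init() and the application code would go inside a with always_shutdown(ray.shutdown): block.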

rynewang (Contributor) commented

Hi, can you try creating a fresh conda environment, running ray stop --force so that no Ray processes are left, and then calling ray.init() to test? If it hangs, would you mind sharing the logs from the Python script and from /tmp/ray?
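For the requested logs, here is a minimal sketch that zips the most recent session's log directory. It assumes Ray's default temp directory, where /tmp/ray/session_latest links to the newest session; if a different temp directory is configured, pass its logs path instead. It archives only the logs subdirectory because the session directory also holds Unix socket files, which are not regular files and cannot be added to a zip as-is. bundle_ray_logs is a hypothetical helper name.

```python
import os
import shutil

# Assumption: Ray's default temp dir (/tmp/ray), where session_latest links to the newest session.
DEFAULT_LOG_DIR = "/tmp/ray/session_latest/logs"


def bundle_ray_logs(log_dir: str = DEFAULT_LOG_DIR, archive_base: str = "ray-logs") -> str:
    """Zip everything under log_dir into <archive_base>.zip and return the archive path."""
    if not os.path.isdir(log_dir):
        raise FileNotFoundError(f"no Ray log directory at {log_dir}")
    # realpath resolves the session_latest symlink to the actual session directory.
    return shutil.make_archive(archive_base, "zip", root_dir=os.path.realpath(log_dir))
```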

rynewang added the P2 and @external-author-action-required labels and removed the triage label on Nov 12, 2024
btakita (Author) commented Nov 13, 2024

It froze again & I ran ray stop --force.
ray-logs.zip

$: ./.venv/bin/ray stop --force
Stopped only 0 out of 3 Ray processes within the grace period 16 seconds. Set `-v` to see more details. Remaining processes [psutil.Process(pid=3773004, name='python3.10', status='zombie', started='22:39:32'), psutil.Process(pid=3773005, name='python3.10', status='zombie', started='22:39:32'), psutil.Process(pid=3772927, name='gcs_server', status='zombie', started='22:39:32')] will be forcefully terminated.
You can also use `--force` to forcefully terminate processes or set higher `--grace-period` to wait longer time for proper termination.

I reran the process & ray.init() hung again.
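The ray stop output shows that all three remaining processes (two python3.10 processes and gcs_server) were already zombies. A zombie has already exited and cannot be killed again; it disappears only when its parent reaps it with wait(), or when the parent exits and the zombie is re-parented to init (or a subreaper), which reaps it. The useful question is therefore which parent is holding it. A Linux-only sketch that reads /proc/<pid>/stat; parse_proc_stat and zombie_parent are hypothetical helper names:

```python
from pathlib import Path
from typing import Optional, Tuple


def parse_proc_stat(stat_line: str) -> Tuple[int, str, str, int]:
    """Parse pid, comm, state and ppid from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ")",
    # so split around the first "(" and the last ")".
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    pid = int(stat_line[:open_idx])
    comm = stat_line[open_idx + 1 : close_idx]
    state, ppid = stat_line[close_idx + 1 :].split()[:2]
    return pid, comm, state, int(ppid)


def zombie_parent(pid: int) -> Optional[Tuple[int, str]]:
    """Linux only: if pid is a zombie, return (parent pid, parent name); else None."""
    _, _, state, ppid = parse_proc_stat(Path(f"/proc/{pid}/stat").read_text())
    if state != "Z":
        return None
    _, parent_name, _, _ = parse_proc_stat(Path(f"/proc/{ppid}/stat").read_text())
    return ppid, parent_name
```

For example, zombie_parent(3773004) would report the parent of the first zombie listed above, assuming it still exists when the call is made.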

j93hahn commented Mar 4, 2025

+1, I'm using Python 3.10.15 and it hangs on my Mac (M4):

❯ uv run ipython
Python 3.10.15 (main, Oct 16 2024, 08:33:15) [Clang 18.1.8 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.32.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import ray

In [2]: ray.init()
2025-03-04 11:47:34,281	INFO worker.py:1832 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
Out[5]: RayContext(dashboard_url='127.0.0.1:8265', python_version='3.10.15', ray_version='2.42.1', ray_commit='c2e38f7b75be223c0c033986472daada8622d64f')



I've run uv run ray stop --force and pkill, but neither seems to fix this.

Projects
None yet
Development

No branches or pull requests

6 participants