-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[runtime env] Fix Ray hangs when nonexistent conda environment is specified #28105 #34956
[runtime env] Fix Ray hangs when nonexistent conda environment is specified #28105 #34956
Conversation
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
def _create(): | ||
if uri is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it to inside a function because I want get_conda_env_list to run in a separate thread to avoid blocking agent main thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Just a couple of nits/questions.
By the way, do you happen to know the impact of this on worker startup time? My impression is that conda activate
already takes a couple seconds, and this check might take several more seconds depending on how many envs there are, but this performance impact still seems worth the correctness tradeoff.
# TODO(architkulkarni): Try "conda activate" here to see if the | ||
# env exists, and raise an exception if it doesn't. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# TODO(architkulkarni): Try "conda activate" here to see if the | |
# env exists, and raise an exception if it doesn't. |
Thanks for fixing this :)
Could you fix the PR title? |
Hmm actually a good point. Maybe I can make it do only once? Maybe we can create a URI that has the state about it. |
I think despite the additional complexity, your URI approach is the correct solution. That's what we do for Note that for production use cases, we don't recommend |
Yeah I think it is a perf regression if we merge it this way. Let me see how complicated it is to support the URI. |
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
@architkulkarni can you check it again? I addressed the problem ^. I think testing is a bit tough (do we have a test that already has conda env?), so I manually verified it, but lmk if you have an idea for testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized a concern about deletion, can you take a look?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern is that using time.time()
in the test is risky, it might make the test flaky. I'll let you decide the urgency, given that branch cut is tomorrow.
Failed tests seem unrelated |
…cified ray-project#28105 (ray-project#34956) When a conda name is given to the runtime env, we assume the env already exits. However, there are times the env doesn't exist, and if that happens, it hangs forever. This fixes the issue by always checking conda env list before creating a conda runtime env. Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
…cified ray-project#28105 (ray-project#34956) When a conda name is given to the runtime env, we assume the env already exits. However, there are times the env doesn't exist, and if that happens, it hangs forever. This fixes the issue by always checking conda env list before creating a conda runtime env. Signed-off-by: Victor <vctr.y.m@example.com>
#34956 fixed a conda hanging issue by checking existence of the conda env name. However it only checks by conda env name, not path - if the user specifies an absolute path, it no longer works. This is an regression because it should support full paths without a problem. This PR reads conda CLI for the full paths and names. A special treatment is done for `base` which is just the conda base prefix; and others are base conda_prefix/envs/name. Adds tests for both `base` and full path of it; and also name and full paths for non base envs. Signed-off-by: Ruiyang Wang <rywang014@gmail.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Why are these changes needed?
When a conda name is given to the runtime env, we assume the env already exits. However, there are times the env doesn't exist, and if that happens, it hangs forever. This fixes the issue by always checking conda env list before creating a conda runtime env.
Related issue number
Closes #28105
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.