Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplified fix for issue 803 #839

Merged
merged 2 commits into from
Dec 7, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -15,13 +15,16 @@
from typing import Any

import ray
from ray.experimental.state.api import list_actors
from data_processing.utils import GB, UnrecoverableException
from ray.actor import ActorHandle
from ray.exceptions import RayError
from ray.experimental.state.api import list_actors
from ray.util.actor_pool import ActorPool


RAY_MAX_ACTOR_LIMIT = 10000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider using environment variable as below. This keep the default at 100 (as it is now) but then allows you to set an environment variable to increate the number of actors if needed

Suggested change
RAY_MAX_ACTOR_LIMIT = 10000
RAY_MAX_ACTOR_LIMIT = int (os.environ.get("RAY_MAX_ACTOR_LIMIT", '100'))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see a benefit from using an additional environment variable. First, setting 100 as the default means issue #803 will occur by default, requiring users to read the documentation, locate the issue, and define a new environment variable to fix it. Second, Ray already uses an environment variable to define the RAY_MAX_LIMIT_FROM_API_SERVER constant here. Playing with this environment variable might allow us to change the max limit value, but I do not see what is the advantage of doing so.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@touma-I : I can see that @dtsuzuku-ibm has opened an identical issue (#858) and a pull request (#859) with exactly the same fix as proposed here.



class RayUtils:
"""
Class implementing support methods for Ray execution
Expand Down Expand Up @@ -109,11 +112,13 @@ def operator() -> ActorHandle:
time.sleep(creation_delay)
return clazz.options(**actor_options).remote(params)

cls_name = clazz.__class__.__name__.replace('ActorClass(', '').replace(')','')
cls_name = clazz.__class__.__name__.replace("ActorClass(", "").replace(")", "")
actors = [operator() for _ in range(n_actors)]
for i in range(120):
time.sleep(1)
alive = list_actors(filters=[("class_name", "=", cls_name), ("state", "=", "ALIVE")])
alive = list_actors(
filters=[("class_name", "=", cls_name), ("state", "=", "ALIVE")], limit=RAY_MAX_ACTOR_LIMIT
)
if len(actors) == len(alive):
return actors
# failed - raise an exception
Expand Down
Loading