
[Core][Enable gcs scheduler 7/n] Prefer actor owner's node #30789

Merged

merged 10 commits into ray-project:master on Jan 23, 2023

Conversation

@Chong-Li (Contributor) commented Nov 30, 2022

Signed-off-by: Chong-Li lc300133@antgroup.com

Why are these changes needed?

This PR finalizes the GCS actor scheduler, with the following changes:

  1. Similar to the legacy scheduler, GCS now prefers the actor owner's node. This is usually required by RL use cases for better colocation.
  2. Normal-task workers report zero CPU resources (instead of their allocated amount) while they are blocked.
  3. Similar to the legacy scheduler, GCS now schedules empty-resource actors randomly.
  4. A new release test is added: multiple masters/drivers create (slave) actors concurrently. This case exposes the difference between the centralized (GCS-based, with fewer scheduling conflicts) and distributed schedulers.

The feature flag is temporarily turned on to get through the CI pipeline. A dedicated follow-up PR is still needed to turn it on by default.
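For context, a minimal sketch of how the flag could be flipped in a local experiment. Only the `RAY_gcs_actor_scheduling_enabled` variable name comes from this PR's diffs; everything else is a plain, hypothetical usage example, not code from this PR.

```python
import os

# Opt in to GCS-based actor scheduling before Ray starts, so that the GCS
# and raylet processes inherit the setting.
os.environ["RAY_gcs_actor_scheduling_enabled"] = "true"

import ray

ray.init()


@ray.remote(num_cpus=1)
class Worker:
    def ping(self):
        return "ok"


# With the flag on, this actor is placed by the GCS scheduler, which now
# prefers the owner's (driver's) node when it has enough resources.
print(ray.get(Worker.remote().ping.remote()))
```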

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Chong-Li <lc300133@antgroup.com>
@Chong-Li Chong-Li marked this pull request as draft November 30, 2022 14:24
Signed-off-by: Chong-Li <lc300133@antgroup.com>
Signed-off-by: Chong-Li <lc300133@antgroup.com>
@Chong-Li Chong-Li marked this pull request as ready for review December 2, 2022 04:05
@Chong-Li (Contributor, Author) commented Dec 2, 2022

The RLlib release test results with the raylet-based scheduler: https://buildkite.com/ray-project/release-tests-pr/builds/22539#_

With the GCS-based scheduler: https://buildkite.com/ray-project/release-tests-pr/builds/22674#_

I did not find any notable performance regression, but the variations may need @rkooo567 to double-check. cc @scv119 @iycheng @wumuzi520

@@ -359,7 +359,7 @@ def env_bool(key, default):


 def gcs_actor_scheduling_enabled():
-    return os.environ.get("RAY_gcs_actor_scheduling_enabled") == "true"
+    return os.environ.get("RAY_gcs_actor_scheduling_enabled") != "false"
Contributor:
Does this mean that gcs actor scheduling is enabled by default?

@@ -602,7 +602,7 @@ RAY_CONFIG(uint64_t, subscriber_timeout_ms, 300 * 1000)
 RAY_CONFIG(uint64_t, gcs_actor_table_min_duration_ms, /* 5 min */ 60 * 1000 * 5)

 /// Whether to enable GCS-based actor scheduling.
-RAY_CONFIG(bool, gcs_actor_scheduling_enabled, false);
+RAY_CONFIG(bool, gcs_actor_scheduling_enabled, true);
Contributor:
Does this mean that gcs actor scheduling is enabled by default?

Contributor (Author):
It is only turned on to go through the CI pipeline. Will turn it off after code review.

@@ -49,7 +49,8 @@ GcsActorScheduler::GcsActorScheduler(
 void GcsActorScheduler::Schedule(std::shared_ptr<GcsActor> actor) {
   RAY_CHECK(actor->GetNodeID().IsNil() && actor->GetWorkerID().IsNil());

-  if (RayConfig::instance().gcs_actor_scheduling_enabled()) {
+  if (RayConfig::instance().gcs_actor_scheduling_enabled() &&
+      !actor->GetCreationTaskSpecification().GetRequiredResources().IsEmpty()) {
Contributor:
This condition is a little weird here.

Contributor (Author):
This is just a quick way to make empty-resource actors go through the legacy logic (randomly selecting a raylet to forward to).
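To illustrate the split, here is a hypothetical user-level example (not code from this PR), assuming an actor declared with `num_cpus=0` ends up with an empty required-resource set:

```python
import ray

ray.init()


# Hypothetical example: an actor whose creation task requests no resources
# keeps the legacy behavior and is forwarded to a randomly selected raylet.
@ray.remote(num_cpus=0)
class ZeroResourceActor:
    def ping(self):
        return "legacy random-forwarding path"


# An actor with a non-empty resource request is placed by the GCS actor
# scheduler itself, which prefers the owner's (creator's) node when it fits.
@ray.remote(num_cpus=1)
class OneCpuActor:
    def ping(self):
        return "GCS actor scheduler path"


print(ray.get(ZeroResourceActor.remote().ping.remote()))
print(ray.get(OneCpuActor.remote().ping.remote()))
```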

@wumuzi520 (Contributor) left a comment:

It looks good overall.
BTW, don't forget to change the RAY_gcs_actor_scheduling_enabled switch back.

@Chong-Li (Contributor, Author):
Are there any more suggestions for this PR? @iycheng @rkooo567

@rkooo567 (Contributor) left a comment (requesting changes):

Requesting changes until we revert gcs actor scheduling being enabled by default.

I have two main questions:

  1. Is there any way to merge "prioritize_local_node" and "preferred_node_id"? I feel like they are the same thing.
  2. Some questions regarding the nightly tests (follow-ups in the comments).

Also, I am waiting for the review from @scv119.

python/ray/_private/ray_constants.py (review comment resolved; outdated)


def test_max_actors_launch(cpus_per_actor, total_actors, num_masters):
    # By default, there are 50 groups, each group has 1 master and 99 slaves.
Contributor:
Have you run this test? (Do you need help from us running it?)

Contributor (Author):
Please see this Slack post (https://ray-distributed.slack.com/archives/C0334JQHG86/p1669131998615139) for the results. Generally, the GCS scheduler shows ~2.5X better latency compared to the raylet scheduler.

actor_ready_end = perf_counter()
actor_ready_time = actor_ready_end - actor_ready_start

print(f"Actor launch time: {actor_launch_time} ({args.total_actors} actors)")
Contributor:
Can you tell me a bit more about the success/failure criteria of this test?

Contributor (Author):
This test is similar to the existing one: https://github.com/ray-project/ray/blob/master/release/nightly_tests/many_nodes_tests/actor_test.py. Both focus on the scheduling latency of a large number of actors.

The only difference from actor_test.py is that this test simulates a multi-master (driver) scenario: distributed master actors create slave actors concurrently. Please see this discussion (https://ray-distributed.slack.com/archives/C0334JQHG86/p1667468940596309) for details.
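For readers who have not seen the pattern, here is a heavily simplified, hypothetical sketch of the multi-master shape being measured. The names, group sizes, and structure are illustrative only; the real test lives in many_nodes_tests/multi_master_test.py and, per the code comment above, uses 50 groups of 1 master plus 99 slaves.

```python
import ray

ray.init()


@ray.remote
class Slave:
    def ping(self):
        return "ok"


@ray.remote
class Master:
    """Each master actor creates and waits on its own group of slave actors."""

    def __init__(self, num_slaves):
        self.slaves = [Slave.remote() for _ in range(num_slaves)]

    def wait_ready(self):
        # Blocks until every slave has been scheduled and responds.
        return ray.get([s.ping.remote() for s in self.slaves])


# Many masters create slaves concurrently; this is where a centralized
# (GCS-based) scheduler sees fewer scheduling conflicts than the
# distributed (raylet-based) one. Small numbers here for illustration.
masters = [Master.remote(num_slaves=8) for _ in range(4)]
ray.get([m.wait_ready.remote() for m in masters])
print("all slave actors are ready")
```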

timeout: 7200
script: python many_nodes_tests/multi_master_test.py
wait_for_nodes:
  num_nodes: 251
Contributor:
Can we test this with a smaller number of nodes?

Contributor (Author):
This config is actually the same as the one in actor_test.py. They should run at the same scale to make an apples-to-apples comparison.

Contributor:
This is pretty expensive. Can you comment it out for now? We can re-enable it once we enable the GCS actor scheduler by default. (Let's just use it for benchmarking purposes for now.)

@@ -602,7 +602,7 @@ RAY_CONFIG(uint64_t, subscriber_timeout_ms, 300 * 1000)
 RAY_CONFIG(uint64_t, gcs_actor_table_min_duration_ms, /* 5 min */ 60 * 1000 * 5)

 /// Whether to enable GCS-based actor scheduling.
-RAY_CONFIG(bool, gcs_actor_scheduling_enabled, false);
+RAY_CONFIG(bool, gcs_actor_scheduling_enabled, true);
Contributor:
I will request changes until this change is reverted.

Contributor (Author):
Sure. I was meaning to revert it after code review, but I can do it right now.

Signed-off-by: Chong-Li <lc300133@antgroup.com>
@Chong-Li Chong-Li force-pushed the refactor_gcs_sched_7/n branch from 41b3307 to 3e4ed6c Compare January 3, 2023 07:08
Signed-off-by: Chong-Li <lc300133@antgroup.com>
@Chong-Li Chong-Li force-pushed the refactor_gcs_sched_7/n branch from 3e4ed6c to c2308af Compare January 3, 2023 09:47
@Chong-Li (Contributor, Author) commented Jan 3, 2023

> Requesting changes until we revert gcs actor scheduling being enabled by default.
>
> I have two main questions:
>
>   1. Is there any way to merge "prioritize_local_node" and "preferred_node_id"? I feel like they are the same thing.
>   2. Some questions regarding the nightly tests (follow-ups in the comments).
>
> Also, I am waiting for the review from @scv119.

These two questions have been resolved. Are there any more comments? @rkooo567 @scv119

@Chong-Li Chong-Li requested a review from rkooo567 January 3, 2023 12:24
@rkooo567 (Contributor) left a comment:

Looks pretty good! Last couple of comments!


src/ray/gcs/gcs_server/gcs_actor_scheduler.cc (review comment resolved; outdated)
src/ray/raylet/local_task_manager.cc (review comment resolved)
src/ray/common/task/task.h (review comment resolved)
@@ -191,7 +192,8 @@ void ClusterTaskManager::TryScheduleInfeasibleTask() {
   bool is_infeasible;
   cluster_resource_scheduler_->GetBestSchedulableNode(
       task.GetTaskSpecification(),
-      work->PrioritizeLocalNode(),
+      /*preferred_node_id*/ work->PrioritizeLocalNode() ? self_node_id_.Binary()
+                                                        : task.GetPreferredNodeID(),
Contributor:
How does it work if there's no preferred node id?

Contributor (Author):
An empty preferred node_id is handled specially (if empty, fall back to the local node) inside HybridSchedulingPolicy.
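In other words, the fallback behaves roughly like the following hypothetical Python rendering (not the actual C++ in HybridSchedulingPolicy; node ids are made up):

```python
def resolve_preferred_node(preferred_node_id: str, local_node_id: str) -> str:
    # An empty preferred node id means "no preference": fall back to the
    # local node, matching the legacy prioritize-local-node behavior.
    return preferred_node_id if preferred_node_id else local_node_id


assert resolve_preferred_node("", "node-A") == "node-A"
assert resolve_preferred_node("node-B", "node-A") == "node-B"
```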

Contributor:
Is it possible to just pass nullopt instead? It makes sense, but I think it is pretty confusing to understand from the code.

@Chong-Li (Contributor, Author) commented Jan 19, 2023:
Right now, preferred_node_id is just a std::string in ClusterResourceScheduler, RayTask, SchedulingOptions, and HybridSchedulingPolicy (an empty string by default). Do you mean changing it to std::optional<std::string> globally?

Contributor (Author):
As suggested, more comments about the empty string have been added. @rkooo567

@rkooo567 added the @author-action-required label on Jan 12, 2023
Signed-off-by: Chong-Li <lc300133@antgroup.com>
@Chong-Li Chong-Li requested a review from rkooo567 January 16, 2023 13:53
@rkooo567 (Contributor) left a comment:

There are test failures

@rkooo567 (Contributor):
I started a Mac build. Please wait a little!

@Chong-Li (Contributor, Author):
> I started a Mac build. Please wait a little!

All tests passed. @rkooo567

@Chong-Li Chong-Li requested a review from rkooo567 January 23, 2023 08:38
@rkooo567 rkooo567 merged commit ec3243d into ray-project:master Jan 23, 2023
@rkooo567 (Contributor):
cc @iycheng

Labels: @author-action-required