[Misc] Optimize ray worker initialization time #11275

ruisearch42 · 2024-12-18T00:59:14Z

This PR optimizes ray worker initialization time.

In the current code base, ray.get(worker.get_node_ip.remote()) is called for each worker right after we get its handle, and each call takes ~3s. The call is expensive because RayWorkerWrapper.remote() returns an actor handle immediately, while the actor itself may not be fully initialized yet. Any method call on the actor at that point has to wait for actor initialization to finish, which takes ~3s in this case.

Because we call ray.get(worker.get_node_ip.remote()) serially for each newly created actor handle, this time adds up. For example, with TP=4 it takes ~12 seconds.

We optimize this by first creating all the actor handles and only then calling ray.get(worker.get_node_ip.remote()) on them. Since the actors now initialize in parallel, the total time is ~3s. So for TP=4, this saves ~9 seconds.
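The difference between the two call patterns can be sketched without Ray. The example below is only an analogy under stated assumptions: a thread-pool future stands in for an actor handle, `start_worker` and `INIT_DELAY_S` are hypothetical stand-ins for actor initialization and its ~3s cost, and none of this is the code in ray_gpu_executor.py.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the ~3s actor initialization (scaled down).
INIT_DELAY_S = 0.2


def start_worker(rank: int) -> str:
    """Simulate slow worker start-up, then report a made-up node IP."""
    time.sleep(INIT_DELAY_S)
    return f"10.0.0.{rank}"


def serial_init(pool: ThreadPoolExecutor, n: int) -> list[str]:
    """Old pattern: block on each handle right after creating it."""
    ips = []
    for rank in range(n):
        handle = pool.submit(start_worker, rank)  # returns immediately
        ips.append(handle.result())               # waits for this worker only
    return ips


def batched_init(pool: ThreadPoolExecutor, n: int) -> list[str]:
    """New pattern: create every handle first, then collect the results."""
    handles = [pool.submit(start_worker, rank) for rank in range(n)]
    return [handle.result() for handle in handles]


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


with ThreadPoolExecutor(max_workers=4) as pool:
    serial_ips, serial_s = timed(serial_init, pool, 4)
    batched_ips, batched_s = timed(batched_init, pool, 4)

print(f"serial:  {serial_s:.2f}s")
print(f"batched: {batched_s:.2f}s")
```

In the serial version the waits add up to roughly four times the start-up delay, while in the batched version the workers start concurrently and the total is roughly one delay. That is the same shape as the ~12s versus ~3s figures above.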

I tested the following command:

python3 benchmarks/benchmark_latency.py --model meta-llama/Llama-3.1-8B-Instruct --tensor-parallel-size 4 --num-iters-warmup 5 --num-iters 20 --batch-size 8 --input-len 128 --output-len 256 --max-model-len 2048 --no-enable-prefix-caching --distributed-executor-backend ray

Without this PR, _init_workers_ray takes ~18 seconds. With it, _init_workers_ray takes ~9 seconds.

FIX #10283


comaniac

LGTM


youkaichao

thanks for the fix!

youkaichao · 2024-12-19T07:41:35Z

vllm/executor/ray_gpu_executor.py

@@ -179,7 +188,7 @@ def sort_by_driver_then_worker_ip(worker):
            3. Finally, if the work is on a node with smaller IP address, it
                should be placed first.
            """
-            ip = ray.get(worker.get_node_ip.remote())
+            ip = worker_to_ip[worker]


@ruisearch42 this one looks concerning to me. We should change the tuple that is sorted, instead of using the worker as a dict key. See the code from #11256.

I see. Can you elaborate a bit on the concern? The pattern of using an external dict for sorting is not uncommon.

Using an arbitrary Python object as a key introduces quite unpredictable behavior and can cause silent bugs.

It's not about using an external dict; it's about using the worker object as a dict key, which implicitly calls its __hash__ function.

I think the default behavior without a custom __hash__ function is to use the object's identity (memory address) for __hash__ and __eq__, so it's pretty safe unless there is some non-standard user-overridden __hash__ or __eq__?

I think your implementation also makes sense.
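To make the two positions in this thread concrete, here is a small sketch, assuming a toy `Worker` class and made-up IPs. It is not the code from this PR or from #11256, and the real sort key in ray_gpu_executor.py has more components than the IP alone.

```python
class Worker:
    """Hypothetical stand-in for an actor handle; no custom __hash__/__eq__."""

    def __init__(self, name: str) -> None:
        self.name = name


workers = [Worker("a"), Worker("b"), Worker("c")]
ips = ["10.0.0.2", "10.0.0.1", "10.0.0.2"]

# Approach 1: external dict keyed by the worker object.
# This relies on the default identity-based __hash__ and __eq__.
worker_to_ip = dict(zip(workers, ips))
by_dict = sorted(workers, key=lambda w: worker_to_ip[w])

# Approach 2: sort (ip, worker) pairs on the IP only,
# so the worker object is never hashed or compared.
pairs = sorted(zip(ips, workers), key=lambda pair: pair[0])
by_tuple = [worker for _, worker in pairs]

dict_order = [w.name for w in by_dict]
tuple_order = [w.name for w in by_tuple]
print(dict_order, tuple_order)


class EqOnly:
    """Defining __eq__ without __hash__ makes instances unhashable."""

    def __eq__(self, other: object) -> bool:
        return True


try:
    {EqOnly(): "10.0.0.1"}
    eq_only_hashable = True
except TypeError:
    eq_only_hashable = False
print(eq_only_hashable)
```

Both approaches give the same order for plain objects. The dict-keyed version only becomes fragile when the key type overrides `__eq__` or `__hash__`, which is the case the reviewer is wary of, while the tuple version sidesteps hashing entirely.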


ruisearch42 assigned comaniac Dec 18, 2024

comaniac approved these changes Dec 18, 2024


ruisearch42 force-pushed the opt_ray_worker_init branch from dfa2cb8 to 0f453a7 Compare December 18, 2024 01:54

ruisearch42 added the ready label Dec 18, 2024

ruisearch42 and others added 3 commits December 18, 2024 16:22

[Misc] Optimize ray worker initialization time

30c4374

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

up

294e710

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

Update vllm/executor/ray_gpu_executor.py

8254b41

Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

ruisearch42 force-pushed the opt_ray_worker_init branch from 0f453a7 to 8254b41 Compare December 18, 2024 16:22

comaniac enabled auto-merge (squash) December 18, 2024 16:28

up

918f192

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

auto-merge was automatically disabled December 18, 2024 16:32
Head branch was pushed to by a user without write access

youkaichao approved these changes Dec 19, 2024


youkaichao merged commit f26c4ae into vllm-project:main Dec 19, 2024
54 checks passed

youkaichao reviewed Dec 19, 2024


ruisearch42 mentioned this pull request Dec 20, 2024

[Bug]: extremely slow launching time possibly due to calling ray.init() again after it has already been called when launching vllm through ray cluster #11208

Open

1 task

BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024

[Misc] Optimize ray worker initialization time (vllm-project#11275)

7bbd9e0

Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
