[Core] Multiprocessing executor for single-node multi-GPU [1/2] #4345
Conversation
This introduces the `MultiProcGPUExecutor`, which uses multiprocessing for tensor parallelism as an alternative to Ray. This PR does not actually wire it up for use; that will be done in a follow-on PR.

This PR also includes some refactoring to simplify the executor class hierarchy:
- Add a `MultiGPUExecutor` abstract superclass shared between the Ray and vanilla multiprocessing implementations
- Add a `shutdown()` method to the `ExecutorBase` abstract class; the executor is shut down when the `LLMEngine` is garbage collected
- Simplify/centralize GPU `Worker` construction
- Move `ray_utils.py` from the `engine` package to the `executor` package (per @zhuohan123's suggestion)
- Move function call tracing setup to a utils function
- Fix various typing issues

This replaces #3466; see background in that issue.
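The hierarchy described above can be sketched roughly as follows. This is only an illustration of the shape of the refactor: the names `ExecutorBase`, `MultiGPUExecutor`, `MultiProcGPUExecutor`, and `shutdown()` come from the PR, but the queue-based worker protocol and every other detail here are assumptions, not vLLM's actual implementation.

```python
# Hypothetical sketch of the executor hierarchy; only the class names and
# shutdown() are from the PR, the rest is illustrative.
from abc import ABC, abstractmethod
import multiprocessing as mp


class ExecutorBase(ABC):
    @abstractmethod
    def shutdown(self) -> None:
        """Release worker resources (invoked when the engine is GC'd)."""


class MultiGPUExecutor(ExecutorBase, ABC):
    """Logic shared by the Ray and vanilla multiprocessing executors."""


def _worker_loop(rank: int, tasks: mp.Queue, results: mp.Queue) -> None:
    # A real worker would initialize its GPU / distributed group here.
    for task in iter(tasks.get, None):  # None acts as a shutdown sentinel
        results.put((rank, task))


class MultiProcGPUExecutor(MultiGPUExecutor):
    """Tensor-parallel executor using one OS process per rank (sketch)."""

    def __init__(self, world_size: int) -> None:
        self._shut_down = False
        self.task_queues = [mp.Queue() for _ in range(world_size)]
        self.result_queue = mp.Queue()
        self.procs = [
            mp.Process(target=_worker_loop,
                       args=(rank, q, self.result_queue), daemon=True)
            for rank, q in enumerate(self.task_queues)
        ]
        for p in self.procs:
            p.start()

    def shutdown(self) -> None:
        if self._shut_down:
            return
        self._shut_down = True
        for q in self.task_queues:
            q.put(None)  # ask each worker to exit its loop
        for p in self.procs:
            p.join()

    def __del__(self) -> None:
        # Mirrors "executor is shutdown when the LLMEngine is garbage
        # collected" from the PR description.
        self.shutdown()
```

The sentinel-plus-join pattern is one common way to give a process pool a deterministic `shutdown()`, which is the property the abstract method is meant to guarantee.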
@lru_cache(maxsize=None)
def get_distributed_init_method() -> str:
    ip = get_ip()
    port = get_open_port()
is it safe to cache here? what if we want to init distributed for two different groups, and we expect the function to return different results per call?
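The reviewer's concern can be demonstrated in isolation: `lru_cache` on a zero-argument function freezes its first result, so every subsequent caller receives the same `(ip, port)` pair even when a fresh rendezvous address is wanted. The helpers below are illustrative stand-ins for vLLM's `get_ip`/`get_open_port`, not the real implementations.

```python
# Stand-alone demo of the caching concern; get_open_port here is a
# hypothetical stand-in for vLLM's helper of the same name.
import socket
from functools import lru_cache


def get_open_port() -> int:
    # Ask the OS for any currently-free TCP port.
    with socket.socket() as s:
        s.bind(("", 0))
        return s.getsockname()[1]


@lru_cache(maxsize=None)
def get_distributed_init_method() -> str:
    return f"tcp://127.0.0.1:{get_open_port()}"


addr1 = get_distributed_init_method()
addr2 = get_distributed_init_method()
# addr1 == addr2: the cached result is reused, so a second distributed
# group could never obtain its own rendezvous address through this
# function while the cache is in place.
```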
vllm/utils.py
@@ -607,3 +614,15 @@ def find_nccl_library():
        raise ValueError("NCCL only supports CUDA and ROCm backends.")
    logger.info(f"Found nccl from library {so_file}")
    return so_file


def enable_trace_function_call_for_process():
actually this is per-thread. the naming is not accurate.
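The distinction the reviewer is pointing at is visible with the standard library alone: `sys.settrace` installs a trace function for the calling thread only, while `threading.settrace` covers threads started afterwards. A small stand-alone demonstration (not vLLM code):

```python
# Demonstrates that trace functions are per-thread, not per-process.
import sys
import threading

calls: list[str] = []


def tracer(frame, event, arg):
    # Record every Python-level function call made in the current thread.
    if event == "call":
        calls.append(
            f"{threading.current_thread().name}:{frame.f_code.co_name}")
    return None  # no per-line tracing needed for this demo


def traced_target():
    pass


sys.settrace(tracer)      # installs the tracer for THIS thread only
traced_target()           # recorded as "MainThread:traced_target"
t = threading.Thread(target=traced_target, name="worker")
t.start()
t.join()                  # calls made in "worker" are NOT recorded
sys.settrace(None)

threading.settrace(tracer)  # covers threads started from now on
t2 = threading.Thread(target=traced_target, name="worker2")
t2.start()
t2.join()                   # now "worker2:traced_target" IS recorded
threading.settrace(None)
```

This is why a name like `enable_trace_function_call_for_process` overstates what a single `sys.settrace` call achieves.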
To be honest, I feel like this PR is still too large to be effectively reviewed. I suggest breaking it down into pieces, each PR having about 100~200 lines of change, so that we can have a quick feedback loop.
@youkaichao here is the first one: #4347
Next one, just introducing the new multi-gpu abstract executor class: #4348
Next one, add shutdown method to ExecutorBase: #4349
Let's keep 1~2 PRs per day to avoid zhuohan and me being overwhelmed?
@youkaichao these are mostly the contents of the original PR that was already reviewed and discussed, just broken up (and mostly very small). Please feel free to review at whatever pace you're comfortable with! Next one is just moving the function tracing setup to a util function: #4352
Closing this in favor of a set of more granular PRs, some already opened (linked above). |