-
Notifications
You must be signed in to change notification settings - Fork 7k
[core] Ensure client_call_manager_ outlives metrics_agent_client_ in core worker
#58315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…es/fix-ccm-lifetime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the ownership of client_call_manager_ by moving it from CoreWorker to CoreWorkerProcessImpl. This is a good change that correctly ensures client_call_manager_ outlives metrics_agent_client_, fixing a potential use-after-free bug. The changes are mostly correct, but I found a critical issue where std::unique_ptrs are passed to a function expecting references, which will likely cause a compilation error. I also found a placeholder comment that should be improved for clarity.
| /// The core worker instance of this worker process. | ||
| MutexProtected<std::shared_ptr<CoreWorker>> core_worker_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved so that CoreWorker is destroyed before ClientCallManager
jjyao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What tool/test can catch this besides Rust compiler
I'm not actually sure why ASAN doesn't catch it. I am not deeply familiar with ASAN/TSAN |
… in core worker (ray-project#58315) The `metrics_agent_client_` depends on `client_call_manager_`, but previously it was pulling out a reference to it from the core worker, which is not guaranteed to outlive the agent client. Modifying it to keep the `client_call_manager_` as a field of the `core_worker_process` instead. I think we may also need to drain any ongoing RPCs from the `metrics_agent_client_` on shutdown. Leaving that for a future PR. --------- Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
… in core worker (ray-project#58315) The `metrics_agent_client_` depends on `client_call_manager_`, but previously it was pulling out a reference to it from the core worker, which is not guaranteed to outlive the agent client. Modifying it to keep the `client_call_manager_` as a field of the `core_worker_process` instead. I think we may also need to drain any ongoing RPCs from the `metrics_agent_client_` on shutdown. Leaving that for a future PR. --------- Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
… in core worker (ray-project#58315) The `metrics_agent_client_` depends on `client_call_manager_`, but previously it was pulling out a reference to it from the core worker, which is not guaranteed to outlive the agent client. Modifying it to keep the `client_call_manager_` as a field of the `core_worker_process` instead. I think we may also need to drain any ongoing RPCs from the `metrics_agent_client_` on shutdown. Leaving that for a future PR. --------- Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
The
metrics_agent_client_depends onclient_call_manager_, but previously it was pulling out a reference to it from the core worker, which is not guaranteed to outlive the agent client.Modifying it to keep the
client_call_manager_as a field of thecore_worker_processinstead.I think we may also need to drain any ongoing RPCs from the
metrics_agent_client_on shutdown. Leaving that for a future PR.