Pass `CoreWorker` into wrapped C++ code #74
Ray.jl/src/runtime.jl
Outdated
function GetCoreWorker()
    return ray_jll.GetCoreWorker()
    # if !isassigned(CORE_WORKER)
    #     CORE_WORKER[] = ray_jll.GetCoreWorker()
    # end
    # return CORE_WORKER[]
end
Using the write-once version of this function results in most of the Ray.jl test suite failing with:
C++ object of type N3ray4core10CoreWorkerE was deleted
I'll try a couple other things
Interesting. If I change our global reference to be a `Ref{ray_jll.CoreWorker}` then things work fine. It seems that the `CxxRef` doesn't hold onto the allocated memory.
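The difference between a handle that only observes an object and one that keeps it alive can be sketched in plain C++. This is an analogy only, not how CxxWrap implements `CxxRef`: `FakeCoreWorker`, `BorrowedWorker`, and `OwnedWorker` are hypothetical names, and `std::weak_ptr`/`std::shared_ptr` stand in for the non-owning and owning cases.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for ray::core::CoreWorker; not the real class.
struct FakeCoreWorker {
    int job_id;
};

// Non-owning handle: observes the worker but does not keep it alive.
class BorrowedWorker {
public:
    explicit BorrowedWorker(const std::shared_ptr<FakeCoreWorker>& w) : worker_(w) {}

    int job_id() const {
        // lock() yields an empty pointer once the last owner has gone away.
        auto locked = worker_.lock();
        if (!locked) {
            throw std::runtime_error("FakeCoreWorker was deleted");
        }
        return locked->job_id;
    }

private:
    std::weak_ptr<FakeCoreWorker> worker_;
};

// Owning handle: shares ownership, so the worker outlives the original owner.
class OwnedWorker {
public:
    explicit OwnedWorker(std::shared_ptr<FakeCoreWorker> w) : worker_(std::move(w)) {}

    int job_id() const { return worker_->job_id; }

private:
    std::shared_ptr<FakeCoreWorker> worker_;
};
```

If the only `std::shared_ptr` behind a `BorrowedWorker` is reset, `job_id()` throws, whereas an `OwnedWorker` built from a different worker keeps returning its value after the original pointer is reset.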
Codecov Report
@@ Coverage Diff @@
## main #74 +/- ##
=======================================
Coverage 68.93% 68.93%
=======================================
Files 6 6
Lines 264 264
=======================================
Hits 182 182
Misses 82 82
CI failure will be fixed by #75
I think my concern here is that it's not clear what problem this is solving; it introduces some fragility by making the Julia side keep track of the C++ `CoreWorker` struct. That might be worth it if there's a clear benefit, but from what I can tell right now, there isn't.
Ray.jl/src/runtime.jl
Outdated
Get the current job ID for this worker or driver. Job ID is the id of your Ray drivers that
create tasks.
"""
get_job_id() = ray_jll.ToInt(ray_jll.GetCurrentJobId(GetCoreWorker()))
Why rename this function? I think it's better to keep the diff clean if there isn't a functional change here.
The Python Ray documentation uses this for the name: https://docs.ray.io/en/latest/ray-core/api/doc/ray.runtime_context.RuntimeContext.get_job_id.html
Ray.jl/src/runtime.jl
Outdated
const CORE_WORKER = Ref{ray_jll.CoreWorker}()

function GetCoreWorker()
    if !isassigned(CORE_WORKER)
        CORE_WORKER[] = ray_jll.GetCoreWorker()[]
    end
    return CxxRef(CORE_WORKER[])
end
The pattern for other global refs, like the function manager, is to set this during initialization and use the ref value directly. Is there a good reason not to use that pattern in this case?
Ah, I see, it's because we need to wrap it in a `CxxRef`?
Originally the reason for this was to handle the `start_worker` case. When `start_worker` is called, we make the blocking call `initialize_worker`, which is the point at which `CoreWorker` is instantiated. Doing it this way allows this case to work. It also ended up being useful when I changed the code to return a `CxxRef`.
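The lazy lookup described here (fetch on first use instead of during setup, because the worker only exists after a blocking initialization call) might look roughly like this in standalone C++. Everything in the sketch is a hypothetical stand-in: `FakeCoreWorker`, `process_worker`, and this `initialize_worker` only borrow the names from the thread and are not Ray's API.

```cpp
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for ray::core::CoreWorker; not the real class.
struct FakeCoreWorker {
    int job_id;
};

// Process-wide slot that stays empty until initialization has run.
inline std::optional<FakeCoreWorker>& process_worker() {
    static std::optional<FakeCoreWorker> worker;
    return worker;
}

// Stand-in for the initialization step that instantiates the worker.
inline void initialize_worker(int job_id) {
    process_worker() = FakeCoreWorker{job_id};
}

// Lazy accessor: resolves the worker on first successful call and caches
// a pointer to it, so callers never need to know when initialization ran.
inline FakeCoreWorker& get_core_worker() {
    static FakeCoreWorker* cached = nullptr;
    if (cached == nullptr) {
        auto& slot = process_worker();
        if (!slot) {
            throw std::logic_error("core worker has not been initialized yet");
        }
        cached = &*slot;
    }
    return *cached;
}
```

Calling `get_core_worker()` before `initialize_worker` throws instead of caching anything, so a later call made after initialization still succeeds. An eager "set during initialization" pattern would avoid the branch but requires every entry point to run the setup itself.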
deps/wrapper.cc
Outdated
    .method("GetCurrentTaskId", &ray::core::CoreWorker::GetCurrentTaskId)
    .method("put", &put)
    .method("get", &get)
    .method("submit_task", &submit_task)
It's not clear to me why we are exposing these as methods of this class when they're not defined that way.
I added these as methods of this class only to avoid headaches with moving these definitions around. The way we've set these up in Julia, these methods are associated with this type, so doing this just indicates that relationship. Definitely optional.
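As a rough illustration of why a free function that takes the worker as its first argument can sit next to true member functions, the standalone C++17 sketch below calls both through `std::invoke`. It does not use the CxxWrap registration API; `FakeCoreWorker`, this `put`, and `call_as_method` are hypothetical stand-ins.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for ray::core::CoreWorker; not the real class.
struct FakeCoreWorker {
    int task_id = 42;
    int GetCurrentTaskId() const { return task_id; }
};

// Free function whose first parameter is the worker, like a method's receiver.
inline std::string put(FakeCoreWorker& worker, const std::string& data) {
    return "stored " + data + " for task " + std::to_string(worker.task_id);
}

// std::invoke accepts member function pointers and ordinary callables alike,
// passing the worker as the receiver or as the first argument respectively.
template <typename F, typename... Args>
decltype(auto) call_as_method(F&& f, FakeCoreWorker& worker, Args&&... args) {
    return std::invoke(std::forward<F>(f), worker, std::forward<Args>(args)...);
}
```

With a `FakeCoreWorker w`, both `call_as_method(&FakeCoreWorker::GetCurrentTaskId, w)` and `call_as_method(&put, w, std::string("x"))` compile and run through the same call site, which is the sense in which grouping them under one type only documents the relationship.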
Continuing on this thread the C++ code typically just calls After implementing this PR I think what we should do is only pass in a
Last thing to do here is to move the caching
RTM again
Still a bit nervous about handling this on the Julia side, but at least in the JLL it's confined to a safe space. And as discussed in huddle, if we're not handling it correctly, we don't really understand what we're doing here (in a general sense), so it's probably better to surface that.
Fix #61