-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mark all CUDA ccalls as GC safe #2262
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #2262 +/- ##
==========================================
+ Coverage 71.31% 71.33% +0.02%
==========================================
Files 155 155
Lines 14859 14893 +34
==========================================
+ Hits 10596 10624 +28
- Misses 4263 4269 +6 ☔ View full report in Codecov by Sentry. |
Pushed a macro version, and used that to mark all Running the MWE from #2261 repeatedly on 1.8 doesn't seem to crash though. |
9327071
to
7067f9b
Compare
e6ab9d9
to
986f167
Compare
gc_state = @ccall(jl_gc_unsafe_enter()::Int8) | ||
try | ||
$body | ||
finally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was debating if the try/finally was necessary and decided that it wasn't. Since we can only catch exceptions in gc_unsafe. cc: @vtjnash for a double check on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The try
setup seems a bit risky to be present inside jl_gc_safe_enter
, a catch/finally
already implicitly reset the gc state upon leave (normal or error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The try
happens after jl_gc_unsafe_enter
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I don't see how this is reachable, since cfunction already had to do an unsafe-enter to be able to dispatch into the runtime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see this is for v1.8. That seems okay. I am surprised that is still supported here
3f9e13c
to
4779f9c
Compare
Don't do the ccall change for cuQuantum as it doesn't support the latest cuTENSOR, so it would be tricky maintaining both in tree. [skip julia] [skip cuda]
4779f9c
to
4d453ed
Compare
Using https://docs.julialang.org/en/v1/stdlib/Profile/#Triggered-During-Execution I observed an issue in #2261 where
we acquire a lock in
cuOccupancyMaxPotentialBlockSize
and seemingly get live locked. Other threadssuspend in waiting for GC and no forward progress is being made.
Manually marking
cuOccupancyMaxPotentialBlockSize
as safe to execute GC next to seemingly fixes the issue,we could add a convenience macro for doing this.
Update by @maleadt: I made all ccalls GC safe, made sure callbacks are GC unsafe again.
x-ref JuliaLang/julia#49933