Perform synchronization on a worker thread #2025
# any user will just submit work that makes it block

# we don't know what the size of uv_thread_t is, so reserve enough space
tid = Ref{NTuple{32, UInt8}}(ntuple(i -> 0, 32))
We should export an accessor from Julia to get this sizeof
Yeah... 32 bytes ought to be enough for anybody? 😅
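For readers unfamiliar with the pattern in the diff, here is a minimal sketch of reserving opaque storage for a C handle whose size Julia cannot query. The names `HANDLE_BYTES` and `assumed_handle_size` are illustrative and not part of the PR, and the pointer-size estimate is an assumption (on common platforms `uv_thread_t` is a `pthread_t` or a Windows `HANDLE`), not a value reported by libuv.

```julia
# Opaque, zero-initialised storage for a C handle whose layout is not visible from Julia.
const HANDLE_BYTES = 32   # same reservation as in the diff above
tid = Ref{NTuple{HANDLE_BYTES, UInt8}}(ntuple(_ -> 0x00, HANDLE_BYTES))

# Assumption (not queried from libuv): the handle is at most pointer-sized.
assumed_handle_size = sizeof(Ptr{Cvoid})

# Guard against the reservation being too small under that assumption.
@assert sizeof(tid[]) >= assumed_handle_size
println("reserved ", sizeof(tid[]), " bytes; assumed requirement ", assumed_handle_size)
```

An exported `sizeof` accessor, as suggested above, would replace the assumed value with the real one.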
Codecov Report
Patch coverage:
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2025      +/-   ##
==========================================
+ Coverage   62.31%   62.55%   +0.24%
==========================================
  Files         151      152       +1
  Lines       12842    12920      +78
==========================================
+ Hits         8002     8082      +80
+ Misses       4840     4838       -2
Thanks for working on this! I had @aaustin141 rerun the benchmark we were using at JuliaCon. Here is
This PR is a significant improvement: the amount of dead time spent in and after profile
But it's still not as good as switching to blocking synchronization, in
Here are some times required to run 20 V-cycles of @aaustin141's multigrid code
So, we recommend adding the new non-blocking code but would still like a
OK, that's too bad. I added a preference to control the synchronization kind, which feels like a more idiomatic way than an environment variable (despite what I said earlier). Does that work too?
Yes, the preference is fine. Thanks.
As recommended by NVIDIA, instead of polling the context/stream/event, perform the synchronization on a dedicated thread. This is supported on Julia 1.9+, where we have support for foreign threads. It's not particularly fast, about 5 µs per call, but it's significantly better than the previous slow path (which took at least 25 µs, and could sometimes stall for much longer when the event loop was busy).
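The general idea of handing a blocking wait to a dedicated worker, rather than polling from the caller, can be sketched as follows. This is not the PR's implementation: it uses a Julia task and channels where the PR uses a foreign thread, `fake_blocking_sync` is a hypothetical stand-in for a blocking driver call, and `SyncRequest` and `sync_on_worker` are names invented for illustration.

```julia
# Hypothetical stand-in for a blocking driver call (e.g. synchronizing a stream).
fake_blocking_sync(ms::Int) = (sleep(ms / 1000); ms)

# A request carries the work description and a channel used to signal completion.
struct SyncRequest
    ms::Int
    done::Channel{Int}
end

requests = Channel{SyncRequest}(Inf)

# Dedicated worker: performs the blocking wait on behalf of callers
# and runs until the request channel is closed.
worker = Threads.@spawn for req in requests
    put!(req.done, fake_blocking_sync(req.ms))
end

# Caller side: submit the request, then wait on the completion channel.
# No polling loop is involved; the task is woken when the worker is done.
function sync_on_worker(requests, ms)
    req = SyncRequest(ms, Channel{Int}(1))
    put!(requests, req)
    return take!(req.done)
end

result = sync_on_worker(requests, 5)

# Shut the worker down cleanly.
close(requests)
wait(worker)
```

The caller only pays for one hand-off and one wake-up per synchronization, which is the kind of fixed per-call cost the description refers to.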
TODO: try to improve performance of the core mechanism.
cc @vchuravy
Alternative to #2014; @lcw could you test whether this is acceptable? Note that it requires 1.9.2 or 1.10.