-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minimal infrastructure support for ThreadSanitizer #27167
Conversation
One interesting thing is that TSan currently ignores loads and stores to non-zero addressspaces [1]. So to make it useful to Julia code, we will have to fix that.
|
@maleadt We should probably add the function annotation for MSAn/ASAN as well c.f https://reviews.llvm.org/D9631, TSAN didn't sanitize Julia functions due the the abnnotation missing. Current results for Julia usercode: using Base.Threads
function f()
x = Ref[]
@threads for i in 1:10000
x[] += 1
end
return x[]
end
We need to improve the traces so that they are actually useful, and currently I am hitting a weird issue after cleaning up this PR:
|
a393cfe
to
5dc13d9
Compare
Working hypothesis, this is happening after I started sanitizing Julia code (and precompiled a sysimg).
welp, I will have to teach TSAN about Julia then. |
The stack is intentionally corrupt there. Try turning off COPY_STACKS. We still might need to inform TSAN when we allocate a new stack, but at least it’ll never be invalid. |
Without COPY_STACKS I get:
|
@tkf can you try this? This should fix the compilation failures you were seeing. |
@vchuravy Thanks a lot for rebasing this! I tried
With 798bd91 I get
With df54f3b I get
|
Yeah.... I was hoping that the ASAN hooks would work for TSAN as well, but apparently no such joy. |
Yes it looks like we need to use:
|
Ok this now allows me to compile with tsan. There are a bunch of data races reported (which I don't recall from way back when), so we
|
I still get |
The task handling isn't quite right in this PR, but the PR itself is basically fine. Fixing the task handling requires some changes to the task interface that I'd like to do separately, since there's so many different configurations for that, so I'm gonna go ahead and rebase this and get it merged and then put up a separate PR to fix up the task switch annotations. |
@@ -65,6 +65,23 @@ | |||
# define JL_THREAD_LOCAL | |||
#endif | |||
|
|||
// Duplicated from options.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems oddly terrible. Can we delete it from there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put it in dtypes.h? We've seem to have accumulated a bunch of these kinds of things there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, for the moment, there's probably no reason not to just put this here. Does raise the question of why jl_task_t is defined in the public header in the first place. We're never passing it by value, so it should probably move to the internals header, but we can do that separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, let me just do that first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, we need the layout information in the public header because locks depend on it for ABI reasons. We should disentangle all of that at some point, but for now, I'll just leave this here and delete it elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks valid. Should we add a note or new issue about adding tsan_destroy_fiber
eventually?
The fiber handling isn't quite right, there are ordering constraints on the tsan switch vs the setjmp that are not correct in this PR. I have it hacked up locally to work, but that requires some refactoring to do cleanly, which I'd like to do in a separate PR. I'll be adding the fiber destroy at the same time. |
Yeah, there's some somewhat similar stuff I wanted to do in #32283 |
- enables building with TSAN for the runtime library as well as Julia code - updates the handling of `SANITIZE=1` in Make.inc - moves sanitizer to late in the pipeline, copies what Clang does - cleans up `options.h`, and `julia_internal.h` w.r.t sanitizers - update devdocs for sanitizer - adds a patch for TSAN to deal with Julia's usage of address spaces - don't use COPY_STACKS with TSAN - don't use DEEPBIND by default if a sanitizer is enabled
- enables building with TSAN for the runtime library as well as Julia code - updates the handling of `SANITIZE=1` in Make.inc - moves sanitizer to late in the pipeline, copies what Clang does - cleans up `options.h`, and `julia_internal.h` w.r.t sanitizers - update devdocs for sanitizer - adds a patch for TSAN to deal with Julia's usage of address spaces - don't use COPY_STACKS with TSAN - don't use DEEPBIND by default if a sanitizer is enabled
Inspired by the current progress on partr, I started looking into what
it takes to have a race detector like Go. This PR is the minimal change
necessary for people to build with thread-sanitizer enabled.
The big open question is how to move forward with this and how to enable
something as simplisitics as
go -race
for multi-threaded Julia code.Current build requirements:
BUILD_LLVM_CLANG=1
)-fsanitize=thread
and-DJL_TSAN_ENABLED=1
toJCFLAGS
and co.I am open for ideas to enable better integration. I was thinking that our
options are:
julia-san
similar tojulia-debug
that has TSAN enabledtsan
is a cpu-feature and enable it in the sysimage behinda feature flag, e.g.
julia -race
would load the sanitized versionof the sysimage. The main challenge here is that the executing binary
needs to have setup code for TSAN, so we couldn't use that from a
non-sanitized Julia
C/C++ TSAN we could build our own and hook that up correctly