-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI: add TSAN to the sanitizers pipelines #42444
Conversation
I'd like to "promote" asan from the experimental pipeline to the regular pipeline. Should tsan be experimental, or can it go in the regular pipeline? |
LGTM |
be40313
to
dcd5024
Compare
I don't have a good sense of how stable TSAN is, since I've never actually made it work locally. That's why I was very eager to add CI, to keep it as close as working possible 😛 @Keno @vtjnash @vchuravy I guess you guys know more about if the experimental pipeline is more appropriate for this or not? |
I don't think we've ever fully been able to run tsan. I think experimental is good for now. Alternatively, we could split it and just have the part that builds the C runtime with tsan in the regular pipeline to ensure that that doesn't regress. Anything beyond that (bootstrapping, maybe running the threads test) can go into the experimental pipeline. |
override CC=$(BINDIR)/clang | ||
override CXX=$(TOOLDIR)/clang++ | ||
|
||
USE_BINARYBUILDER_LLVM=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work. You need to build LLVM with tsan instrumentation since it uses locks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we still use a normal-build LLVM with deadlock:llvm
in the TSAN_OPTIONS=suppressions=...
configuration? Or there's no way to work around this fundamentally?
If we need LLVM with TSAN, I guess we'd need to start adding it in BB before trying this in CI. But it'd be a much bigger project than this quick PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I don't want to build LLVM from source in the Base Julia CI. We'd need to have an "LLVM with TSAN" JLL available from Ygg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we still use a normal-build LLVM with
deadlock:llvm
in theTSAN_OPTIONS=suppressions=...
configuration? Or there's no way to work around this fundamentally?
It might be possible, but I'd probably recommend against it. I don't know if that can introduce false positives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so maybe it makes sense to close this PR until a JLL is ready.
Or, maybe we can still just run the build process (-C src
), so that some rudimentary bugs can be caught?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a LLVM_tsan_jll
makes the most sense. Then it only needs to be built very occasionally in Ygg, instead of building LLVM from source on every PR that is opened to JuliaLang/julia.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really want to support "yet" another version of LLVM in Yggdrasil and slow the update cycle down even more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm. That's a fair point. Let's see if @staticfloat has any ideas.
At least we can keep the first part, which doesn't require building LLVM from source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the source of slowness in yet another LLVM JLL vs ccache? Aren't they equivalent in terms of machine time? Is it that, with the JLL approach, there need to be humans involved in the process waiting for the JLL and updating the URL/version/checksum? If that's the case, it can also potentially be solved by more automation?
I kinda liked the JLL approach since it'd make it easier for more casual users to try sanitizers. OTOH, I guess the ccache approach would be more hackable for core devs since it'd give us a "healthy" CI'ed build script that is easy to flip some LLVM build options. Maybe the upside of the ccache approach wins, since there's no single set of build options that covers all the needs when you debugging things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it that, with the JLL approach, there need to be humans involved in the process waiting for the JLL and updating the URL/version/checksum? If that's the case, it can also potentially be solved by more automation?
Yes, the maintenance of the LLVM JLL is a lot of human time, some of it waiting, some of it making sure that all the version number bits line up. Sheparding things through to the registry.
I made attempts at trying to automate it, but currently it is pareto optimal in that I find is really annoying, but not annoying enough to descend to arcane wizardy. c.f JuliaPackaging/Yggdrasil#726 (comment)
For now, let's just do the first part, which requires no LLVM build. This will need a rebase on the latest master in order to fix the merge conflicts. |
We should remember to squash-merge when merging this. |
Assuming that the |
@tkf What Julia versions will tsan work on? Should we do the same thing we did for asan, and only run it on 1.8 and later? See the asan configuration for details, but basically you add a conditional of the form: if: | # We only run the `asan` job on Julia 1.8 and later.
(pipeline.slug != "julia-release-1-dot-6") && (pipeline.slug != "julia-release-1-dot-7") |
Yeah, it's #42440 (so 1.8) and later. (We might be able to find some ranges of commits that TSAN could be built but I think it'd require some serious archaeologies.) |
I'm getting this error https://buildkite.com/julialang/julia-master/builds/4307#a956be25-a524-450d-b6f2-3c71b7cfb920/167-238
The error |
Looks like you fixed it? |
Yeah, it looks like |
Do you want to make those fixes in this PR, or a separate PR? |
Oh, I was guessing the reason why Lines 75 to 76 in 5a1b8ce
I think this PR is good to go, unless you want to double-check |
Since we are setting |
(cherry picked from commit f985b47)
(cherry picked from commit f985b47)
(cherry picked from commit f985b47)
(cherry picked from commit f985b47)
(cherry picked from commit f985b47)
This is built on top of #42440 and run
make -C src
with TSAN on CI. It is basically the same as #41530 (plus #41675) but for TSAN instead of ASAN.Making a draft PR against #42440 so that it'd switch the base to
master
once #42440 is merged.