Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: add TSAN to the sanitizers pipelines #42444

Merged
merged 6 commits into from
Oct 4, 2021
Merged

CI: add TSAN to the sanitizers pipelines #42444

merged 6 commits into from
Oct 4, 2021

Conversation

tkf
Copy link
Member

@tkf tkf commented Oct 1, 2021

This is built on top of #42440 and run make -C src with TSAN on CI. It is basically the same as #41530 (plus #41675) but for TSAN instead of ASAN.

Making a draft PR against #42440 so that it'd switch the base to master once #42440 is merged.

@tkf tkf added the ci Continuous integration label Oct 1, 2021
@DilumAluthge
Copy link
Member

I'd like to "promote" asan from the experimental pipeline to the regular pipeline.

Should tsan be experimental, or can it go in the regular pipeline?

@Keno
Copy link
Member

Keno commented Oct 1, 2021

LGTM

Base automatically changed from kf/tsanstate to master October 1, 2021 04:44
@tkf tkf force-pushed the tkf/tsan-ci branch 2 times, most recently from be40313 to dcd5024 Compare October 1, 2021 04:51
@tkf tkf marked this pull request as ready for review October 1, 2021 04:52
@tkf tkf requested a review from a team as a code owner October 1, 2021 04:52
@tkf
Copy link
Member Author

tkf commented Oct 1, 2021

I don't have a good sense of how stable TSAN is, since I've never actually made it work locally. That's why I was very eager to add CI, to keep it as close as working possible 😛

@Keno @vtjnash @vchuravy I guess you guys know more about if the experimental pipeline is more appropriate for this or not?

@Keno
Copy link
Member

Keno commented Oct 1, 2021

I don't think we've ever fully been able to run tsan. I think experimental is good for now. Alternatively, we could split it and just have the part that builds the C runtime with tsan in the regular pipeline to ensure that that doesn't regress. Anything beyond that (bootstrapping, maybe running the threads test) can go into the experimental pipeline.

override CC=$(BINDIR)/clang
override CXX=$(TOOLDIR)/clang++

USE_BINARYBUILDER_LLVM=1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work. You need to build LLVM with tsan instrumentation since it uses locks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we still use a normal-build LLVM with deadlock:llvm in the TSAN_OPTIONS=suppressions=... configuration? Or there's no way to work around this fundamentally?

If we need LLVM with TSAN, I guess we'd need to start adding it in BB before trying this in CI. But it'd be a much bigger project than this quick PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I don't want to build LLVM from source in the Base Julia CI. We'd need to have an "LLVM with TSAN" JLL available from Ygg.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we still use a normal-build LLVM with deadlock:llvm in the TSAN_OPTIONS=suppressions=... configuration? Or there's no way to work around this fundamentally?

It might be possible, but I'd probably recommend against it. I don't know if that can introduce false positives.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so maybe it makes sense to close this PR until a JLL is ready.

Or, maybe we can still just run the build process (-C src), so that some rudimentary bugs can be caught?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a LLVM_tsan_jll makes the most sense. Then it only needs to be built very occasionally in Ygg, instead of building LLVM from source on every PR that is opened to JuliaLang/julia.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really want to support "yet" another version of LLVM in Yggdrasil and slow the update cycle down even more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. That's a fair point. Let's see if @staticfloat has any ideas.

At least we can keep the first part, which doesn't require building LLVM from source.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the source of slowness in yet another LLVM JLL vs ccache? Aren't they equivalent in terms of machine time? Is it that, with the JLL approach, there need to be humans involved in the process waiting for the JLL and updating the URL/version/checksum? If that's the case, it can also potentially be solved by more automation?

I kinda liked the JLL approach since it'd make it easier for more casual users to try sanitizers. OTOH, I guess the ccache approach would be more hackable for core devs since it'd give us a "healthy" CI'ed build script that is easy to flip some LLVM build options. Maybe the upside of the ccache approach wins, since there's no single set of build options that covers all the needs when you debugging things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it that, with the JLL approach, there need to be humans involved in the process waiting for the JLL and updating the URL/version/checksum? If that's the case, it can also potentially be solved by more automation?

Yes, the maintenance of the LLVM JLL is a lot of human time, some of it waiting, some of it making sure that all the version number bits line up. Sheparding things through to the registry.

I made attempts at trying to automate it, but currently it is pareto optimal in that I find is really annoying, but not annoying enough to descend to arcane wizardy. c.f JuliaPackaging/Yggdrasil#726 (comment)

@tkf tkf marked this pull request as draft October 1, 2021 06:13
@DilumAluthge DilumAluthge requested review from staticfloat and removed request for a team October 2, 2021 21:45
@DilumAluthge
Copy link
Member

DilumAluthge commented Oct 3, 2021

For now, let's just do the first part, which requires no LLVM build. This will need a rebase on the latest master in order to fix the merge conflicts.

@tkf tkf marked this pull request as ready for review October 3, 2021 16:59
@tkf tkf requested a review from DilumAluthge October 3, 2021 21:05
@DilumAluthge
Copy link
Member

We should remember to squash-merge when merging this.

@DilumAluthge
Copy link
Member

Assuming that the tsan job passes on Buildkite, this looks good to me to merge.

@DilumAluthge
Copy link
Member

@tkf What Julia versions will tsan work on? Should we do the same thing we did for asan, and only run it on 1.8 and later? See the asan configuration for details, but basically you add a conditional of the form:

    if: | # We only run the `asan` job on Julia 1.8 and later.
      (pipeline.slug != "julia-release-1-dot-6") && (pipeline.slug != "julia-release-1-dot-7")

@tkf
Copy link
Member Author

tkf commented Oct 3, 2021

Yeah, it's #42440 (so 1.8) and later.

(We might be able to find some ranges of commits that TSAN could be built but I think it'd require some serious archaeologies.)

@tkf
Copy link
Member Author

tkf commented Oct 4, 2021

I'm getting this error https://buildkite.com/julialang/julia-master/builds/4307#a956be25-a524-450d-b6f2-3c71b7cfb920/167-238

make[1]: Entering directory '/cache/build/amdci7-1/julialang/julia-master/src/support'
# potential confusion with debugging tools, when inspecting a process that has both `libjulia` and `libjulia-internal`
make -C /cache/build/amdci7-1/julialang/julia-master/src/support debug BUILDDIR='/cache/build/amdci7-1/julialang/julia-master/tmp/test-tsan/tsan/src/support'
make: *** No rule to make target '/cache/build/amdci7-1/julialang/julia-master/tmp/test-tsan/tsan/usr/lib/libuv.a', needed by '/cache/build/amdci7-1/julialang/julia-master/tmp/test-tsan/tsan/usr/lib/libjulia-internal-debug.so.1.8'.  Stop.

The error No rule to make target '/.../usr/lib/libuv.a, is a bit puzzling. I wounder if it's due to that I'm using make -C src on an out-of-source build?

@DilumAluthge
Copy link
Member

Looks like you fixed it?

@tkf
Copy link
Member Author

tkf commented Oct 4, 2021

Yeah, it looks like julia-src-debug on the build root did the trick. (I'm guessing we need to pass around some states from the root Makefile to src/Makefile.)

@DilumAluthge
Copy link
Member

(I'm guessing we need to pass around some states from the root Makefile to src/Makefile.)

Do you want to make those fixes in this PR, or a separate PR?

@tkf
Copy link
Member Author

tkf commented Oct 4, 2021

Oh, I was guessing the reason why make julia-src-debug worked in the out-of-tree build while make -C src debug broke something. AFAICT julia-src-debug just wraps make -C src debug

julia/Makefile

Lines 75 to 76 in 5a1b8ce

julia-src-release julia-src-debug : julia-src-% : julia-deps julia_flisp.boot.inc.phony julia-cli-%
@$(MAKE) $(QUIET_MAKE) -C $(BUILDROOT)/src $*

I think this PR is good to go, unless you want to double-check julia-src-debug is the correct make target.

@DilumAluthge
Copy link
Member

unless you want to double-check julia-src-debug is the correct make target

@Keno or @vchuravy Can you confirm that this is the correct target to be building?

Once they've confirmed that, let's merge this.

@vchuravy
Copy link
Member

vchuravy commented Oct 4, 2021

Since we are setting override JULIA_BUILD_MODE=debug in Make.user yes that should be right.

@DilumAluthge DilumAluthge merged commit f985b47 into master Oct 4, 2021
@DilumAluthge DilumAluthge deleted the tkf/tsan-ci branch October 4, 2021 19:07
@KristofferC KristofferC added the backport 1.6 Change should be backported to release-1.6 label Oct 29, 2021
KristofferC pushed a commit that referenced this pull request Oct 29, 2021
KristofferC pushed a commit that referenced this pull request Oct 31, 2021
KristofferC pushed a commit that referenced this pull request Nov 11, 2021
KristofferC pushed a commit that referenced this pull request Nov 12, 2021
@KristofferC KristofferC removed backport 1.6 Change should be backported to release-1.6 backport 1.7 labels Nov 13, 2021
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
staticfloat pushed a commit that referenced this pull request Dec 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Continuous integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants