Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing asan_blacklist.txt when building via Bazel #3093

Closed
ipetr0v opened this issue Dec 6, 2019 · 40 comments
Closed

Missing asan_blacklist.txt when building via Bazel #3093

ipetr0v opened this issue Dec 6, 2019 · 40 comments

Comments

@ipetr0v
Copy link
Contributor

ipetr0v commented Dec 6, 2019

We have the following error in the fuzz build:
https://oss-fuzz-build-logs.storage.googleapis.com/index.html#oak

Step #4: ERROR: /builder/home/.cache/bazel/_bazel_root/872aac72dffae06b4e2a5b0308508d03/external/com_google_absl/absl/base/BUILD.bazel:171:1: undeclared inclusion(s) in rule '@com_google_absl//absl/base:base':
Step #4: this rule is missing dependency declarations for the following files included by 'external/com_google_absl/absl/base/internal/unscaledcycleclock.cc':
Step #4:   '/usr/local/lib/clang/10.0.0/share/asan_blacklist.txt'

And each time we run fuzzing - the library that causes the problem randomly changes.

I have found a flag, that causes this behavior:

$ bazel build --cxxopt=-fsanitize=address //oak/server:wasm_node
INFO: Analyzed target //oak/server:wasm_node (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /root/.cache/bazel/_bazel_root/872aac72dffae06b4e2a5b0308508d03/external/com_google_absl/absl/numeric/BUILD.bazel:27:1: undeclared inclusion(s) in rule '@com_google_absl//absl/numeric:int128':
this rule is missing dependency declarations for the following files included by 'external/com_google_absl/absl/numeric/int128.cc':
  '/usr/local/lib/clang/10.0.0/share/asan_blacklist.txt'
Target //oak/server:wasm_node failed to build

I get this error even after bazel clean --expunge.

It seems that the address sanitizer implicitly adds a dependency on /usr/local/lib/clang/10.0.0/share/asan_blacklist.txt in every compiled file (even external ones). And Bazel could not compile targets because all dependencies should be explicitly included in BUILD files.

Another project that has the same issue is gRPC:
https://oss-fuzz-build-logs.storage.googleapis.com/index.html#grpc
cc @tiziano88 @yang-g @Dor1s

More information on the issue: project-oak/oak#349

@ipetr0v
Copy link
Contributor Author

ipetr0v commented Dec 6, 2019

Also Envoy is failing with the same error too:

Step #4: ERROR: /builder/home/.cache/bazel/_bazel_root/4e9824db8e7d11820cfa25090ed4ed10/external/boringssl/BUILD:131:1: undeclared inclusion(s) in rule '@boringssl//:crypto':
Step #4: this rule is missing dependency declarations for the following files included by 'external/boringssl/src/crypto/dh/params.c':
Step #4:   '/usr/local/lib/clang/10.0.0/share/asan_blacklist.txt'

https://oss-fuzz-build-logs.storage.googleapis.com/index.html#envoy
cc @antoniovicente @asraa @htuch

@asraa
Copy link
Contributor

asraa commented Dec 6, 2019

Thanks for opening the issue! I also tried downgrading my bazel version and retrying the docker build with no success. I also tried checkout oss-fuzz before clang was pulled from git recently, and that didn't help. Did you try those as well?

@ipetr0v
Copy link
Contributor Author

ipetr0v commented Dec 6, 2019

I tried rebuilding the docker too.
I also tried to include

--cxxopt="-fsanitize-blacklist=oak/server/blacklist.txt" --conlyopt="-fsanitize-blacklist=oak/server/blacklist.txt"

with a local blacklist.txt file.
But this command does not rewrite the default blacklist, it just adds a new one. And also this doesn't solve the fact, that all other project and external files also depend on asan_blacklist.txt.

--strategy=local also doesn't help.

@Dor1s
Copy link
Contributor

Dor1s commented Dec 6, 2019

I've just read through https://github.com/llvm/llvm-project/blob/master/clang/lib/Driver/SanitizerArgs.cpp and I don't see a way not to use the default sanitizer blacklist (which makes sense, sanitizers might become useless without those).

So, looks like we need to find a solution in Bazel. Do you know any other project that uses the same version of Bazel and builds targets with ASan?

@Dor1s
Copy link
Contributor

Dor1s commented Dec 6, 2019

@asraa
Copy link
Contributor

asraa commented Dec 6, 2019

I did some searching through the github for projects using bazel and their OSS-Fuzz build logs failing with similar errors, and caught Envoy (us), Oak (Ivan), gRPC.

Tensorflow also uses bazel but I think has a different failure.

Weirdly enough, I modified our build script to use bazel 1.1.0 (instead of 1.2.0, and this is what we use upstream too) as well and still hit the failure. I'll try again.

@Dor1s
Copy link
Contributor

Dor1s commented Dec 6, 2019

Sanity check: did you re-build the project image after downgrading the bazel version?

@asraa
Copy link
Contributor

asraa commented Dec 6, 2019

Yep, I ran
python infra/helper.py build_image envoy; python infra/helper.py build_fuzzers envoy --clean

with bazel-1.1.0 build

@asraa
Copy link
Contributor

asraa commented Dec 6, 2019

Is everyone having trouble only with external bazel libraries?

I have gotten the missing asan_blacklist from external/boringssl, external/io_opencensus_cpp

@tiziano88
Copy link
Member

I don't think downgrading Bazel would be sufficient; Bazel ends up invoking clang installed on the base image (via che CC and CXX env variables), so we would have to downgrade clang or the image to revert to the old behaviour.

@Dor1s
Copy link
Contributor

Dor1s commented Dec 6, 2019

Oooh, so this started to occur exactly after we've landed #3060. I am sorry for the breakage then, we must be building something differently now. //cc @jonathanmetzman

@Dor1s
Copy link
Contributor

Dor1s commented Dec 6, 2019

I don't think the way how we build LLVM could affect this. @kcc @eugenis do you know if there were any upstream changes w.r.t to default sanitizer blacklists?

@tiziano88
Copy link
Member

@ipetr0v could you rename this issue to something that can be discovered and referenced more easily, e.g. "missing asan_blacklist.txt when building via bazel"?

@ipetr0v ipetr0v changed the title --cxxopt=-fsanitize=address in docker Missing asan_blacklist.txt when building via Bazel Dec 10, 2019
@tiziano88
Copy link
Member

My guess is that #3060 also caused a different version of clang to be built. In particular, I don't see how the particular version of clang gets pinned. It seems to me that whenever the docker image is rebuilt, the latest version will be pulled, which is usually not a good practice, as it will cause breakages when dependencies change.

@jonathanmetzman
Copy link
Contributor

My guess is that #3060 also caused a different version of clang to be built. In particular, I don't see how the particular version of clang gets pinned. It seems to me that whenever the docker image is rebuilt, the latest version will be pulled, which is usually not a good practice, as it will cause breakages when dependencies change.

We don't build the latest clang. We (with rare exceptions) build the version of clang used by Chromium. There is a team in Chromium dedicated towards ensuring that new versions of clang they uptake are stable. This approach balances stability with getting the latest features and bug fixes.

@jonathanmetzman
Copy link
Contributor

I don't think the way how we build LLVM could affect this. @kcc @eugenis do you know if there were any upstream changes w.r.t to default sanitizer blacklists?

Ping @kcc @eugenis

@kcc
Copy link
Contributor

kcc commented Dec 11, 2019

I am not aware of any changes in LLVM upstream.
More likely, something changed on the bazel side...

@tiziano88
Copy link
Member

Is the Bazel version not pinned?

@tiziano88
Copy link
Member

tiziano88 commented Dec 11, 2019

To answer my own question: we actually don't pin the Bazel version, at least in the oak project:

# Install Bazel.
RUN echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | tee /etc/apt/sources.list.d/bazel.list
RUN curl https://bazel.build/bazel-release.pub.gpg | apt-key add -
RUN apt-get update && apt-get install -y bazel

so it's possible in principle that something in Bazel has changed.

@ipetr0v could you see what it takes to pin our Docker image to a specific Bazel version? We could do something similar to asylo:

https://github.com/google/asylo/blob/088ea3490dd4579655bd5b65b0e31fe18de7f6dd/asylo/distrib/toolchain/Dockerfile#L48-L71

edit: actually for now we can probably just specify the version in this way: https://docs.bazel.build/versions/master/install-ubuntu.html#step-2-install-and-update-bazel . It is not as strong as the self-downloaded and hashed deb file, but it's sufficient for us.

@yang-g
Copy link
Contributor

yang-g commented Dec 11, 2019

FWIW the grpc project[1] uses a bazel version via a wrapper in grpc repo, which [2] pins the bazel version to 1.0.0. This pinning happened before we started to see this issue.

[1]

tools/bazel build \

[2]
https://github.com/grpc/grpc/blob/c9378dff8551ba863e607feaa49e03005ce2618b/tools/bazel#L35

@tiziano88
Copy link
Member

Do we agree that #3060 is what caused the issue though? It seems that before #3060 the script was downloading clang via SVN at revision 359254, and after #3060 it downloads it from git at commit e84b7a5fe230e42b8e6fe451369874a773bf1867, though there is some logic around it to control the actual version that gets downloaded, which I don't fully understand. FWIW it still seems to me that the pinned version changed, either in that script, or from the version that Chrome pins.

LLVM_SRC=$SRC/llvm-project
OUR_LLVM_REVISION=e84b7a5fe230e42b8e6fe451369874a773bf1867 # For manual bumping.
FORCE_OUR_REVISION=0 # To allow for manual downgrades.
LLVM_REVISION=$(grep -Po "CLANG_REVISION = '\K[a-f0-9]+(?=')" scripts/update.py)
clone_with_retries https://github.com/llvm/llvm-project.git $LLVM_SRC
set +e
IS_OUR_REVISION_ANCESTOR=$(git -C $LLVM_SRC merge-base --is-ancestor $OUR_LLVM_REVISION $LLVM_REVISION)
set -e
# Use our revision if specified or if our revision is a later revision than
# Chrome's.
if [ ! $IS_OUR_REVISION_ANCESTOR ] || [ $FORCE_OUR_REVISION ] ; then
LLVM_REVISION=$OUR_LLVM_REVISION
fi
git -C $LLVM_SRC checkout $LLVM_REVISION
echo "Using LLVM revision: $LLVM_REVISION"

@htuch
Copy link
Contributor

htuch commented Dec 11, 2019

FWIW, Envoy does pin Bazel version via Bazelisk, see https://github.com/envoyproxy/envoy/blob/master/.bazelversion

@asraa
Copy link
Contributor

asraa commented Dec 13, 2019

I don't think we do in our OSS-Fuzz though. We download the latest bazel right from apt in the Dockerfile, and then build with that.

I tried to revert before #3060 and it still didn't work, but I'll try again with a clean build and double check what version of clang i'm using. I'm not discounting it's something with rules_foreign_cc or something similar.

@jonathanmetzman
Copy link
Contributor

jonathanmetzman commented Dec 13, 2019

I don't think we do in our OSS-Fuzz though. We download the latest bazel right from apt in the Dockerfile, and then build with that.

I tried to revert before #3060 and it still didn't work, but I'll try again with a clean build and double check what version of clang i'm using. I'm not discounting it's something with rules_foreign_cc or something similar.

I don't think this will "just work". I think you would need to revert, and then build base-clang (docker build -t gcr.io/oss-fuzz-base/base-clang infra/base-images/base-clang+ grab a coffee while clang compiles) and the base-builder. I can see if I have time to try this today. What's a command I can do to determine if things work? Just build_project envoy?
(I'm going to try to pull the older docker images instead of building locally)

@asraa
Copy link
Contributor

asraa commented Dec 13, 2019

Yep, it didn't work again. I see. I've been running
python infra/helper.py build_image envoy; python infra/helper.py build_fuzzers envoy --clean
and it produces the error about missing asan_blacklist.txt reliably.

@jonathanmetzman
Copy link
Contributor

jonathanmetzman commented Dec 16, 2019

OK so I was able to confirm rolling back to the old clang allows envoy to build.
I'm not exactly sure how to solve this problem but here is my best guess:
The problem is isn't caused by changing clang's build to using git instead of svn.
Instead the problem was caused by me rolling clang. Clang has had a bunch of changes to its handling of blacklists https://github.com/llvm/llvm-project/commits/master/clang/lib/Driver/SanitizerArgs.cpp I suspect one of these introduced the breakage.

@jonathanmetzman
Copy link
Contributor

jonathanmetzman commented Dec 17, 2019

When I revert the patches in https://github.com/llvm/llvm-project/commits/master/clang/lib/Driver/SanitizerArgs.cpp and rebuild clang I get this error when trying to build envoy instead:

ERROR: /root/.cache/bazel/_bazel_root/4e9824db8e7d11820cfa25090ed4ed10/external/com_lightstep_tracer_cpp/BUILD.bazel:54:1: undeclared inclusion(s) in rule '@com_lightstep_tracer_cpp//:3rd_party_base64':
this rule is missing dependency declarations for the following files included by 'external/com_lightstep_tracer_cpp/3rd_party/base64/src/base64.cpp':
  '/src/llvm-build/include/c++/v1/cstdint'
  '/src/llvm-build/include/c++/v1/__config'
  '/src/llvm-build/include/c++/v1/stdint.h'
  '/src/llvm-build/include/c++/v1/string'
  '/src/llvm-build/include/c++/v1/string_view'
  '/src/llvm-build/include/c++/v1/__string'
  '/src/llvm-build/include/c++/v1/algorithm'
  '/src/llvm-build/include/c++/v1/initializer_list'
  '/src/llvm-build/include/c++/v1/cstddef'
  '/src/llvm-build/include/c++/v1/version'
  '/src/llvm-build/include/c++/v1/__nullptr'
  '/src/llvm-build/include/c++/v1/type_traits'
  '/src/llvm-build/include/c++/v1/cstring'
  '/src/llvm-build/include/c++/v1/string.h'
  '/src/llvm-build/include/c++/v1/stddef.h'
  '/src/llvm-build/include/c++/v1/utility'
  '/src/llvm-build/include/c++/v1/__tuple'
  '/src/llvm-build/include/c++/v1/__debug'
  '/src/llvm-build/include/c++/v1/iosfwd'
  '/src/llvm-build/include/c++/v1/wchar.h'
  '/src/llvm-build/include/c++/v1/stdio.h'
  '/src/llvm-build/include/c++/v1/memory'
  '/src/llvm-build/include/c++/v1/typeinfo'
  '/src/llvm-build/include/c++/v1/exception'
  '/src/llvm-build/include/c++/v1/cstdlib'
  '/src/llvm-build/include/c++/v1/stdlib.h'
  '/src/llvm-build/include/c++/v1/math.h'
  '/src/llvm-build/include/c++/v1/limits'
  '/src/llvm-build/include/c++/v1/__undef_macros'
  '/src/llvm-build/include/c++/v1/new'
  '/src/llvm-build/include/c++/v1/iterator'
  '/src/llvm-build/include/c++/v1/__functional_base'
  '/src/llvm-build/include/c++/v1/tuple'
  '/src/llvm-build/include/c++/v1/stdexcept'
  '/src/llvm-build/include/c++/v1/atomic'
  '/src/llvm-build/include/c++/v1/functional'
  '/src/llvm-build/include/c++/v1/bit'
  '/src/llvm-build/include/c++/v1/cstdio'
  '/src/llvm-build/include/c++/v1/cwchar'
  '/src/llvm-build/include/c++/v1/cwctype'
  '/src/llvm-build/include/c++/v1/cctype'
  '/src/llvm-build/include/c++/v1/ctype.h'
  '/src/llvm-build/include/c++/v1/wctype.h'

Frankly I don't understand enough about bazel to know what's going on here.

@jonathanmetzman
Copy link
Contributor

Okay I was able to get it building if I install clang rather than just using it from somewhere on disk.
So it looks like those patches broke bazel.

@jonathanmetzman
Copy link
Contributor

@kcc WDYT of reverting the patches?

I think we may need someone with knowledge of bazel to help here. @ipetr0v @asraa if you know anyone please let me know.

@asraa
Copy link
Contributor

asraa commented Dec 17, 2019

@dslomov we're having an issue with finding asan_blacklist.txt dependency when building with bazel. Seems like some patches in clang SanitizerArgs.cpp might have caused them. If there's any help from the bazel side, here's our tracking issue.

@Dor1s
Copy link
Contributor

Dor1s commented Dec 17, 2019

llvm/llvm-project@03b84e4#diff-e2fe4c5daa679eec2fe3bad655f14dc0 looks the most suspicious

@jonathanmetzman is it hard for you to revert that single patch and try again?

@jonathanmetzman
Copy link
Contributor

jonathanmetzman commented Dec 17, 2019

Comment from Pete Collingbourne, when i suggested reverting the patch

The change made it so that /usr/local/lib/clang/10.0.0/share/asan_blacklist.txt would appear in the depfile (-MF output) when passing -MD. Presumably bazel implements some sort of check that the files mentioned in the depfile are a subset of those in the rule's dependencies clause. Maybe the solution is to teach bazel that asan_blacklist.txt is a valid dependency of any asan compilation (or maybe all compilations, whichever is easier).

So I think this needs to be bazel folks that fix this. Let's file an issue in their repo if we don't see any movement on this bug.

@asraa
Copy link
Contributor

asraa commented Dec 18, 2019

Thank you so much for fixing the build for now! I'll pass along the info to Dmitry and anyone else I know at bazel, and see if they can help out.

@Dor1s
Copy link
Contributor

Dor1s commented Jan 17, 2020

This issue is now affecting some Chrome OS fuzzer as well: https://bugs.chromium.org/p/chromium/issues/detail?id=1042874

Does anyone want to take a stab at removing that default blacklist in LLVM upstream?

@asraa
Copy link
Contributor

asraa commented Jan 17, 2020

Rather than LLVM upstream, maybe the fix is in bazel? From this PR it looks like cc_configure needs to add the directory with the clang default blacklists to the builtin includes for cc_configure:
https://github.com/bazelbuild/bazel-toolchains/pull/795/files

@Dor1s
Copy link
Contributor

Dor1s commented Jan 18, 2020

If fix in bazel is possible, that it's 100% the best :)

@asraa
Copy link
Contributor

asraa commented Feb 25, 2020

Just to circle back on this, the fix is in bazel, and we tested it against the oss-fuzz envoy build and it succeeds now :)
bazelbuild/bazel@f32b0fd
I'm assuming this will be in some next major release of Bazel (it looks like release 3.0 will be cut 03-02). I can work on getting Envoy to bump bazel release ASAP when it's cut.

@asraa
Copy link
Contributor

asraa commented Feb 25, 2020

And a question -- would it be possible to generate a coverage report (locally, even) while we have this issue? Coverage reports fail because the clang versions differ, and I'm wondering if it's possible to change the sha's of the base-clang and base-runners to run coverage. Thank you!

@jonathanmetzman
Copy link
Contributor

And a question -- would it be possible to generate a coverage report (locally, even) while we have this issue? Coverage reports fail because the clang versions differ, and I'm wondering if it's possible to change the sha's of the base-clang and base-runners to run coverage. Thank you!

Sorry I don't really understand the question.
You can definitely remove that SHA fix I made whenever you want (but I wouldn't do it before the build passes).
If youre saying the clang version is behind the runner and so coverage reports fail, then I imagine picking a SHA from around the same time (that you can find here: https://pantheon.corp.google.com/gcr/images/oss-fuzz-base/GLOBAL/base-runner?gcrImageListsize=30) and using it for the runner (wherever the scripts do so) locally would work.

@asraa
Copy link
Contributor

asraa commented Apr 6, 2020

Cross-FYI bazel 3.0 is released and contains the fix. https://github.com/bazelbuild/bazel/releases/tag/3.0.0
I'm updating upstream project first so I don't revert the clang version workaround and break our build.

nareddyt added a commit to nareddyt/oss-fuzz that referenced this issue Apr 13, 2020
No need for workaround since ESPv2 was updated to Bazel 3.0.0

Ref: google#3093
Signed-off-by: Teju Nareddy <nareddyt@google.com>
inferno-chromium pushed a commit that referenced this issue Apr 13, 2020
No need for workaround since ESPv2 was updated to Bazel 3.0.0

Ref: #3093
Signed-off-by: Teju Nareddy <nareddyt@google.com>
GMNGeoffrey added a commit to iree-org/iree that referenced this issue Jul 14, 2020
Building with RBE will require upgrading to Bazel 3.0+ because of google/oss-fuzz#3093. TF is already on 3.1.0, so we should probably upgrade anyway.

Tested:
`bazel test --config=rs --config=asan //iree/...`
Two vulkan failures:
https://source.cloud.google.com/results/invocations/fa1c4e13-305e-4467-aac5-97773a0ecf57

Reverted the type registration added to fix a leak in dde06c0 and confirmed an asan failure
`bazel test --config=rs --config=asan //iree/vm:list_test`
https://source.cloud.google.com/results/invocations/7dfc3bcc-b837-4139-9649-f4156dd782bf
copybara-service bot pushed a commit to google/tsl that referenced this issue Dec 8, 2022
As discussed here (google/oss-fuzz#3093), clang was updated to include the sanitizer ignorelist within the `.D`. This causes bazel to look for those files within the dependency list, so this directory has to be added to the default include path for bazel to not complain while running sanitizers with clang.

PiperOrigin-RevId: 493939307
copybara-service bot pushed a commit to openxla/xla that referenced this issue Dec 8, 2022
As discussed here (google/oss-fuzz#3093), clang was updated to include the sanitizer ignorelist within the `.D`. This causes bazel to look for those files within the dependency list, so this directory has to be added to the default include path for bazel to not complain while running sanitizers with clang.

PiperOrigin-RevId: 493939307
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this issue Dec 8, 2022
As discussed here (google/oss-fuzz#3093), clang was updated to include the sanitizer ignorelist within the `.D`. This causes bazel to look for those files within the dependency list, so this directory has to be added to the default include path for bazel to not complain while running sanitizers with clang.

PiperOrigin-RevId: 493939307
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants