-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some DexShardsToMerge
/DexMerger
outputs do not get downloaded when using BwtB
#17603
Comments
This is probably related to #16333. I'm working on a PR to handle partial trees correctly in the prefetcher; expect it to be submitted within a few days. |
@tjgq this seems to fix the issue. I didn't test the head, but your patch on top of |
Thanks for confirming! I think it's too late to get this into 6.1, but I'll cherry-pick it into 6.2. |
@tjgq with the patch we're getting errors on
I modified the binary to print one of the stack traces:
I see there's a TODO on the bazel/src/main/java/com/google/devtools/build/lib/remote/AbstractActionInputPrefetcher.java Lines 531 to 534 in 4e739d3
Is that the issue we're seeing and is there a way to work around it? The _aar/unzipped/ directories are tree artifacts created by |
@lukaciko The TODO only applies if there are nested artifacts (a declare_file or a declare_directory inside another declare_directory). Does your build succeed with the If the build succeeds with |
(To clarify: if there are nested artifacts, that flag will produce an error in the analysis phase, before you get the execution phase error reported above.) |
I'm also wondering why the reported error is |
I apologise, the stack trace was from another error probably caused by me messing with the output base. This should be the correct stack trace:
Building with We see this error rather consistently in CI, but it's harder to reproduce locally. |
Can you figure out what kind of action creates the tree artifact (something like Also, does your CI build incrementally? (i.e., does it start with an empty output tree, or does it reuse the output tree from a previous build)? |
Still seeing the issue. Our CI builds incrementally - we have a output base that's reused from previous builds. We do not use This is the action creating the tree artifact:
|
It is certainly our intention that incremental builds should never crash, regardless of the preexisting state of the output tree, and including when you upgrade Bazel to a new version. My best guess is that the It sounds like this happens pretty consistently - how hard would it be to provide a minimal repro? I've tried to reproduce it with another Android project, but no luck so far. Alternatively, I could try to give you a patch to augment the stack trace with additional information about the state of the filesystem and other concurrent prefetches, so we can better understand the conditions under which this happens. |
Great to hear! Yes, this fix will be in 6.2. FYI, I suspect there's one remaining correctness issue in the presence of "templated tree artifacts" (in an Android context, that only applies to dex merging) + BwtB, as hinted at by this comment. I don't know how easy it is to trigger in practice, but I'm also planning to fix it for 6.2, regardless. Just wanted to mention it in case you come across some other weirdness in the future. |
Description of the bug:
When using Bazel 6.0 or later to build an
android_binary
(with D8, not definingdex_shards
), a remote cache and setting--remote_download_toplevel
the builds usually fail on theMergeDexZips
action with errors like:Looking at output files and timing profiles this seems to be because not all files of the
dexfiles
tree artifact are created, either by being downloaded from the remote cache by running the action locally.This tree artifact is created by the
DexShardsToMerge
, which usesSpawnActionTemplate
to createDexMerger
actions. We can see this usingaquery
:It looks like this might happen when some files in
dexfiles
tree artifact get a remote cache hit and others do not. The ones that do not get a hit get created locally, the ones that do get a hit never get downloaded.What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
No response
Which operating system are you running Bazel on?
MacOS 13.2.1 & Ubuntu Focal Fossa
What is the output of
bazel info release
?release 6.0.0-dev
If
bazel info release
returnsdevelopment version
or(@non-git)
, tell us how you built Bazel.This is a our patched version of the 6.0.0 release binary. We apply a few patches to improve performance of Android builds, fix dexing issues, (#16369 (comment)) and some other issues. We build it with:
What's the output of
git remote get-url origin; git rev-parse master; git rev-parse HEAD
?No response
Have you found anything relevant by searching the web?
The last commit on the 6.0.0 release branch fixed a seemingly related issue: 86883db.
There is an ongoing work on BwtB #10880.
I have tried reverting that commit and creating the binary on top of recent master (5900d6d), but ran into the same issue.
Any other information, logs, or outputs that you want to share?
In the timing profile I see that
Remote.download
andRemote.downloadInMemoryOutput
get logged for the files that do not get created. There is a start time for these, but no wall duration.The text was updated successfully, but these errors were encountered: