Distributed caching: build fails when build inputs include directories #1415
Comments
Also cc @philwo |
I think this is "working as intended" - Bazel is not designed to handle directories as input artifacts and much of the code assumes that artifacts are files that can be read, checksummed, transferred, ... I agree it would be cool if we could support directories as first-class objects, though. The general workaround is to use the contents of the directories as individual files - does that work for you? e.g. by replacing this: |
No, I can't do that, because of issue #374, under which there are restrictions on the characters that are allowed in input file names. For example, this fails:
The only workaround is to use http://www.bazel.io/docs/build-ref.html
|
As a workaround, you could replace the glob with something like this:
|
@philwo, any more information we need here? |
@dslomov No, this is a known issue, we know it's important and what needs to be done, but AFAIK unfortunately no one is actively working on this, because everyone is busy with other stuff :( I think we should create a P1 bug / feature request about "relax label syntax validation" (if there isn't one already) and then mark this as a duplicate of it. |
philwo: Do you mean it's a known issue with directories or label syntax? Is there a tracking issue for better directories support? |
@hhclam Sorry, I meant it's a known issue that the label syntax is too strict. In a way, it's also a known issue that our support for directories in general is bad - I wouldn't be surprised if sandboxing breaks, remote caching breaks (because Bazel has no digest for directories), remote execution breaks (because how would you transfer a directory in the protocol?), ... I remember someone on our team (sorry - I initially pinged the wrong Cory here :)) once making a proposal for how we could improve support for directories, but I'm not sure where that went. I personally think that it would be cool if we had better support for directories, but I don't see it happening. It's more probable that we eventually fix the label syntax, so that Bazel can address individual files and the need for using directories goes away. Pinging @ulfjack for a more senior perspective on this whole topic. |
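To make the "no digest for directories" point concrete, here is a minimal Python sketch of one way a directory could be given a deterministic digest: hash every file, then hash the sorted list of (relative path, file digest) pairs. This is an illustration only, not how Bazel or any remote cache computes digests; `file_digest` and `directory_digest` are hypothetical names, SHA-256 is an arbitrary choice, and empty directories and file modes are ignored here.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a single file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def directory_digest(root):
    """Hypothetical helper: digest a directory from its sorted file entries.

    Assumption: a directory is fully described by the relative paths and
    contents of the regular files below it (empty directories are ignored).
    """
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # keep the traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append((rel, file_digest(full)))
    entries.sort()
    h = hashlib.sha256()
    for rel, digest in entries:
        # NUL and newline separate the fields so entries cannot run together.
        h.update(rel.encode("utf-8") + b"\0" + digest.encode("ascii") + b"\n")
    return h.hexdigest()
```

Two directories with the same relative file paths and contents get the same digest under this scheme, and changing any file changes it, which is the property a cache key needs.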
So it sounds like despite the issue with label syntax validation, directories should be better supported with remote caching and execution. This is a valid bug that I should fix. |
@damienmg just changed labels to allow ' ', '(', ')' and '$'. Maybe we can add '&' to that list as well? Other than that, I'm still not sure what to do about directory inputs. RemoteSpawnStrategy certainly does not check whether input files are directories and does not upload directories. This is good for correctness - we can't track directories at that level. If we want to support directories, we'd need to expand them at a higher level conceptually. |
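As a rough illustration of "expanding directories at a higher level", the Python sketch below replaces each directory in an input list with the files beneath it, so that a lower layer only ever sees regular files. It is a sketch under stated assumptions, not RemoteSpawnStrategy's or Bazel's actual logic; `expand_inputs` is a hypothetical name and the inputs are assumed to be plain filesystem paths.

```python
import os


def expand_inputs(inputs):
    """Hypothetical helper: replace each directory in `inputs` with its files.

    Files are passed through unchanged; directories are walked in a
    deterministic (sorted) order so the result is stable across runs.
    """
    expanded = []
    for path in inputs:
        if os.path.isdir(path):
            for dirpath, dirnames, filenames in os.walk(path):
                dirnames.sort()  # visit subdirectories in a fixed order
                for name in sorted(filenames):
                    expanded.append(os.path.join(dirpath, name))
        else:
            expanded.append(path)
    return expanded
```

After expansion, every entry can be read, checksummed and uploaded individually, which matches the assumption earlier in the thread that artifacts are files.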
'&' is not problematic for any case (except for unquoted shell), we can add it right away. |
I'm actually not sure what the current state is wrt. directories, since I made some changes recently. I think we're throwing an error if there's a directory input, or maybe it's a warning? |
In either case, I've prepared a change to allow even more characters in labels, which will hopefully reduce the need to have directory inputs. |
If I add a directory to data of a test, I get a different error:
|
This has been fixed in the meantime. |
@buchgr to clarify - what has been fixed? The label issue or the original issue, i.e. making directory inputs work with distributed builds. I'm having a hard time finding this in the commit history. |
@mboes making directory inputs work with distributed builds. Are you seeing something else?
or filegroups including a directory instead of globbing it. There are early plans of properly supporting this in Bazel by introducing a "dir/" syntax to declare that a genrule output is a directory
|
@buchgr Has there been movement on this plan, or is the best way to go to declare a custom rule that uses declare_directory? (found this due to |
https://docs.aspect.build/rules/aspect_bazel_lib/docs/run_binary/ |
Listing only directories as outputs of the kernel tree rule causes the "dependency checking of directories is unsound" warning and can fail under some scenarios. One example is when the tree is used as an input of a genrule. See bazelbuild/bazel#1415 (comment) as well as https://bazel.build/reference/be/general#filegroup. We can explicitly specify all the files in the kernel tree because it's a repository rule.
Consider this repository: https://github.com/dfabulich/hazelcast-input-dir
In its WORKSPACE, it declares a new_http_archive whose output includes directories:
Its BUILD file includes a simple genrule that depends on those files.
Finally, its tools/bazel.rc uses a local hazelcast node.
When you run
bazel clean && bazel build :x
the build fails:
Whereas if you run
bazel clean && bazel build --genrule_strategy= :x
to disable distributed caching, the build succeeds with a warning.