Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added wildcard feature to .bazelignore #13756

Closed
wants to merge 2 commits into from

Conversation

David-Lam
Copy link

This pr add wildcard features to .bazelignore. It aims to fix Issue:#7093. The supported wildcard syntax is same as those in UnixGlob (?, *, and **).

Changes
I modified IgnoredPackagePrefixesValue so that it will evaluate ignored path prefixes every time new package is look up. It done that by globing all the directories under the package (including the package) and add the directories with path matches an ignored pattern to the set of ignored prefixes. The globing will not continue if the current directory is ignored (early stop). The globing logic is similar to scanDirectoryRecursively in BinTool expect it use an iterative approach instead of recursion.

The changes are implemented in the IgnoredPackagePrefixesValue rather than IgnoredPackagePrefixesFunction because IgnoredPackagePrefixesFunction will not re-evaluate ignored prefixes when new directories are added, which cause them not being ignored even they match a wildcard pattern in .bazelignore.

I also added 2 tests to bazelignore_test.sh. The test_new_directory_is_ignored check will newly added directories be ignored and test_does_not_glob_into_ignored_directory_with_wildcard check can wildcard entry successfully ignore directories.

Impact
The following are metrics by running the command in a large android repository with around 560 Gradle modules and we want to exclude all build folder for our Bazel build. Here is the performance metrics before and after this change.

bazel shutdown
time bazel query //src:main

After changes
Without **/build (no modification in .bazelignore):

bazel query //src:main 0.82s user 4.58s system 33% cpu 16.289 total
bazel query //src:main  0.68s user 1.29s system 21% cpu 9.149 total
bazel query //src:main  0.70s user 1.38s system 21% cpu 9.819 total

With **/build (replace all prefixes ending with /build (around 920 prefixes) to **/build in .bazelignore):

bazel query //src:main  0.74s user 1.28s system 22% cpu 9.121 total
bazel query //src:main  0.74s user 1.30s system 24% cpu 8.220 total
bazel query //src:main  0.73s user 1.26s system 23% cpu 8.457 total

Original

bazel query //src:main  0.66s user 1.18s system 28% cpu 6.390 total
bazel query //src:main  0.68s user 1.20s system 27% cpu 6.844 total
bazel query //src:main  0.66s user 1.18s system 28% cpu 6.583 total

reset changes

implemented lazy wildcard matching

changed to use custom globbing function because UnixGlob Builder take a long time for some reason

clean up code

added logs and comment

fixed comment and log

clean up code

remove Eventhandler because it will throw error

changed Symlink and use concurrent hash set

added early stop
@google-cla
Copy link

google-cla bot commented Jul 27, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@google-cla google-cla bot added the cla: no label Jul 27, 2021
@google-cla
Copy link

google-cla bot commented Jul 27, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@David-Lam
Copy link
Author

@googlebot I signed it!

Copy link
Contributor

@janakdr janakdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How often does your repo's set of ignored directories change? Would it be very hard to generate the .bazelignore file from a shell and check it in, updating it from time to time?

@@ -313,7 +313,7 @@ private PackageLookupValue getPackageLookupValue(
private static boolean isPackageIgnored(
PackageIdentifier id, IgnoredPackagePrefixesValue ignoredPatternsValue) {
PathFragment packageFragment = id.getPackageFragment();
for (PathFragment pattern : ignoredPatternsValue.getPatterns()) {
for (PathFragment pattern : ignoredPatternsValue.getPatterns(packageFragment)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just provide a method, IgnoredPackagePrefixesValue#isPackageIgnored(PathFragment packageFragment), that can answer this question? That seems much more straightforward than recursively reading directories. You would just need to check the path fragment against starting with any of the ignored paths and matching any of the globs (with an implicit "/*" at the end of each glob). That seems significantly cheaper and also simpler.

That should also remove any need for caching, so IgnoredPackagePrefixesValue will stay immutable. Without that, I don't think this PR can advance: mutable SkyValues are very undesirable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to review!
From my understanding, Bazel pass the set of ignored directories around different components in some case. In this case they consume it immediately. Just checking the path fragment against starting with any of the ignored paths and matching any of the globs only work in this application but not all. I place the logic here because PackageLookupFunction seems to be the first place to check ignored directories from .bazelignore.

For example, we have a directory structure like the following:

/package
├── dir1
└── dir2
    ├── subDir1
    └── subDir2

and our .bazelignore is:

**/subDir*

Then only checking is the pathFragment (in this case /root) ignored will be not ignore subDir1 and subDir2, we have to recursively read into the directories to check subDir1 and subDir2 against the glob.

I also place the logic in SkyValue instead of SkyFunction because IgnoredPackagePrefixesFunction will not re-evaluate ignored prefixes when new directories are added, which cause them not being ignored even they match a wildcard pattern in .bazelignore.

For example:

in .bazelignore:
a/c
a/d
---------------------
WORKSPACE
a/b/BUILD
a/c/BUILD

bzl query //... # should only have targets in b/BUILD

add a/d/BUILD
bzl query //... 

If we place the logic in IgnoredPackagePrefixesFunction, a/d/BUILD will not be ignored because IgnoredPackagePrefixesFunction won't recompute.

Do I answer the question or understand it wrong? Thanks again for reviewing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that my suggestion would address the code you modified. I agree that there are other parts of the code that would also need to be changed, but I actually think that you could use my suggestion for all of them, suitably modified.

In the case you give, my suggestion will work fine: only when subDir1 and subDir2 are visited will they be excluded. I don't see what the problem is with recursively reading into the directories to check subDir1 and subDir2. Your code also reads recursively into those directories, but it does so eagerly, even if subDir1 and subDir2 are never used in the build.

You won't need the mutable SkyValue if you don't do a directory listing, which is my suggestion.

Just to be explicit, the PR in its current form is not viable. The uncached system calls made on what should be a quick lookup, coupled with the mutable SkyValue, make this a no-go. Please try implementing my suggestion and creating a new PR. If it doesn't meet your needs, add a test case demonstrating the failure and we can discuss it there.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @janakdr! I tried implement it in the way you suggested (#13807) but the wildcard test cases will failed. Could you please take a look and make sure I am on the right track? Thanks!

}
error = UnixGlob.checkPatternForError(line);
if (error != null) {
eventHandler.handle(Event.warn((error + " (in .bazelignore glob pattern '" + line + "')")));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it really ok to have errors here that don't fail the build?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it to throw exception or INFO if thats better. Let me know what you think, thanks!

@nkoroste
Copy link
Contributor

@janakdr any updates on this? we would love to use this feature as it significantly improves the IDE experience in a repo that has both Gradle and Bazel since it allows you to ignore all auto generated Gradle build/ dirs

@janakdr
Copy link
Contributor

janakdr commented Sep 27, 2021

@nkoroste I commented on #13807 a while ago and haven't heard back. My impression was that we were pursuing that direction rather than this one.

@nkoroste
Copy link
Contributor

Thanks for the update, I wasn't aware of the other PR #13807 I will subscribe to it instead

@sgowroji
Copy link
Member

Hello @David-Lam, I was going through the recent comments from here. Can you please share the latest update on this PR. Thanks!

@sgowroji sgowroji added the awaiting-user-response Awaiting a response from the author label Apr 22, 2022
@David-Lam
Copy link
Author

Hello @David-Lam, I was going through the recent comments from here. Can you please share the latest update on this PR. Thanks!

Yes, I am currently trying to change all reference to the set of ignored prefixes the to a custom class. My progress is very slow since the ignored prefixes is passed to many places in the project. Feel free to work on the issue if you have new idea!

@nate-thirdwave
Copy link

Any luck on this?

@sgowroji sgowroji added the team-Core Skyframe, bazel query, BEP, options parsing, bazelrc label Oct 19, 2022
@nkoroste
Copy link
Contributor

nkoroste commented Nov 8, 2022

fyi - I think the newer version of this PR is here #13807

@keertk
Copy link
Member

keertk commented Dec 7, 2022

Hi there! Thank you for contributing to the Bazel repository. We appreciate your time and effort. We're doing a clean up of old PRs and will be closing this one since it seems to have stalled. Please feel free to reopen if you’re still interested in pursuing this or if you'd like to discuss anything further. We’ll respond as soon as we have the bandwidth/resources to do so.

@keertk keertk closed this Dec 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-user-response Awaiting a response from the author cla: yes team-Core Skyframe, bazel query, BEP, options parsing, bazelrc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants