Download BazelRegistryJson only once per registry #19292
By caching `BazelRegistryJson` in `IndexRegistry` and caching `IndexRegistry` instances per registry URL, `bazel_registry.json` is only downloaded once per registry instead of once for each module in the final dependency graph in `computeFinalDepGraph`. On my local machine, this shaves 4s off of the time spent on module resolution for Bazel itself.
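A minimal sketch of the two caching layers described above, not the actual Bazel code. `SketchRegistry`, `SketchRegistryFactory`, `registryFor`, and the injected `downloader` function are hypothetical names chosen for illustration, and the registry JSON is modeled as a plain `String`.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a registry that remembers its bazel_registry.json. */
final class SketchRegistry {
  private final String url;
  private final Function<String, Optional<String>> downloader;
  // null means "not fetched yet"; an empty Optional means "fetched, file absent".
  private Optional<String> registryJson;

  SketchRegistry(String url, Function<String, Optional<String>> downloader) {
    this.url = url;
    this.downloader = downloader;
  }

  /** Downloads the file on the first call only; later calls reuse the cached value. */
  synchronized Optional<String> getRegistryJson() {
    if (registryJson == null) {
      registryJson = downloader.apply(url + "/bazel_registry.json");
    }
    return registryJson;
  }
}

/** Hypothetical factory that hands out one registry instance per URL. */
final class SketchRegistryFactory {
  private final Map<String, SketchRegistry> registries = new ConcurrentHashMap<>();
  private final Function<String, Optional<String>> downloader;

  SketchRegistryFactory(Function<String, Optional<String>> downloader) {
    this.downloader = downloader;
  }

  SketchRegistry registryFor(String url) {
    // computeIfAbsent creates at most one instance per distinct URL.
    return registries.computeIfAbsent(url, u -> new SketchRegistry(u, downloader));
  }
}
```

With both layers in place, every module that resolves against the same registry URL shares one instance and therefore one download.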
d4346e6 to 8fe04f3
private Optional<BazelRegistryJson> getBazelRegistryJson(ExtendedEventHandler eventHandler)
    throws IOException, InterruptedException {
  if (bazelRegistryJson == null) {
    synchronized (this) {
I don't think the synchronization is necessary here? At worst we'll just fetch it again.
I'm usually wary of `synchronized (this)` because if someone else does `synchronized (yourObject)`, you're suddenly potentially deadlocked.
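One common way to address the concern about `synchronized (this)` is to lock on a private object that outside code cannot reach. The following is a sketch only, assuming a hypothetical `LazyRegistryJson` holder with an injected `fetcher`; it is not the code in this PR. It uses double-checked locking with a `volatile` field so that concurrent callers trigger a single fetch.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical lazy holder; the String payload stands in for BazelRegistryJson. */
final class LazyRegistryJson {
  // Private lock: no outside code can synchronize on it, unlike `this`.
  private final Object lock = new Object();
  private final Supplier<Optional<String>> fetcher;
  // volatile so a value published by one thread is visible to all others.
  // null means "not fetched yet".
  private volatile Optional<String> value;

  LazyRegistryJson(Supplier<Optional<String>> fetcher) {
    this.fetcher = fetcher;
  }

  Optional<String> get() {
    Optional<String> result = value;
    if (result == null) {
      synchronized (lock) {
        // Re-check under the lock: another thread may have fetched meanwhile.
        result = value;
        if (result == null) {
          result = fetcher.get();
          value = result;
        }
      }
    }
    return result;
  }
}
```

The trade-off raised in the thread still applies: dropping the lock is simpler but allows duplicate fetches when several threads arrive at once.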
That's true, although every additional fetch makes the overall fetch slower. This currently doesn't matter since all fetches are sequential (see the profile screenshot in the other PR), but it might become relevant when we change that. Not sure.
On second thought, after looking at the profile graph you posted in #19291 (comment), it looks like we might be trying to fetch the registry JSON file from many threads at around the same time. So synchronizing might be faster.
Ah, I think I read that graph wrong. The MODULE.bazel files are fetched mostly concurrently, right? It's just all the "repo spec" fetches that are sequential. But why are there two "download file:" blocks in each MODULE.bazel fetch?
That's for reading yanked info.
Ah, the second download is for the metadata.json, to fetch the yanked versions. There's a bit of room for improvement there (though not much), since we'd be fetching the same metadata.json file for all versions of the same module.
The much bigger thing is the repo spec fetching, which used to be lazy until we introduced the lockfile. We should definitely parallelize those. Exactly how, I'm not sure yet (we could use the Skyframe threads, or maybe just create a separate thread pool).
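As a rough illustration of the "separate thread pool" option mentioned above (the comment leaves the actual approach open), repo spec fetches could be submitted to a fixed-size pool and collected in input order. `ParallelRepoSpecFetcher`, `fetchAll`, and the `fetchOne` function are hypothetical, and repo specs are modeled as strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical helper that runs one fetch per module key on a dedicated pool. */
final class ParallelRepoSpecFetcher {
  static List<String> fetchAll(
      List<String> moduleKeys, Function<String, String> fetchOne, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // Start every fetch before waiting on any of them.
      List<CompletableFuture<String>> futures = new ArrayList<>();
      for (String key : moduleKeys) {
        futures.add(CompletableFuture.supplyAsync(() -> fetchOne.apply(key), pool));
      }
      // join() blocks until each result is ready; results keep the input order.
      List<String> results = new ArrayList<>();
      for (CompletableFuture<String> future : futures) {
        results.add(future.join());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
```

A real implementation would also need to handle interruption and error reporting, which this sketch leaves out.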
Wow, thanks for catching this!
@bazel-io flag
@bazel-io fork 6.4.0
By caching `BazelRegistryJson` in `IndexRegistry` and caching `IndexRegistry` instances per registry URL, `bazel_registry.json` is only downloaded once per registry instead of once for each module in the final dependency graph in `computeFinalDepGraph`. On my local machine, this shaves 4s off of the time spent on module resolution for Bazel itself. Closes bazelbuild#19292. PiperOrigin-RevId: 558940780 Change-Id: I89b03a4c246b10f39b89a79852c922a6504f00bf
By caching `BazelRegistryJson` in `IndexRegistry` and caching `IndexRegistry` instances per registry URL, `bazel_registry.json` is only downloaded once per registry instead of once for each module in the final dependency graph in `computeFinalDepGraph`. On my local machine, this shaves 4s off of the time spent on module resolution for Bazel itself. Closes #19292. Commit 8337dd7 PiperOrigin-RevId: 558940780 Change-Id: I89b03a4c246b10f39b89a79852c922a6504f00bf Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
The changes in this PR have been included in Bazel 6.4.0 RC1. Please test out the release candidate and report any issues as soon as possible. If you're using Bazelisk, you can point to the latest RC by setting `USE_BAZEL_VERSION=last_rc`.