Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download BazelRegistryJson only once per registry #19292

Closed
wants to merge 1 commit into from

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Aug 21, 2023

By caching BazelRegistryJson in IndexRegistry and caching IndexRegistry instances per registry URL, bazel_registry.json is only downloaded once per registry instead of once for each module in the final dependency graph in computeFinalDepGraph.

On my local machine, this shaves 4s off of the time spent on module resolution for Bazel itself.

@fmeum fmeum marked this pull request as ready for review August 21, 2023 18:48
@github-actions github-actions bot added awaiting-review PR is awaiting review from an assigned reviewer team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. labels Aug 21, 2023
By caching `BazelRegistryJson` in `IndexRegistry` and caching
`IndexRegistry` instances per registry URL, `bazel_registry.json` is
only downloaded once per registry instead of once for each module in the
final dependency graph in `computeFinalDepGraph`.

On my local machine, this shaves 4s off of the time spent on module
resolution for Bazel itself.
@fmeum fmeum force-pushed the bzlmod-cache-registry-info branch from d4346e6 to 8fe04f3 Compare August 21, 2023 19:34
private Optional<BazelRegistryJson> getBazelRegistryJson(ExtendedEventHandler eventHandler)
throws IOException, InterruptedException {
if (bazelRegistryJson == null) {
synchronized (this) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the synchronization is necessary here? at worst we'll just fetch it again.

I'm usually wary of synchronized (this) because if someone else does synchronized (yourObject), you're suddenly potentially deadlocked.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, although every time this is fetched again results in a slower overall fetch. This currently doesn't matter since all get he's are sequential (see the profile screenshot in the other PR), but might become relevant when we change that. Not sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on second thought, after looking at the profile graph you posted in #19291 (comment), it looks like we might be trying to fetch the registry json file with many threads at around the same time. So synchronizing might be faster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I read that graph wrong. The MODULE.bazel files are fetched mostly concurrently, right? It's just all the "repo spec" fetches that are sequential. But why are there two "download file:" blocks in each MODULE.bazel fetch?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's for reading yanked info.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah the second download is for the metadata.json to fetch the yanked version. There's a bit of room for improvement there (though not much) since we'd be fetching the same metadata.json file for all versions of the same module.

The much bigger thing is the repo spec fetching, which used to be lazy until we introduced the lockfile. We should definitely parallelize those. Exactly how, I'm not sure yet (we could use the skyframe threads, or just create a separate thread pool maybe)

@Wyverald
Copy link
Member

wow, thanks for catching this!

private Optional<BazelRegistryJson> getBazelRegistryJson(ExtendedEventHandler eventHandler)
throws IOException, InterruptedException {
if (bazelRegistryJson == null) {
synchronized (this) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, although every time this is fetched again results in a slower overall fetch. This currently doesn't matter since all get he's are sequential (see the profile screenshot in the other PR), but might become relevant when we change that. Not sure.

@Wyverald Wyverald added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Aug 21, 2023
@github-actions github-actions bot removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Aug 22, 2023
@fmeum fmeum deleted the bzlmod-cache-registry-info branch August 22, 2023 05:19
@fmeum
Copy link
Collaborator Author

fmeum commented Aug 22, 2023

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Aug 22, 2023
@iancha1992
Copy link
Member

@bazel-io fork 6.4.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Aug 22, 2023
bazel-io pushed a commit to bazel-io/bazel that referenced this pull request Aug 22, 2023
By caching `BazelRegistryJson` in `IndexRegistry` and caching `IndexRegistry` instances per registry URL, `bazel_registry.json` is only downloaded once per registry instead of once for each module in the final dependency graph in `computeFinalDepGraph`.

On my local machine, this shaves 4s off of the time spent on module resolution for Bazel itself.

Closes bazelbuild#19292.

PiperOrigin-RevId: 558940780
Change-Id: I89b03a4c246b10f39b89a79852c922a6504f00bf
iancha1992 pushed a commit that referenced this pull request Aug 22, 2023
By caching `BazelRegistryJson` in `IndexRegistry` and caching
`IndexRegistry` instances per registry URL, `bazel_registry.json` is
only downloaded once per registry instead of once for each module in the
final dependency graph in `computeFinalDepGraph`.

On my local machine, this shaves 4s off of the time spent on module
resolution for Bazel itself.

Closes #19292.

Commit
8337dd7

PiperOrigin-RevId: 558940780
Change-Id: I89b03a4c246b10f39b89a79852c922a6504f00bf

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
@iancha1992
Copy link
Member

The changes in this PR have been included in Bazel 6.4.0 RC1. Please test out the release candidate and report any issues as soon as possible. If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=last_rc.
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants