Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace most Bzlmod events with a Skyframe graph lookup #22058

Closed
wants to merge 4 commits into from

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Apr 18, 2024

Retaining Postables across the graph increases the memory usage of every dependent Skyframe node.

ModuleExtensionResolutionEvent is replaced with a lookup of the "done" Skyframe nodes for extension evaluation in BazelLockFileModule#afterCommand.

RootModuleFileFixupEvent is replaced with direct storage in SingleExtensionEvalValue. The validation of repos imported from an extension is moved into a separate SkyFunction. This simplifies bazel mod tidy, which no longer needs to swallow certain exceptions, and allows cache hits when only the (invalid) imports of an extension are modified even without the lockfile.

The BazelModuleResolutionEvent is kept for now, but will become obsolete with the new lockfile format.

Work towards #20369

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 18, 2024

@Wyverald There is one remaining test failure since we can't get the values of the extensions that fail due to invalid imports. I see two options here:

  1. Wrap SingleExtensionEvalValue into some other object that requires the requester to unwrap it, where the unwrapping performs the check for imports and can fail with an exception. Downside is that some SkyFunctions would now have to declare exception types.
  2. Introduce yet another SkyFunction as an intermediate that does the checking in a central place and can fail. Downside is yet another function/layer of indirection.

What do you think? Do you see an alternative?

@Wyverald
Copy link
Member

I see! I quite like option 2, actually. It means that even with --lockfile_mode=off, we might not rerun an extension if the only thing that changed was the imports.

@fmeum fmeum force-pushed the 20369-module-extension-walk branch from 8adb93b to 5eb3504 Compare April 20, 2024 15:27
@fmeum fmeum changed the title Replace ModuleExtensionResolutionEvent with a Skyframe graph lookup Replace most Bzlmod events with a Skyframe graph lookup Apr 20, 2024
@fmeum fmeum marked this pull request as ready for review April 20, 2024 15:29
@fmeum fmeum requested review from a team, Wyverald and meteorcloudy as code owners April 20, 2024 15:29
@fmeum fmeum requested review from aranguyen and removed request for a team and aranguyen April 20, 2024 15:29
@github-actions github-actions bot added team-Configurability platforms, toolchains, cquery, select(), config transitions team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. awaiting-review PR is awaiting review from an assigned reviewer labels Apr 20, 2024
@fmeum fmeum force-pushed the 20369-module-extension-walk branch from 928c399 to 0f1530f Compare April 21, 2024 17:14
@fmeum fmeum removed the team-Configurability platforms, toolchains, cquery, select(), config transitions label Apr 21, 2024
@fmeum fmeum force-pushed the 20369-module-extension-walk branch from 57614ed to 211e10b Compare April 21, 2024 22:22
@fmeum fmeum requested review from justinhorvitz and removed request for meteorcloudy April 22, 2024 12:56
@fmeum
Copy link
Collaborator Author

fmeum commented Apr 22, 2024

I ended up including the limiting of event propagation in this PR, which means that it should be possible to rely on regular Skyframe events for the download events introduced in #21901. I also added a test for this. @Wyverald PTAL.

@justinhorvitz Could you review the (very minor) Skyframe change? I wasn't sure how performance critical shouldPropagate is given that the new check will always return false in Blaze. It could be optimized further if needed.

@fmeum fmeum force-pushed the 20369-module-extension-walk branch 2 times, most recently from 55504b2 to a54bf17 Compare April 22, 2024 19:00
private static boolean isBzlmodKey(SkyKey key) {
// The return value of Class#getPackageName() and string literals are always interned.
// https://github.com/openjdk/jdk/blob/3e185c70feef3febf75c58a5d4d394a4b772105f/src/java.base/share/classes/java/lang/Class.java#L1258
return key.getClass().getPackageName() == "com.google.devtools.build.lib.bazel.bzlmod"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this smells really fishy but I have to admit I don't really understand the shouldPropagate discussion, so take this with a grain of salt; but having code logic couple so closely with code organization makes me very uncomfortable.

To confirm, is this needed for this PR? I was under the impression that the shouldPropagate stuff was related to the events generated during registry access that are sent to the Skyframe handlers. Do we also need it for some other purpose?

Copy link
Collaborator Author

@fmeum fmeum Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this PR doesn't touch BazelModuleResolutionEvent, the change to shouldPropagate is needed to address the memory regression that retaining Bzlmod events caused (every Skyframe node comes with a ValueWithMetadata wrapper). The future PRs will get rid of BazelModuleResolutionEvent, but for that they need to introduce the download events, so that by itself wouldn't improve the situation.

I agree that the idea of checking the package is fishy, but I found it to be the least bad option I could come up with, others being:

  1. Enumerate all Bzlmod SkyKey classes in some Set, but that would make membership testing slower and also require passing this Bazel/Bzlmod-specific logic into the executor.
  2. Have Bzlmod SkyKeys implement a common interface, but pre-JDK 23 that could have surprising performance implications and we would still need to get this Bazel-specific type into SkyframeExecutor.
  3. Do not use retained events in Bzlmod at all. We can get rid of BazelModuleResolutionEvent and try to build our own event mechanism for registry downloads. However, since we need to make sure that the events are correctly replayed on no-op builds with the lockfile removed, we would probably need to include them into the SkyValues as in my original approach for Store remote registry file hashes in the lockfile #21901. This would make it harder to decouple the registry download logic from the SkyFunctions using it.

No matter how brittle this looks, note that the relation defined by this check is lower and upper bounded by tests: If it matches too little or too many nodes, BazelModuleResolutionTest will fail due to analysis nodes having Bzlmod events attached or the main repo mapping node not having Bzlmod events attached.

I also don't love this approach, so I'm happy to change it out for anything else you prefer.

Edit: 2a) Have Bzlmod SkyKeys implement a common subclass could work without the risk of negative performance implications.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer most of the alternatives to the package string check. A Set<SkyFunctionName>, marker interface, default boolean method on SkyKey. Final decision to @Wyverald since he has more stake in bzlmod.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the change to shouldPropagate is needed to address the memory regression that retaining Bzlmod events caused (every Skyframe node comes with a ValueWithMetadata wrapper).

Is that regression introduced in this very PR, or some existing regression? If the latter, I'd say it's not so urgent as to require fixing right away. We could address it in a separate change, and have this PR focus solely on replacing (some) events with graph lookups.

3. Do not use retained events in Bzlmod at all. We can get rid of BazelModuleResolutionEvent and try to build our own event mechanism for registry downloads. However, since we need to make sure that the events are correctly replayed on no-op builds with the lockfile removed, we would probably need to include them into the SkyValues as in my original approach for Store remote registry file hashes in the lockfile #21901.

I'm still in favor of this approach. It's minimally intrusive, and easier to follow. Skyframe is a big scary beast and the fact that we're just now discovering some big memory regression is testament to that; so I'd rather we avoid using event propagation if we don't need to. Key part "if we don't need to": to me it feels like propagating events across nodes is definitely valuable in some cases, but for just recording registry access it feels overkill.

Simple proposal: add subscribe and unsubscribe methods to Registry. Before accessing the registry, subscribe; during, record accesses; after (and in finally), unsubscribe. All this can be done within one SkyFunction.

This would make it harder to decouple the registry download logic from the SkyFunctions using it.

I don't quite understand this part: what "coupling" are you referring to? And anyway, pretty much only two SkyFunctions need access to the registry -- ModuleFileFunction and RepoSpecFunction; it's not like it's a widely applicable concern.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the regression isn't new, I removed the Skyframe changes from this PR. @justinhorvitz You may still need to approve this to reset the "requesting changes" bit.

Skyframe is a big scary beast and the fact that we're just now discovering some big memory regression is testament to that; so I'd rather we avoid using event propagation if we don't need to.

I agree that Skyframe is scary, but at least it guarantees incremental correctness. I caused memory regressions in Bazel or Blaze in many other ways. Those in Blaze were all caught quickly thanks to the benchmarks - are there or could there be similar benchmarks for Bazel?

Simple proposal: add subscribe and unsubscribe methods to Registry. Before accessing the registry, subscribe; during, record accesses; after (and in finally), unsubscribe. All this can be done within one SkyFunction.

I'm not worried about how to get the information from the Registry into the SkyFunction using it, I'm more worried about us messing up incremental correctness of how this information then gets to the lockfile. But I agree that we can likely get this right:

  • Add the collected hashes to RepoSpecValue (need to bring this back) and ModuleFileValue.
  • Traverse the Skyframe graph down from RepositoryMappingValue.key(RepositoryMapping.MAIN) and collect the hashes from these values.

I think that we should walk edges instead of just iterating through all nodes for this as we don't want to record hashes of unused modules in the lockfile as this could increase the chance of merge conflicts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Traverse the Skyframe graph down from RepositoryMappingValue.key(RepositoryMapping.MAIN) and collect the hashes from these values.

Or even just aggregate all the collected hashes in BazelDepGraphValue? Saves us the trouble of even having to perform graph lookups at all.

@fmeum fmeum force-pushed the 20369-module-extension-walk branch from 59b5bcf to ace82a1 Compare April 23, 2024 16:25
@fmeum fmeum requested a review from Wyverald April 23, 2024 16:41
@Wyverald Wyverald added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally awaiting-review PR is awaiting review from an assigned reviewer and removed awaiting-review PR is awaiting review from an assigned reviewer awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally labels Apr 24, 2024
@Wyverald
Copy link
Member

Actually, could you address the test failure? Looks legit.

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 24, 2024

Actually, could you address the test failure? Looks legit.

I pushed a fix and added a test I forgot to add earlier, PTAL.

@fmeum fmeum requested a review from Wyverald April 24, 2024 15:58
@Wyverald Wyverald added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Apr 24, 2024
@fmeum
Copy link
Collaborator Author

fmeum commented Apr 24, 2024

The CI failure is Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: Action result with digest "a5b6ab98b83c4a186291fc294ccb4ecc21f4fa9cb5dab4cdb7f2cb224847ae1b/371" was found, but got denied. I haven't seen this one before...

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 24, 2024

@bazel-io fork 7.2.0

@iancha1992
Copy link
Member

iancha1992 commented Apr 25, 2024

@fmeum can you please add a Javadoc comment for WithFactors for src/main/java/com/google/devtools/build/lib/bazel/bzlmod/LockFileModuleExtension.java?

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 25, 2024

@iancha1992 I pushed a new commit with the Javadoc.

@sgowroji sgowroji removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Apr 26, 2024
@fmeum fmeum deleted the 20369-module-extension-walk branch April 26, 2024 09:53
iancha1992 pushed a commit to iancha1992/bazel that referenced this pull request Apr 29, 2024
Retaining `Postable`s across the graph increases the memory usage of every dependent Skyframe node.

`ModuleExtensionResolutionEvent` is replaced with a lookup of the "done" Skyframe nodes for extension evaluation in `BazelLockFileModule#afterCommand`.

`RootModuleFileFixupEvent` is replaced with direct storage in `SingleExtensionEvalValue`. The validation of repos imported from an extension is moved into a separate `SkyFunction`. This simplifies `bazel mod tidy`, which no longer needs to swallow certain exceptions, and allows cache hits when only the (invalid) imports of an extension are modified even without the lockfile.

The `BazelModuleResolutionEvent` is kept for now, but will become obsolete with the new lockfile format.

Work towards bazelbuild#20369

Closes bazelbuild#22058.

PiperOrigin-RevId: 628213907
Change-Id: I8ba19f5151a8183be5051c8d9280f93476db2272
iancha1992 pushed a commit to iancha1992/bazel that referenced this pull request Apr 30, 2024
Retaining `Postable`s across the graph increases the memory usage of every dependent Skyframe node.

`ModuleExtensionResolutionEvent` is replaced with a lookup of the "done" Skyframe nodes for extension evaluation in `BazelLockFileModule#afterCommand`.

`RootModuleFileFixupEvent` is replaced with direct storage in `SingleExtensionEvalValue`. The validation of repos imported from an extension is moved into a separate `SkyFunction`. This simplifies `bazel mod tidy`, which no longer needs to swallow certain exceptions, and allows cache hits when only the (invalid) imports of an extension are modified even without the lockfile.

The `BazelModuleResolutionEvent` is kept for now, but will become obsolete with the new lockfile format.

Work towards bazelbuild#20369

Closes bazelbuild#22058.

PiperOrigin-RevId: 628213907
Change-Id: I8ba19f5151a8183be5051c8d9280f93476db2272
Wyverald pushed a commit that referenced this pull request May 6, 2024
Retaining `Postable`s across the graph increases the memory usage of every dependent Skyframe node.

`ModuleExtensionResolutionEvent` is replaced with a lookup of the "done" Skyframe nodes for extension evaluation in `BazelLockFileModule#afterCommand`.

`RootModuleFileFixupEvent` is replaced with direct storage in `SingleExtensionEvalValue`. The validation of repos imported from an extension is moved into a separate `SkyFunction`. This simplifies `bazel mod tidy`, which no longer needs to swallow certain exceptions, and allows cache hits when only the (invalid) imports of an extension are modified even without the lockfile.

The `BazelModuleResolutionEvent` is kept for now, but will become obsolete with the new lockfile format.

Work towards #20369

Closes #22058.

PiperOrigin-RevId: 628213907
Change-Id: I8ba19f5151a8183be5051c8d9280f93476db2272
github-merge-queue bot pushed a commit that referenced this pull request May 7, 2024
Retaining `Postable`s across the graph increases the memory usage of
every dependent Skyframe node.

`ModuleExtensionResolutionEvent` is replaced with a lookup of the "done"
Skyframe nodes for extension evaluation in
`BazelLockFileModule#afterCommand`.

`RootModuleFileFixupEvent` is replaced with direct storage in
`SingleExtensionEvalValue`. The validation of repos imported from an
extension is moved into a separate `SkyFunction`. This simplifies `bazel
mod tidy`, which no longer needs to swallow certain exceptions, and
allows cache hits when only the (invalid) imports of an extension are
modified even without the lockfile.

The `BazelModuleResolutionEvent` is kept for now, but will become
obsolete with the new lockfile format.

Work towards #20369

Closes #22058.

PiperOrigin-RevId: 628213907
Change-Id: I8ba19f5151a8183be5051c8d9280f93476db2272

---------

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
Kila2 pushed a commit to Kila2/bazel that referenced this pull request May 13, 2024
Retaining `Postable`s across the graph increases the memory usage of every dependent Skyframe node.

`ModuleExtensionResolutionEvent` is replaced with a lookup of the "done" Skyframe nodes for extension evaluation in `BazelLockFileModule#afterCommand`.

`RootModuleFileFixupEvent` is replaced with direct storage in `SingleExtensionEvalValue`. The validation of repos imported from an extension is moved into a separate `SkyFunction`. This simplifies `bazel mod tidy`, which no longer needs to swallow certain exceptions, and allows cache hits when only the (invalid) imports of an extension are modified even without the lockfile.

The `BazelModuleResolutionEvent` is kept for now, but will become obsolete with the new lockfile format.

Work towards bazelbuild#20369

Closes bazelbuild#22058.

PiperOrigin-RevId: 628213907
Change-Id: I8ba19f5151a8183be5051c8d9280f93476db2272
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants