Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up class uniqueness analyses by caching results of reading jars #1837

Merged
merged 6 commits into from
Sep 7, 2021

Conversation

AlexLandau
Copy link
Contributor

@AlexLandau AlexLandau commented Jul 22, 2021

Before this PR

Running checkClassUniqueness could take a long time. In a sample build of [big monorepo], six projects took over a minute each for this to run, and the total runtime across all projects was 143 minutes.

After this PR

We cache the class names and corresponding hashes found in each jar in a shared Gradle service that lasts for the duration of the Gradle invocation. In [big monorepo], this reduces the total runtime to 8 minutes.

As an incidental additional fix, a regex intended to strip ".class" from the end of class names was .class instead of \.class$ and could also remove "class" from the middle of fully qualified class names. This has been fixed.

==COMMIT_MSG==
Increase the speed of the checkClassUniqueness task, especially in large repos, by adding caching of jar information.

Fix class names listed in the baseline-class-uniqueness.lock when the class or package name contains the substring class. In rare cases, this may require running ./gradlew checkClassUniqueness --write-locks to update the files.
==COMMIT_MSG==

Possible downsides?

The cache could increase memory use during the Gradle run, in proportion to the total dependency footprint of the repository.

@changelog-app
Copy link

changelog-app bot commented Jul 22, 2021

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Increase the speed of the checkClassUniqueness task, especially in large repos, by adding caching of jar information.

Fix class names listed in the baseline-class-uniqueness.lock when the class or package name contains class. In rare cases, this may require running ./gradlew checkClassUniqueness --write-locks to update the files.

Check the box to generate changelog(s)

  • Generate changelog entry

@AlexLandau AlexLandau force-pushed the alandau/cache-class-uniqueness-hashes branch 2 times, most recently from 3cad2d5 to 44ca021 Compare July 22, 2021 03:22
This will likely close to double the runtime performance of a single task
(sharing results between main and test classpaths) and should
significantly improve the performance of these tasks in monorepos.
@AlexLandau AlexLandau force-pushed the alandau/cache-class-uniqueness-hashes branch from 44ca021 to 8261c85 Compare July 22, 2021 03:24
Caffeine.newBuilder().build();

public static class Result {
private final ImmutableSetMultimap<String, HashCode> hashesByClassName;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally had this as just an ImmutableMap, but it failed on a case where it tried to enter the same key/value pair twice for some reason. I didn't collect the jar name, but I can share the class name internally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you find out why that is? It means that we iterate over the same class name and class hash twice?

If you can reproduce, it might be interesting to retrieve the jar and class names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sent you the details internally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the issue is because the published jar is malformed and contains duplicate files. A problem with the publisher and not with gradle-baseline so no need to block this PR based on that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a final thing, can we add a log when this happens? Ideally should hunt down these bad jars.

@AlexLandau AlexLandau marked this pull request as ready for review July 22, 2021 03:37
@policy-bot policy-bot bot requested a review from tpetracca July 22, 2021 03:37
Copy link
Contributor

@carterkozak carterkozak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature seems like a good candidate for Gradle artifact transformations which are cached by gradle, and shouldn't require us to implement the caching/reuse logic. What do you think?

@AlexLandau
Copy link
Contributor Author

This feature seems like a good candidate for Gradle artifact transformations which are cached by gradle, and shouldn't require us to implement the caching/reuse logic. What do you think?

That's an interesting feature, but it seems like it would be over-engineering to use it here. We'd have to add a new configuration for each requested configuration and resolve them to use the custom attribute, and we'd need serialization and deserialization of the results (each task would be reading the computed information back from disk).

There might be a more appropriate cache key than the ModuleVersionIdentifier, but we were already counting on that being unique in the existing logic.

@AlexLandau
Copy link
Contributor Author

@carterkozak Any follow-up here?

The completion of my thought above is that using this type of in-memory caching is well-insulated from unintended interactions with other plugins (while still being reusable if we want to), whereas adding new configurations and dependencies (in order to use artifact transformations) would be more of a public API change.

@carterkozak carterkozak requested review from fawind and CRogers August 19, 2021 21:32
@carterkozak
Copy link
Contributor

I've added + pinged folks who are more knowledgeable on gradle than myself, I'd feel more comfortable if they signed off.

@@ -60,43 +56,26 @@ public void analyzeConfiguration(Configuration configuration) {
Map<String, Set<ModuleVersionIdentifier>> classToJars = new HashMap<>();
Map<String, Set<HashCode>> tempClassToHashCodes = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we don't use a SetMultimap here as well? That way we also could get rid of the multiMapPut method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this refactor to the PR. The section that looks for singleton-set values requires an asMap(), but that's the only wrinkle.

TaskProvider<CheckClassUniquenessLockTask> checkClassUniqueness =
project.getTasks().register("checkClassUniqueness", CheckClassUniquenessLockTask.class);
Provider<JarClassHasher> jarClassHasher = project.getGradle()
.getSharedServices()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we use a BuildService here is to reuse the cache between tasks of different projects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. The other alternative I'm aware of is putting an extension on the root project and locating the cache there, which should also work.

private final Cache<ModuleVersionIdentifier, Result> cache =
Caffeine.newBuilder().build();

public static class Result {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static class Result {
public static final class Result {

nit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@policy-bot policy-bot bot requested a review from carterkozak September 7, 2021 19:46
Copy link
Contributor

@carterkozak carterkozak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@bulldozer-bot bulldozer-bot bot merged commit f96d194 into develop Sep 7, 2021
@bulldozer-bot bulldozer-bot bot deleted the alandau/cache-class-uniqueness-hashes branch September 7, 2021 21:15
@svc-autorelease
Copy link
Collaborator

Released 4.24.0

This was referenced Sep 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants