-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up class uniqueness analyses by caching results of reading jars #1837
Conversation
Generate changelog in
|
3cad2d5
to
44ca021
Compare
This will likely close to double the runtime performance of a single task (sharing results between main and test classpaths) and should significantly improve the performance of these tasks in monorepos.
44ca021
to
8261c85
Compare
Caffeine.newBuilder().build(); | ||
|
||
public static class Result { | ||
private final ImmutableSetMultimap<String, HashCode> hashesByClassName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally had this as just an ImmutableMap, but it failed on a case where it tried to enter the same key/value pair twice for some reason. I didn't collect the jar name, but I can share the class name internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you find out why that is? It means that we iterate over the same class name and class hash twice?
If you can reproduce, it might be interesting to retrieve the jar and class names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sent you the details internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the issue is because the published jar is malformed and contains duplicate files. A problem with the publisher and not with gradle-baseline so no need to block this PR based on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a final thing, can we add a log when this happens? Ideally should hunt down these bad jars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature seems like a good candidate for Gradle artifact transformations which are cached by gradle, and shouldn't require us to implement the caching/reuse logic. What do you think?
That's an interesting feature, but it seems like it would be over-engineering to use it here. We'd have to add a new configuration for each requested configuration and resolve them to use the custom attribute, and we'd need serialization and deserialization of the results (each task would be reading the computed information back from disk). There might be a more appropriate cache key than the ModuleVersionIdentifier, but we were already counting on that being unique in the existing logic. |
@carterkozak Any follow-up here? The completion of my thought above is that using this type of in-memory caching is well-insulated from unintended interactions with other plugins (while still being reusable if we want to), whereas adding new configurations and dependencies (in order to use artifact transformations) would be more of a public API change. |
I've added + pinged folks who are more knowledgeable on gradle than myself, I'd feel more comfortable if they signed off. |
@@ -60,43 +56,26 @@ public void analyzeConfiguration(Configuration configuration) { | |||
Map<String, Set<ModuleVersionIdentifier>> classToJars = new HashMap<>(); | |||
Map<String, Set<HashCode>> tempClassToHashCodes = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why we don't use a SetMultimap
here as well? That way we also could get rid of the multiMapPut
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this refactor to the PR. The section that looks for singleton-set values requires an asMap()
, but that's the only wrinkle.
TaskProvider<CheckClassUniquenessLockTask> checkClassUniqueness = | ||
project.getTasks().register("checkClassUniqueness", CheckClassUniquenessLockTask.class); | ||
Provider<JarClassHasher> jarClassHasher = project.getGradle() | ||
.getSharedServices() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we use a BuildService here is to reuse the cache between tasks of different projects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. The other alternative I'm aware of is putting an extension on the root project and locating the cache there, which should also work.
private final Cache<ModuleVersionIdentifier, Result> cache = | ||
Caffeine.newBuilder().build(); | ||
|
||
public static class Result { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public static class Result { | |
public static final class Result { |
nit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Released 4.24.0 |
Before this PR
Running
checkClassUniqueness
could take a long time. In a sample build of [big monorepo], six projects took over a minute each for this to run, and the total runtime across all projects was 143 minutes.After this PR
We cache the class names and corresponding hashes found in each jar in a shared Gradle service that lasts for the duration of the Gradle invocation. In [big monorepo], this reduces the total runtime to 8 minutes.
As an incidental additional fix, a regex intended to strip ".class" from the end of class names was
.class
instead of\.class$
and could also remove "class" from the middle of fully qualified class names. This has been fixed.==COMMIT_MSG==
Increase the speed of the
checkClassUniqueness
task, especially in large repos, by adding caching of jar information.Fix class names listed in the
baseline-class-uniqueness.lock
when the class or package name contains the substringclass
. In rare cases, this may require running./gradlew checkClassUniqueness --write-locks
to update the files.==COMMIT_MSG==
Possible downsides?
The cache could increase memory use during the Gradle run, in proportion to the total dependency footprint of the repository.