-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up class uniqueness analyses by caching results of reading jars #1837
Changes from 3 commits
8261c85
00a24c8
b27b64c
8366a4a
d097af1
7e73be8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
type: improvement | ||
improvement: | ||
description: |- | ||
Increase the speed of the `checkClassUniqueness` task, especially in large repos, by adding caching of jar information. | ||
|
||
Fix class names listed in the `baseline-class-uniqueness.lock` when the class or package name contains the substring `class`. In rare cases, this may require running `./gradlew checkClassUniqueness --write-locks` to update the files. | ||
links: | ||
- https://github.com/palantir/gradle-baseline/pull/1837 |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,97 @@ | ||||||
/* | ||||||
* (c) Copyright 2021 Palantir Technologies Inc. All rights reserved. | ||||||
* | ||||||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||||||
* you may not use this file except in compliance with the License. | ||||||
* You may obtain a copy of the License at | ||||||
* | ||||||
* http://www.apache.org/licenses/LICENSE-2.0 | ||||||
* | ||||||
* Unless required by applicable law or agreed to in writing, software | ||||||
* distributed under the License is distributed on an "AS IS" BASIS, | ||||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||||
* See the License for the specific language governing permissions and | ||||||
* limitations under the License. | ||||||
*/ | ||||||
|
||||||
package com.palantir.baseline.services; | ||||||
|
||||||
import com.github.benmanes.caffeine.cache.Cache; | ||||||
import com.github.benmanes.caffeine.cache.Caffeine; | ||||||
import com.google.common.collect.ImmutableSetMultimap; | ||||||
import com.google.common.hash.HashCode; | ||||||
import com.google.common.hash.Hashing; | ||||||
import com.google.common.hash.HashingInputStream; | ||||||
import com.google.common.io.ByteStreams; | ||||||
import java.io.File; | ||||||
import java.io.FileInputStream; | ||||||
import java.io.IOException; | ||||||
import java.util.jar.JarEntry; | ||||||
import java.util.jar.JarInputStream; | ||||||
import org.gradle.api.artifacts.ModuleVersionIdentifier; | ||||||
import org.gradle.api.artifacts.ResolvedArtifact; | ||||||
import org.gradle.api.services.BuildService; | ||||||
import org.gradle.api.services.BuildServiceParameters; | ||||||
|
||||||
public abstract class JarClassHasher implements BuildService<BuildServiceParameters.None>, AutoCloseable { | ||||||
private final Cache<ModuleVersionIdentifier, Result> cache = | ||||||
Caffeine.newBuilder().build(); | ||||||
|
||||||
public static class Result { | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
nit There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||||||
private final ImmutableSetMultimap<String, HashCode> hashesByClassName; | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I originally had this as just an ImmutableMap, but it failed on a case where it tried to enter the same key/value pair twice for some reason. I didn't collect the jar name, but I can share the class name internally. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Did you find out why that is? It means that we iterate over the same class name and class hash twice? If you can reproduce, it might be interesting to retrieve the jar and class names. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I sent you the details internally. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It looks like the issue is because the published jar is malformed and contains duplicate files. A problem with the publisher and not with gradle-baseline so no need to block this PR based on that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As a final thing, can we add a log when this happens? Ideally should hunt down these bad jars. |
||||||
|
||||||
private Result(ImmutableSetMultimap<String, HashCode> hashesByClassName) { | ||||||
this.hashesByClassName = hashesByClassName; | ||||||
} | ||||||
|
||||||
public ImmutableSetMultimap<String, HashCode> getHashesByClassName() { | ||||||
return hashesByClassName; | ||||||
} | ||||||
|
||||||
public static Result empty() { | ||||||
return new Result(ImmutableSetMultimap.of()); | ||||||
} | ||||||
} | ||||||
|
||||||
public final Result hashClasses(ResolvedArtifact resolvedArtifact) { | ||||||
return cache.get(resolvedArtifact.getModuleVersion().getId(), _moduleId -> { | ||||||
File file = resolvedArtifact.getFile(); | ||||||
if (!file.exists()) { | ||||||
return Result.empty(); | ||||||
} | ||||||
|
||||||
ImmutableSetMultimap.Builder<String, HashCode> hashesByClassName = ImmutableSetMultimap.builder(); | ||||||
try (FileInputStream fileInputStream = new FileInputStream(file); | ||||||
JarInputStream jarInputStream = new JarInputStream(fileInputStream)) { | ||||||
JarEntry entry; | ||||||
while ((entry = jarInputStream.getNextJarEntry()) != null) { | ||||||
if (entry.isDirectory() || !entry.getName().endsWith(".class")) { | ||||||
continue; | ||||||
} | ||||||
|
||||||
if (entry.getName().contains("module-info.class")) { | ||||||
// Java 9 allows jars to have a module-info.class file in the root, | ||||||
// we shouldn't complain about these. | ||||||
continue; | ||||||
} | ||||||
|
||||||
String className = entry.getName().replaceAll("/", ".").replaceAll("\\.class$", ""); | ||||||
HashingInputStream inputStream = new HashingInputStream(Hashing.sha256(), jarInputStream); | ||||||
ByteStreams.exhaust(inputStream); | ||||||
|
||||||
hashesByClassName.put(className, inputStream.hash()); | ||||||
} | ||||||
} catch (IOException e) { | ||||||
throw new RuntimeException(e); | ||||||
} | ||||||
return new Result(hashesByClassName.build()); | ||||||
}); | ||||||
} | ||||||
|
||||||
@Override | ||||||
public final void close() { | ||||||
// Try to free up memory when this is no longer needed | ||||||
cache.invalidateAll(); | ||||||
cache.cleanUp(); | ||||||
} | ||||||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,35 +18,31 @@ | |
|
||
import static java.util.stream.Collectors.toSet; | ||
|
||
import com.google.common.collect.ImmutableSetMultimap; | ||
import com.google.common.hash.HashCode; | ||
import com.google.common.hash.Hashing; | ||
import com.google.common.hash.HashingInputStream; | ||
import com.google.common.io.ByteStreams; | ||
import com.palantir.baseline.services.JarClassHasher; | ||
import java.io.File; | ||
import java.io.FileInputStream; | ||
import java.io.IOException; | ||
import java.time.Duration; | ||
import java.time.Instant; | ||
import java.util.Collection; | ||
import java.util.HashMap; | ||
import java.util.HashSet; | ||
import java.util.Map; | ||
import java.util.Set; | ||
import java.util.jar.JarEntry; | ||
import java.util.jar.JarInputStream; | ||
import java.util.stream.Collectors; | ||
import org.gradle.api.artifacts.Configuration; | ||
import org.gradle.api.artifacts.ModuleVersionIdentifier; | ||
import org.gradle.api.artifacts.ResolvedArtifact; | ||
import org.slf4j.Logger; | ||
|
||
public final class ClassUniquenessAnalyzer { | ||
|
||
private final JarClassHasher jarHasher; | ||
private final Map<Set<ModuleVersionIdentifier>, Set<String>> jarsToClasses = new HashMap<>(); | ||
private final Map<String, Set<HashCode>> classToHashCodes = new HashMap<>(); | ||
private final Logger log; | ||
|
||
public ClassUniquenessAnalyzer(Logger log) { | ||
public ClassUniquenessAnalyzer(JarClassHasher jarHasher, Logger log) { | ||
this.jarHasher = jarHasher; | ||
this.log = log; | ||
} | ||
|
||
|
@@ -60,43 +56,26 @@ public void analyzeConfiguration(Configuration configuration) { | |
Map<String, Set<ModuleVersionIdentifier>> classToJars = new HashMap<>(); | ||
Map<String, Set<HashCode>> tempClassToHashCodes = new HashMap<>(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Any reason why we don't use a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added this refactor to the PR. The section that looks for singleton-set values requires an |
||
|
||
dependencies.stream().forEach(resolvedArtifact -> { | ||
for (ResolvedArtifact resolvedArtifact : dependencies) { | ||
File file = resolvedArtifact.getFile(); | ||
if (!file.exists()) { | ||
log.info("Skipping non-existent jar {}: {}", resolvedArtifact, file); | ||
return; | ||
} | ||
|
||
try (FileInputStream fileInputStream = new FileInputStream(file); | ||
JarInputStream jarInputStream = new JarInputStream(fileInputStream)) { | ||
JarEntry entry; | ||
while ((entry = jarInputStream.getNextJarEntry()) != null) { | ||
if (entry.isDirectory() || !entry.getName().endsWith(".class")) { | ||
continue; | ||
} | ||
|
||
if (entry.getName().contains("module-info.class")) { | ||
// Java 9 allows jars to have a module-info.class file in the root, | ||
// we shouldn't complain about these. | ||
continue; | ||
} | ||
|
||
String className = entry.getName().replaceAll("/", ".").replaceAll(".class", ""); | ||
HashingInputStream inputStream = new HashingInputStream(Hashing.sha256(), jarInputStream); | ||
ByteStreams.exhaust(inputStream); | ||
|
||
multiMapPut( | ||
classToJars, | ||
className, | ||
resolvedArtifact.getModuleVersion().getId()); | ||
|
||
multiMapPut(tempClassToHashCodes, className, inputStream.hash()); | ||
} | ||
} catch (IOException e) { | ||
log.error("Failed to read JarFile {}", resolvedArtifact, e); | ||
throw new RuntimeException(e); | ||
ImmutableSetMultimap<String, HashCode> hashes = | ||
jarHasher.hashClasses(resolvedArtifact).getHashesByClassName(); | ||
|
||
for (Map.Entry<String, HashCode> entry : hashes.entries()) { | ||
String className = entry.getKey(); | ||
HashCode hashValue = entry.getValue(); | ||
multiMapPut( | ||
classToJars, | ||
className, | ||
resolvedArtifact.getModuleVersion().getId()); | ||
multiMapPut(tempClassToHashCodes, className, hashValue); | ||
} | ||
}); | ||
} | ||
|
||
// discard all the classes that only come from one jar - these are completely safe! | ||
classToJars.entrySet().stream() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we use a BuildService here is to reuse the cache between tasks of different projects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. The other alternative I'm aware of is putting an extension on the root project and locating the cache there, which should also work.