Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up class uniqueness analyses by caching results of reading jars #1837

Merged
merged 6 commits into from
Sep 7, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions changelog/@unreleased/pr-1837.v2.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
type: improvement
improvement:
description: |-
Increase the speed of the `checkClassUniqueness` task, especially in large repos, by adding caching of jar information.

Fix class names listed in the `baseline-class-uniqueness.lock` when the class or package name contains the substring `class`. In rare cases, this may require running `./gradlew checkClassUniqueness --write-locks` to update the files.
links:
- https://github.com/palantir/gradle-baseline/pull/1837
Original file line number Diff line number Diff line change
Expand Up @@ -16,10 +16,12 @@

package com.palantir.baseline.plugins;

import com.palantir.baseline.services.JarClassHasher;
import com.palantir.baseline.tasks.CheckClassUniquenessLockTask;
import org.gradle.api.Project;
import org.gradle.api.artifacts.Configuration;
import org.gradle.api.plugins.JavaPlugin;
import org.gradle.api.provider.Provider;
import org.gradle.api.tasks.TaskProvider;
import org.gradle.language.base.plugins.LifecycleBasePlugin;

Expand All @@ -33,8 +35,14 @@
public class BaselineClassUniquenessPlugin extends AbstractBaselinePlugin {
@Override
public final void apply(Project project) {
TaskProvider<CheckClassUniquenessLockTask> checkClassUniqueness =
project.getTasks().register("checkClassUniqueness", CheckClassUniquenessLockTask.class);
Provider<JarClassHasher> jarClassHasher = project.getGradle()
.getSharedServices()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we use a BuildService here is to reuse the cache between tasks of different projects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. The other alternative I'm aware of is putting an extension on the root project and locating the cache there, which should also work.

.registerIfAbsent("jarClassHasher", JarClassHasher.class, _spec -> {});
TaskProvider<CheckClassUniquenessLockTask> checkClassUniqueness = project.getTasks()
.register("checkClassUniqueness", CheckClassUniquenessLockTask.class, task -> {
task.jarClassHasher.set(jarClassHasher);
task.usesService(jarClassHasher);
});
project.getPlugins().apply(LifecycleBasePlugin.class);
project.getTasks().getByName(LifecycleBasePlugin.CHECK_TASK_NAME).dependsOn(checkClassUniqueness);

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
/*
* (c) Copyright 2021 Palantir Technologies Inc. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.palantir.baseline.services;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.google.common.collect.ImmutableListMultimap;
import com.google.common.collect.ImmutableSetMultimap;
import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;
import com.google.common.hash.HashingInputStream;
import com.google.common.io.ByteStreams;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.stream.Collectors;
import org.gradle.api.artifacts.ModuleVersionIdentifier;
import org.gradle.api.artifacts.ResolvedArtifact;
import org.gradle.api.services.BuildService;
import org.gradle.api.services.BuildServiceParameters;
import org.slf4j.Logger;

public abstract class JarClassHasher implements BuildService<BuildServiceParameters.None>, AutoCloseable {
private final Cache<ModuleVersionIdentifier, Result> cache =
Caffeine.newBuilder().build();

public static final class Result {
private final ImmutableSetMultimap<String, HashCode> hashesByClassName;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally had this as just an ImmutableMap, but it failed on a case where it tried to enter the same key/value pair twice for some reason. I didn't collect the jar name, but I can share the class name internally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you find out why that is? It means that we iterate over the same class name and class hash twice?

If you can reproduce, it might be interesting to retrieve the jar and class names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sent you the details internally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the issue is because the published jar is malformed and contains duplicate files. A problem with the publisher and not with gradle-baseline so no need to block this PR based on that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a final thing, can we add a log when this happens? Ideally should hunt down these bad jars.


private Result(ImmutableSetMultimap<String, HashCode> hashesByClassName) {
this.hashesByClassName = hashesByClassName;
}

public ImmutableSetMultimap<String, HashCode> getHashesByClassName() {
return hashesByClassName;
}

public static Result empty() {
return new Result(ImmutableSetMultimap.of());
}
}

public final Result hashClasses(ResolvedArtifact resolvedArtifact, Logger logger) {
return cache.get(resolvedArtifact.getModuleVersion().getId(), _moduleId -> {
File file = resolvedArtifact.getFile();
if (!file.exists()) {
return Result.empty();
}

ImmutableListMultimap.Builder<String, HashCode> hashesByClassName = ImmutableListMultimap.builder();
try (FileInputStream fileInputStream = new FileInputStream(file);
JarInputStream jarInputStream = new JarInputStream(fileInputStream)) {
JarEntry entry;
while ((entry = jarInputStream.getNextJarEntry()) != null) {
if (entry.isDirectory() || !entry.getName().endsWith(".class")) {
continue;
}

if (entry.getName().contains("module-info.class")) {
// Java 9 allows jars to have a module-info.class file in the root,
// we shouldn't complain about these.
continue;
}

String className = entry.getName().replaceAll("/", ".").replaceAll("\\.class$", "");
HashingInputStream inputStream = new HashingInputStream(Hashing.sha256(), jarInputStream);
ByteStreams.exhaust(inputStream);

hashesByClassName.put(className, inputStream.hash());
}
} catch (IOException e) {
throw new RuntimeException(e);
}

ImmutableListMultimap<String, HashCode> builtHashesByClassName = hashesByClassName.build();
List<String> keysWithDuplicateEntries = builtHashesByClassName.asMap().entrySet().stream()
.filter(entry -> entry.getValue().size() > 1)
.map(Map.Entry::getKey)
.sorted()
.collect(Collectors.toList());
if (!keysWithDuplicateEntries.isEmpty()) {
logger.warn(
"Warning: Gradle Baseline found a dependency jar that contains more than one zip entry for "
+ "a class and is likely malformed: {}\n"
+ "The following entries appear multiple times: {}\n"
+ "This issue should be reported to the maintainer of the dependency.",
resolvedArtifact.getModuleVersion(),
keysWithDuplicateEntries);
}

return new Result(ImmutableSetMultimap.copyOf(builtHashesByClassName));
});
}

@Override
public final void close() {
// Try to free up memory when this is no longer needed
cache.invalidateAll();
cache.cleanUp();
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
import com.google.common.base.Splitter;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableSortedMap;
import com.palantir.baseline.services.JarClassHasher;
import difflib.DiffUtils;
import difflib.Patch;
import java.io.File;
Expand All @@ -36,6 +37,7 @@
import org.gradle.api.Task;
import org.gradle.api.artifacts.Configuration;
import org.gradle.api.artifacts.ModuleVersionIdentifier;
import org.gradle.api.provider.Property;
import org.gradle.api.provider.SetProperty;
import org.gradle.api.specs.Spec;
import org.gradle.api.tasks.CacheableTask;
Expand All @@ -55,10 +57,14 @@ public class CheckClassUniquenessLockTask extends DefaultTask {
@SuppressWarnings("VisibilityModifier")
public final SetProperty<Configuration> configurations;

@SuppressWarnings("VisibilityModifier")
public final Property<JarClassHasher> jarClassHasher;

private final File lockFile;

public CheckClassUniquenessLockTask() {
this.configurations = getProject().getObjects().setProperty(Configuration.class);
this.jarClassHasher = getProject().getObjects().property(JarClassHasher.class);
this.lockFile = getProject().file("baseline-class-uniqueness.lock");
onlyIf(new Spec<Task>() {
@Override
Expand Down Expand Up @@ -91,8 +97,8 @@ public final void doIt() {
ImmutableSortedMap<String, Optional<String>> resultsByConfiguration = configurations.get().stream()
.collect(ImmutableSortedMap.toImmutableSortedMap(
Comparator.naturalOrder(), Configuration::getName, configuration -> {
ClassUniquenessAnalyzer analyzer =
new ClassUniquenessAnalyzer(getProject().getLogger());
ClassUniquenessAnalyzer analyzer = new ClassUniquenessAnalyzer(
jarClassHasher.get(), getProject().getLogger());
analyzer.analyzeConfiguration(configuration);
Collection<Set<ModuleVersionIdentifier>> problemJars = analyzer.getDifferingProblemJars();

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,35 +18,32 @@

import static java.util.stream.Collectors.toSet;

import com.google.common.collect.HashMultimap;
import com.google.common.collect.ImmutableSet;
import com.google.common.collect.ImmutableSetMultimap;
import com.google.common.collect.SetMultimap;
import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;
import com.google.common.hash.HashingInputStream;
import com.google.common.io.ByteStreams;
import com.palantir.baseline.services.JarClassHasher;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.time.Duration;
import java.time.Instant;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.stream.Collectors;
import org.gradle.api.artifacts.Configuration;
import org.gradle.api.artifacts.ModuleVersionIdentifier;
import org.gradle.api.artifacts.ResolvedArtifact;
import org.slf4j.Logger;

public final class ClassUniquenessAnalyzer {

private final Map<Set<ModuleVersionIdentifier>, Set<String>> jarsToClasses = new HashMap<>();
private final Map<String, Set<HashCode>> classToHashCodes = new HashMap<>();
private final JarClassHasher jarHasher;
private final SetMultimap<Set<ModuleVersionIdentifier>, String> jarsToClasses = HashMultimap.create();
private final SetMultimap<String, HashCode> classToHashCodes = HashMultimap.create();
private final Logger log;

public ClassUniquenessAnalyzer(Logger log) {
public ClassUniquenessAnalyzer(JarClassHasher jarHasher, Logger log) {
this.jarHasher = jarHasher;
this.log = log;
}

Expand All @@ -57,57 +54,36 @@ public void analyzeConfiguration(Configuration configuration) {

// we use these temporary maps to accumulate information as we process each jar,
// so they may include singletons which we filter out later
Map<String, Set<ModuleVersionIdentifier>> classToJars = new HashMap<>();
Map<String, Set<HashCode>> tempClassToHashCodes = new HashMap<>();
SetMultimap<String, ModuleVersionIdentifier> classToJars = HashMultimap.create();
SetMultimap<String, HashCode> tempClassToHashCodes = HashMultimap.create();

dependencies.stream().forEach(resolvedArtifact -> {
for (ResolvedArtifact resolvedArtifact : dependencies) {
File file = resolvedArtifact.getFile();
if (!file.exists()) {
log.info("Skipping non-existent jar {}: {}", resolvedArtifact, file);
return;
}

try (FileInputStream fileInputStream = new FileInputStream(file);
JarInputStream jarInputStream = new JarInputStream(fileInputStream)) {
JarEntry entry;
while ((entry = jarInputStream.getNextJarEntry()) != null) {
if (entry.isDirectory() || !entry.getName().endsWith(".class")) {
continue;
}

if (entry.getName().contains("module-info.class")) {
// Java 9 allows jars to have a module-info.class file in the root,
// we shouldn't complain about these.
continue;
}

String className = entry.getName().replaceAll("/", ".").replaceAll(".class", "");
HashingInputStream inputStream = new HashingInputStream(Hashing.sha256(), jarInputStream);
ByteStreams.exhaust(inputStream);

multiMapPut(
classToJars,
className,
resolvedArtifact.getModuleVersion().getId());

multiMapPut(tempClassToHashCodes, className, inputStream.hash());
}
} catch (IOException e) {
log.error("Failed to read JarFile {}", resolvedArtifact, e);
throw new RuntimeException(e);
ImmutableSetMultimap<String, HashCode> hashes =
jarHasher.hashClasses(resolvedArtifact, log).getHashesByClassName();

for (Map.Entry<String, HashCode> entry : hashes.entries()) {
String className = entry.getKey();
HashCode hashValue = entry.getValue();
classToJars.put(className, resolvedArtifact.getModuleVersion().getId());
tempClassToHashCodes.put(className, hashValue);
}
});
}

// discard all the classes that only come from one jar - these are completely safe!
classToJars.entrySet().stream()
classToJars.asMap().entrySet().stream()
.filter(entry -> entry.getValue().size() > 1)
.forEach(entry -> multiMapPut(jarsToClasses, entry.getValue(), entry.getKey()));
.forEach(entry -> jarsToClasses.put(ImmutableSet.copyOf(entry.getValue()), entry.getKey()));

// figure out which classes have differing hashes
tempClassToHashCodes.entrySet().stream()
tempClassToHashCodes.asMap().entrySet().stream()
.filter(entry -> entry.getValue().size() > 1)
.forEach(entry ->
entry.getValue().forEach(value -> multiMapPut(classToHashCodes, entry.getKey(), value)));
.forEach(entry -> entry.getValue().forEach(value -> classToHashCodes.put(entry.getKey(), value)));

Instant after = Instant.now();
log.info(
Expand All @@ -126,7 +102,7 @@ private Collection<Set<ModuleVersionIdentifier>> getProblemJars() {
}

/** Class names that appear in all of the given jars. */
public Set<String> getSharedClassesInProblemJars(Collection<ModuleVersionIdentifier> problemJars) {
public Set<String> getSharedClassesInProblemJars(Set<ModuleVersionIdentifier> problemJars) {
return jarsToClasses.get(problemJars);
}

Expand All @@ -138,17 +114,9 @@ public Collection<Set<ModuleVersionIdentifier>> getDifferingProblemJars() {
}

/** Class names which appear in all of the given jars and also have non-identical implementations. */
public Set<String> getDifferingSharedClassesInProblemJars(Collection<ModuleVersionIdentifier> problemJars) {
public Set<String> getDifferingSharedClassesInProblemJars(Set<ModuleVersionIdentifier> problemJars) {
return getSharedClassesInProblemJars(problemJars).stream()
.filter(classToHashCodes::containsKey)
.collect(toSet());
}

private static <K, V> void multiMapPut(Map<K, Set<V>> map, K key, V value) {
map.compute(key, (unused, collection) -> {
Set<V> newCollection = collection != null ? collection : new HashSet<>();
newCollection.add(value);
return newCollection;
});
}
}