-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up class uniqueness analyses by caching results of reading jars #1837
Changes from all commits
8261c85
00a24c8
b27b64c
8366a4a
d097af1
7e73be8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
type: improvement | ||
improvement: | ||
description: |- | ||
Increase the speed of the `checkClassUniqueness` task, especially in large repos, by adding caching of jar information. | ||
|
||
Fix class names listed in the `baseline-class-uniqueness.lock` when the class or package name contains the substring `class`. In rare cases, this may require running `./gradlew checkClassUniqueness --write-locks` to update the files. | ||
links: | ||
- https://github.com/palantir/gradle-baseline/pull/1837 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
/* | ||
* (c) Copyright 2021 Palantir Technologies Inc. All rights reserved. | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
package com.palantir.baseline.services; | ||
|
||
import com.github.benmanes.caffeine.cache.Cache; | ||
import com.github.benmanes.caffeine.cache.Caffeine; | ||
import com.google.common.collect.ImmutableListMultimap; | ||
import com.google.common.collect.ImmutableSetMultimap; | ||
import com.google.common.hash.HashCode; | ||
import com.google.common.hash.Hashing; | ||
import com.google.common.hash.HashingInputStream; | ||
import com.google.common.io.ByteStreams; | ||
import java.io.File; | ||
import java.io.FileInputStream; | ||
import java.io.IOException; | ||
import java.util.List; | ||
import java.util.Map; | ||
import java.util.jar.JarEntry; | ||
import java.util.jar.JarInputStream; | ||
import java.util.stream.Collectors; | ||
import org.gradle.api.artifacts.ModuleVersionIdentifier; | ||
import org.gradle.api.artifacts.ResolvedArtifact; | ||
import org.gradle.api.services.BuildService; | ||
import org.gradle.api.services.BuildServiceParameters; | ||
import org.slf4j.Logger; | ||
|
||
public abstract class JarClassHasher implements BuildService<BuildServiceParameters.None>, AutoCloseable { | ||
private final Cache<ModuleVersionIdentifier, Result> cache = | ||
Caffeine.newBuilder().build(); | ||
|
||
public static final class Result { | ||
private final ImmutableSetMultimap<String, HashCode> hashesByClassName; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I originally had this as just an ImmutableMap, but it failed on a case where it tried to enter the same key/value pair twice for some reason. I didn't collect the jar name, but I can share the class name internally. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Did you find out why that is? It means that we iterate over the same class name and class hash twice? If you can reproduce, it might be interesting to retrieve the jar and class names. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I sent you the details internally. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It looks like the issue is because the published jar is malformed and contains duplicate files. A problem with the publisher and not with gradle-baseline so no need to block this PR based on that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As a final thing, can we add a log when this happens? Ideally should hunt down these bad jars. |
||
|
||
private Result(ImmutableSetMultimap<String, HashCode> hashesByClassName) { | ||
this.hashesByClassName = hashesByClassName; | ||
} | ||
|
||
public ImmutableSetMultimap<String, HashCode> getHashesByClassName() { | ||
return hashesByClassName; | ||
} | ||
|
||
public static Result empty() { | ||
return new Result(ImmutableSetMultimap.of()); | ||
} | ||
} | ||
|
||
public final Result hashClasses(ResolvedArtifact resolvedArtifact, Logger logger) { | ||
return cache.get(resolvedArtifact.getModuleVersion().getId(), _moduleId -> { | ||
File file = resolvedArtifact.getFile(); | ||
if (!file.exists()) { | ||
return Result.empty(); | ||
} | ||
|
||
ImmutableListMultimap.Builder<String, HashCode> hashesByClassName = ImmutableListMultimap.builder(); | ||
try (FileInputStream fileInputStream = new FileInputStream(file); | ||
JarInputStream jarInputStream = new JarInputStream(fileInputStream)) { | ||
JarEntry entry; | ||
while ((entry = jarInputStream.getNextJarEntry()) != null) { | ||
if (entry.isDirectory() || !entry.getName().endsWith(".class")) { | ||
continue; | ||
} | ||
|
||
if (entry.getName().contains("module-info.class")) { | ||
// Java 9 allows jars to have a module-info.class file in the root, | ||
// we shouldn't complain about these. | ||
continue; | ||
} | ||
|
||
String className = entry.getName().replaceAll("/", ".").replaceAll("\\.class$", ""); | ||
HashingInputStream inputStream = new HashingInputStream(Hashing.sha256(), jarInputStream); | ||
ByteStreams.exhaust(inputStream); | ||
|
||
hashesByClassName.put(className, inputStream.hash()); | ||
} | ||
} catch (IOException e) { | ||
throw new RuntimeException(e); | ||
} | ||
|
||
ImmutableListMultimap<String, HashCode> builtHashesByClassName = hashesByClassName.build(); | ||
List<String> keysWithDuplicateEntries = builtHashesByClassName.asMap().entrySet().stream() | ||
.filter(entry -> entry.getValue().size() > 1) | ||
.map(Map.Entry::getKey) | ||
.sorted() | ||
.collect(Collectors.toList()); | ||
if (!keysWithDuplicateEntries.isEmpty()) { | ||
logger.warn( | ||
"Warning: Gradle Baseline found a dependency jar that contains more than one zip entry for " | ||
+ "a class and is likely malformed: {}\n" | ||
+ "The following entries appear multiple times: {}\n" | ||
+ "This issue should be reported to the maintainer of the dependency.", | ||
resolvedArtifact.getModuleVersion(), | ||
keysWithDuplicateEntries); | ||
} | ||
|
||
return new Result(ImmutableSetMultimap.copyOf(builtHashesByClassName)); | ||
}); | ||
} | ||
|
||
@Override | ||
public final void close() { | ||
// Try to free up memory when this is no longer needed | ||
cache.invalidateAll(); | ||
cache.cleanUp(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we use a BuildService here is to reuse the cache between tasks of different projects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. The other alternative I'm aware of is putting an extension on the root project and locating the cache there, which should also work.