Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-39547] - Corrupt jar cache #130

Merged
merged 4 commits into from
Dec 16, 2016

Conversation

olivergondza
Copy link
Member

JENKINS-39547

  • Verify checksum after loaded from slave jar cache on agent side.
  • Extract checksum caching from Checksum class to JarLoaderImpl and FileSystemJarCache.
    • It was done twice on master side and made the system more fragile on both ends.
    • I wonder if we need checksum cashing on agent side at all as we should rarely call #retrieve on same jar more than once between agent restarts. Getting rid of it would improve this fix as it would detect jar cache manipulation without restart.

We need a way to calculate reliable checksums and as implemented it causes
temporary write-then-move files to have checksums cached never to use used.
This moves checksum caching to FileSystemJarCache.
@oleg-nenashev oleg-nenashev changed the title Corrupt jar cache [JENKINS-39547] - Corrupt jar cache Nov 7, 2016
Copy link
Member

@oleg-nenashev oleg-nenashev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO concurrency management is broken in this PR. I would recommend to just introduce a sync object and to synchronize all methods properly against it. ConcurrentMap is not reliable for get-get operation-

@@ -128,8 +94,4 @@ public void write(byte[] b) {
public void write(byte[] b, int off, int len) {
}
}

@edu.umd.cs.findbugs.annotations.SuppressWarnings("DMI_COLLECTION_OF_URLS")
private static final ConcurrentMap<URL,Checksum> CHECKSUMS_BY_URL =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this change completely deletes the checksum cache

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I do not think it was a good idea to implement it in Checksum class in the first place.

// forURL will all fall through to this method since the first caller's
// calculation may take a while. Hence re-check the cache at the start.
String location = file.getCanonicalPath();
if (checksumsByPath.containsKey(location)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐛 checksumsByPath is not correctly synchronized here. contains() and get() go as separate calls, and the item may be removed between these two calls

synchronized (this) {
cs = Checksum.forFile(file);
}
checksumsByPath.put(location, cs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, in the case of unlikely parallel download of different JARs, we may put wrong checksum here.

  • Download A
  • Thread 1. Thread interrupt/scheduling
  • Thread 1. Calculate checksum
  • Thread 2. Download B, replace A
  • Thread 2. Calculate B checksum
  • Thread 2. Put B checksum to the cache
  • Thread 2. Thread interrupt/scheduling
  • Thread 1. Put A checksum to the cache (but the file is B)

@@ -18,10 +18,9 @@
*/
@edu.umd.cs.findbugs.annotations.SuppressWarnings("SE_BAD_FIELD")
class JarLoaderImpl implements JarLoader, Serializable {
private final ConcurrentMap<Checksum,URL> knownJars = new ConcurrentHashMap<Checksum,URL>();
private final ConcurrentMap<Checksum,URL> knownJars = new ConcurrentHashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modification of the serializable class. Would it cause compatibility issues if we send it over the channel with different remoting versions on sides. I'd guess so. 🐜

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just ignore. 🤦

@@ -110,24 +107,36 @@ public Void answer(InvocationOnMock invocationOnMock) throws Throwable {
}

@Test
public void retrieveInvalidChecksum() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference issue?


@edu.umd.cs.findbugs.annotations.SuppressWarnings("DMI_COLLECTION_OF_URLS") // TODO: fix this
private final ConcurrentMap<URL,Checksum> checksums = new ConcurrentHashMap<URL,Checksum>();
private final ConcurrentMap<String, Checksum> checksums = new ConcurrentHashMap<>();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oleg-nenashev, I did not considered this to be remotable as this is master-side abstraction provided the classloading is not by-directional. This does not seem to be generally true, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JarLoader is being used as a Channel property, which is accessible on both sides. So it may be really serialized over the channel if the Channel#remoteChannel implementation passes it back as a seruializable result


@edu.umd.cs.findbugs.annotations.SuppressWarnings("DMI_COLLECTION_OF_URLS") // TODO: fix this
private final ConcurrentMap<URL,Checksum> checksums = new ConcurrentHashMap<URL,Checksum>();
private final ConcurrentMap<String, Checksum> checksums = new ConcurrentHashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JarLoader is being used as a Channel property, which is accessible on both sides. So it may be really serialized over the channel if the Channel#remoteChannel implementation passes it back as a seruializable result

return target.toURI().toURL();
}

LOGGER.warning(String.format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really a warning? I suppose it's a valid behavior when somebody updates jars on the remote side (e.g. plugin update)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, when you update a jar, it should have different checksum and therefore be stored on different location so this code path will not be executed. This warning is a signal that what we found cached as ABC (by location) is in fact DEF (by checksum). This can be caused by manipulating the cache dir (what this PR addresses), bug in remoting that cached jar on wrong location or change of checksum algorithm. In all these cases it is undesirable to load the jar blindly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

// the checksum.
LOGGER.fine(String.format("Jar file already exists: %16X%16X", sum1, sum2));
return target.toURI().toURL();
Checksum actual = fileChecksum(target);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improvement follow-up proposal: Perform checksum calculation for JAR cache during the remoting startup (with option to disable it) before establishing the connection. Should help with performance issues during first job execution due to classloading.

Copy link
Member

@oleg-nenashev oleg-nenashev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging level is my last concern. And test failure, but it's unlikely related

@oleg-nenashev
Copy link
Member

@olivergondza Maybe it makes sense to integrate the patch into stable-2.x once it's ready

@olivergondza
Copy link
Member Author

I do not consider it all that critical after we agreed it is not security vulnerability. No hurry as long as I am concerned.

@oleg-nenashev
Copy link
Member

Retriggering the CI

@olivergondza
Copy link
Member Author

Maven invocation on ci.jenkins.io fails badly as maven terminate prematurely there. I am not able to reproduce that locally with:

Apache Maven 3.3.9 (NON-CANONICAL_2015-11-23T13:17:27+03:00_root; 2015-11-23T11:17:27+01:00)
Maven home: /opt/maven
Java version: 1.8.0_112, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-8-jdk/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "4.4.31-1-lts", arch: "amd64", family: "unix"

@oleg-nenashev oleg-nenashev merged commit cdd5bce into jenkinsci:master Dec 16, 2016
oleg-nenashev added a commit that referenced this pull request Dec 16, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants