-
-
Notifications
You must be signed in to change notification settings - Fork 272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-39547] - Corrupt jar cache #130
Conversation
We need a way to calculate reliable checksums and as implemented it causes temporary write-then-move files to have checksums cached never to use used.
This moves checksum caching to FileSystemJarCache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO concurrency management is broken in this PR. I would recommend to just introduce a sync object and to synchronize all methods properly against it. ConcurrentMap
is not reliable for get-get operation-
@@ -128,8 +94,4 @@ public void write(byte[] b) { | |||
public void write(byte[] b, int off, int len) { | |||
} | |||
} | |||
|
|||
@edu.umd.cs.findbugs.annotations.SuppressWarnings("DMI_COLLECTION_OF_URLS") | |||
private static final ConcurrentMap<URL,Checksum> CHECKSUMS_BY_URL = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this change completely deletes the checksum cache
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I do not think it was a good idea to implement it in Checksum
class in the first place.
// forURL will all fall through to this method since the first caller's | ||
// calculation may take a while. Hence re-check the cache at the start. | ||
String location = file.getCanonicalPath(); | ||
if (checksumsByPath.containsKey(location)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐛 checksumsByPath
is not correctly synchronized here. contains()
and get()
go as separate calls, and the item may be removed between these two calls
synchronized (this) { | ||
cs = Checksum.forFile(file); | ||
} | ||
checksumsByPath.put(location, cs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, in the case of unlikely parallel download of different JARs, we may put wrong checksum here.
- Download A
- Thread 1. Thread interrupt/scheduling
- Thread 1. Calculate checksum
- Thread 2. Download B, replace A
- Thread 2. Calculate B checksum
- Thread 2. Put B checksum to the cache
- Thread 2. Thread interrupt/scheduling
- Thread 1. Put A checksum to the cache (but the file is B)
@@ -18,10 +18,9 @@ | |||
*/ | |||
@edu.umd.cs.findbugs.annotations.SuppressWarnings("SE_BAD_FIELD") | |||
class JarLoaderImpl implements JarLoader, Serializable { | |||
private final ConcurrentMap<Checksum,URL> knownJars = new ConcurrentHashMap<Checksum,URL>(); | |||
private final ConcurrentMap<Checksum,URL> knownJars = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modification of the serializable class. Would it cause compatibility issues if we send it over the channel with different remoting versions on sides. I'd guess so. 🐜
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just ignore. 🤦
@@ -110,24 +107,36 @@ public Void answer(InvocationOnMock invocationOnMock) throws Throwable { | |||
} | |||
|
|||
@Test | |||
public void retrieveInvalidChecksum() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reference issue?
|
||
@edu.umd.cs.findbugs.annotations.SuppressWarnings("DMI_COLLECTION_OF_URLS") // TODO: fix this | ||
private final ConcurrentMap<URL,Checksum> checksums = new ConcurrentHashMap<URL,Checksum>(); | ||
private final ConcurrentMap<String, Checksum> checksums = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oleg-nenashev, I did not considered this to be remotable as this is master-side abstraction provided the classloading is not by-directional. This does not seem to be generally true, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JarLoader is being used as a Channel property, which is accessible on both sides. So it may be really serialized over the channel if the Channel#remoteChannel
implementation passes it back as a seruializable result
|
||
@edu.umd.cs.findbugs.annotations.SuppressWarnings("DMI_COLLECTION_OF_URLS") // TODO: fix this | ||
private final ConcurrentMap<URL,Checksum> checksums = new ConcurrentHashMap<URL,Checksum>(); | ||
private final ConcurrentMap<String, Checksum> checksums = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JarLoader is being used as a Channel property, which is accessible on both sides. So it may be really serialized over the channel if the Channel#remoteChannel
implementation passes it back as a seruializable result
return target.toURI().toURL(); | ||
} | ||
|
||
LOGGER.warning(String.format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really a warning? I suppose it's a valid behavior when somebody updates jars on the remote side (e.g. plugin update)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is, when you update a jar, it should have different checksum and therefore be stored on different location so this code path will not be executed. This warning is a signal that what we found cached as ABC
(by location) is in fact DEF
(by checksum). This can be caused by manipulating the cache dir (what this PR addresses), bug in remoting that cached jar on wrong location or change of checksum algorithm. In all these cases it is undesirable to load the jar blindly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
// the checksum. | ||
LOGGER.fine(String.format("Jar file already exists: %16X%16X", sum1, sum2)); | ||
return target.toURI().toURL(); | ||
Checksum actual = fileChecksum(target); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improvement follow-up proposal: Perform checksum calculation for JAR cache during the remoting startup (with option to disable it) before establishing the connection. Should help with performance issues during first job execution due to classloading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging level is my last concern. And test failure, but it's unlikely related
@olivergondza Maybe it makes sense to integrate the patch into stable-2.x once it's ready |
I do not consider it all that critical after we agreed it is not security vulnerability. No hurry as long as I am concerned. |
Retriggering the CI |
Maven invocation on ci.jenkins.io fails badly as maven terminate prematurely there. I am not able to reproduce that locally with:
|
JENKINS-39547
Checksum
class toJarLoaderImpl
andFileSystemJarCache
.#retrieve
on same jar more than once between agent restarts. Getting rid of it would improve this fix as it would detect jar cache manipulation without restart.