Makes sure there are no duplicate cache entries. #739
Conversation
There could be a legitimate reason to provide the same image multiple times (e.g., A, B, A to re-apply a change made in A that was overwritten in B).
@briandealwis, as far as I can tell in how we layer in jib, each layer (i.e. A, B) would have a set of changes for files (i.e. x, y, z). Each of these changes can be a create, update, or delete, which translates to a change operation that sets the contents of a file. So each layer can be represented by a set of pairs.
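The change-operation model described above can be sketched by treating each layer as a map from file path to new contents (a hypothetical simplification for illustration; deletes are omitted, and `ChangeOps` is not jib's actual API):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChangeOps {
    // Apply layers in order: each layer's change operations set file contents,
    // so a later layer overwrites any earlier layer's entry for the same path.
    static Map<String, String> apply(List<Map<String, String>> layers) {
        Map<String, String> fs = new LinkedHashMap<>();
        for (Map<String, String> layer : layers) {
            fs.putAll(layer);
        }
        return fs;
    }

    public static void main(String[] args) {
        Map<String, String> a = Map.of("x", "x-from-A", "y", "y-from-A");
        Map<String, String> b = Map.of("x", "x-from-B");

        // Applying A, B, A re-applies A's change to x that B overwrote,
        // which yields the same filesystem as applying just B, A.
        System.out.println(apply(List.of(a, b, a)).equals(apply(List.of(b, a)))); // true
    }
}
```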
@@ -19,34 +19,36 @@
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.ImmutableSet;
 import java.util.Iterator;
+import java.util.LinkedHashSet;
+import java.util.Set;
 import javax.annotation.Nullable;

 /** Holds the layers for an image. Makes sure that each layer is only added once. */
We should remove "Makes sure that each layer is only added once" or update the Javadoc.
layerDigestsBuilder.add(layer.getBlobDescriptor().getDigest());
layersBuilder.remove(layer);
layersBuilder.add(layer);
lastLayer = layer;
lastLayer seems unused.
 import javax.annotation.Nullable;

 /** Holds the layers for an image. Makes sure that each layer is only added once. */
 public class ImageLayers<T extends Layer> implements Iterable<T> {

   public static class Builder<T extends Layer> {

-    private final ImmutableList.Builder<T> layersBuilder = ImmutableList.builder();
+    private final Set<T> layersBuilder = new LinkedHashSet<>();
- Isn't the order of layers important? (Set does not provide any ordering guarantees.)
- Are you comfortable calling this a builder?
I'm using LinkedHashSet to ensure the order is the order of insertion - I'll change the signature to LinkedHashSet<T> layers to make this clearer.
}
layerDigestsBuilder.add(layer.getBlobDescriptor().getDigest());
layersBuilder.remove(layer);
layersBuilder.add(layer);
I don't think Set allows duplicates, so probably there is no need to remove before adding.
Reinsertion in LinkedHashSet is defined to be: "Note that insertion order is not affected if an element is re-inserted into the set", so I believe the remove is necessary to move the layer to the end of the linked list.
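Both behaviors can be checked directly (a minimal sketch; the class and method names here are hypothetical, not jib's API):

```java
import java.util.LinkedHashSet;
import java.util.List;

public class ReinsertDemo {
    // Re-adding an element already in a LinkedHashSet does NOT change its position.
    static List<String> addAgain(LinkedHashSet<String> set, String e) {
        set.add(e);
        return List.copyOf(set);
    }

    // remove-then-add moves the element to the end of the iteration order.
    static List<String> bumpToEnd(LinkedHashSet<String> set, String e) {
        set.remove(e);
        set.add(e);
        return List.copyOf(set);
    }

    public static void main(String[] args) {
        LinkedHashSet<String> layers = new LinkedHashSet<>(List.of("A", "B"));
        System.out.println(addAgain(layers, "A"));  // [A, B] -- order unchanged
        System.out.println(bumpToEnd(layers, "A")); // [B, A] -- A moved to the end
    }
}
```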
 public class ImageLayers<T extends Layer> implements Iterable<T> {

   public static class Builder<T extends Layer> {

-    private final ImmutableList.Builder<T> layersBuilder = ImmutableList.builder();
+    private final LinkedHashSet<T> layers = new LinkedHashSet<>();
I think explicitly marking this a LinkedHashSet is a good idea, but it warrants a bit of explanation that the set aspect is for the fast comparison. TBH, given the need to remove-and-then-add, it would be better to just make this a List. Docker is limited to 40 layers (I believe), so the O(n) vs O(log n) doesn't matter.
I think the speed here might actually matter a bit, since this class is used to represent the layers in the cache as well (as this is mostly fixing #721). Having the remove-then-add would be O(n) performed O(n) times (for n layers). Since the cache could potentially have many layers after extensive use (not limited to the Docker image layer limit), the O(n^2) vs O(n log n) (or amortized O(n)) could potentially be a noticeable difference when manipulating the cache metadata.
lastLayer = layer;
}
layerDigestsBuilder.add(layer.getBlobDescriptor().getDigest());
layers.remove(layer);
Don't LinkedHashSets already handle duplicates?
Turns out this is to re-order the element (#739 (comment)). That said, it seems worth putting a comment like "remove necessary to rearrange the insertion order of LinkedHashSet".
I wonder if this is possible: for example, both A and B create a read-only file (no permission to write or modify it) at the same location that, once placed, no one can overwrite. That is, whoever writes the file first wins. If that's possible, …
@chanseokoh The file permissions for the files will not affect the overwrite. In the layered container filesystem at runtime, the order of the search for a file goes from the last layer backwards, so prior layers with a file will not be reached if later layers have the same file.
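This last-layer-wins lookup can be sketched by modeling each layer as a map from file path to contents (a hypothetical simplification of the container filesystem, not jib code):

```java
import java.util.List;
import java.util.Map;

public class LayerLookup {
    // Search layers from the last to the first; the first hit wins,
    // so later layers shadow earlier ones regardless of file permissions.
    static String resolve(List<Map<String, String>> layers, String path) {
        for (int i = layers.size() - 1; i >= 0; i--) {
            String contents = layers.get(i).get(path);
            if (contents != null) {
                return contents;
            }
        }
        return null; // file does not exist in any layer
    }

    public static void main(String[] args) {
        Map<String, String> layerA = Map.of("/app/config", "from A");
        Map<String, String> layerB = Map.of("/app/config", "from B");

        // With layer order [A, B], the later layer B shadows A:
        System.out.println(resolve(List.of(layerA, layerB), "/app/config")); // from B
    }
}
```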
I wasn't clear in what I wrote previously: by changing …
@briandealwis Yep, the intention for this fix is to have the same optimization applied to generated images as well.
Okay, this PR looks good to me.
Looks like this helped to fix the failing integration test for the simple project, but the empty project still fails (because the second build time is longer).
Fixes #721
The solution here is, when adding a new layer, to remove any prior layers with the same digest (essentially bumping re-added layers to the top). This should work since, for example, applying layers A and B in the order A, B, A produces a filesystem identical to just B, A.
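The remove-then-add bookkeeping described above can be sketched as follows (hypothetical names for illustration; the real code works on Layer objects and their digests, not strings):

```java
import java.util.LinkedHashSet;
import java.util.List;

public class LayerOrder {
    // Adding a layer first removes any prior occurrence of the same digest,
    // so a re-added layer is bumped to the end of the insertion order.
    static List<String> applyInOrder(String... digests) {
        LinkedHashSet<String> layers = new LinkedHashSet<>();
        for (String digest : digests) {
            layers.remove(digest); // drop any earlier occurrence
            layers.add(digest);    // re-insert at the end
        }
        return List.copyOf(layers);
    }

    public static void main(String[] args) {
        // Applying A, B, A yields the same layer stack as B, A.
        System.out.println(applyInOrder("A", "B", "A")); // [B, A]
    }
}
```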