distribution: separate layer and image config for v1 pushes #28

Merged 1 commit into 17.06-resin from fix-registry-v1-push on Oct 1, 2017

petrosagg (Contributor)

Rebase of balena-io-archive/docker#12

When content addressability was introduced in #17924, a compatibility layer for registry v1 pushes was added. When the engine is asked to push an image to a v1 registry it needs to create v1 IDs for the images.

The strategy so far has been to use the full imageID for the topmost v1 layer and the ChainID for all other layers, effectively creating as many v1 layers as there are in the image. Only the topmost layer contained the image configuration; the other layers had a dummy JSON containing only a parent reference.

This becomes problematic when the topmost layer of the image is big. Consider the following two Dockerfiles:

FROM busybox
RUN create_very_big_file
CMD /foo

FROM busybox
RUN create_very_big_file
CMD /bar

Both of these images will have the exact same layers, with the layer created by RUN create_very_big_file being the topmost one, but their imageIDs will differ since they have a different CMD and therefore different image configs.

When pushing to a v1 registry, the RUN create_very_big_file layer will be pushed twice, once with the v1 ID set to foo's imageID and once with the v1 ID set to bar's imageID. Also, any clients wanting to pull those images won't realise it's the same layer and will proceed to download it twice.
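The collision can be illustrated with a small Python sketch of the old ID scheme. This is only a model under stated assumptions, not the engine's code: the layer contents and config JSON are made-up stand-ins, digest strings are used directly as v1 IDs, and `chain_ids` and `old_v1_ids` are hypothetical helper names. The ChainID recurrence follows the definition in the OCI image spec.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def chain_ids(diff_ids):
    """ChainID as defined in the OCI image spec: the first entry is the
    layer's DiffID, each later one is sha256(parent_chain_id + " " + diff_id)."""
    chain = []
    for diff_id in diff_ids:
        if not chain:
            chain.append(diff_id)
        else:
            chain.append(sha256_digest(f"{chain[-1]} {diff_id}".encode()))
    return chain


def old_v1_ids(image_id, diff_ids):
    """Old scheme as described above: every layer is identified by its
    ChainID, except the topmost one, which is identified by the imageID."""
    ids = chain_ids(diff_ids)
    ids[-1] = image_id
    return ids


# Stand-in layer contents and configs (assumptions for illustration only).
busybox = sha256_digest(b"busybox layer")
big_file = sha256_digest(b"very big file layer")
layers = [busybox, big_file]

foo_id = sha256_digest(b'{"Cmd": ["/foo"]}')  # imageID = digest of the config
bar_id = sha256_digest(b'{"Cmd": ["/bar"]}')

foo_v1 = old_v1_ids(foo_id, layers)
bar_v1 = old_v1_ids(bar_id, layers)

print(foo_v1[0] == bar_v1[0])  # base layer is shared
print(foo_v1[1] == bar_v1[1])  # big layer gets two different v1 IDs
```

In this model the big layer's bytes are identical in both images, yet a v1 registry sees two unrelated IDs for it.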

This commit solves this problem by separating the layers from the image configuration information when pushing to a v1 registry. To do this, all layers of an image are pushed with their ChainIDs, and a synthetic top-level layer is created with its contents set to the EmptyLayer, its config set to the image config, and its v1 ID set to the imageID. This has the side effect of adding one layer.
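A rough Python sketch of the new layout follows. It is a model of the description above rather than the actual implementation: `EMPTY_LAYER` is a placeholder marker, the dictionaries and the `new_v1_layers` helper are hypothetical, and the layer contents and configs are stand-ins.

```python
import hashlib

EMPTY_LAYER = "<empty-layer>"  # placeholder for the engine's EmptyLayer


def sha256_digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def chain_ids(diff_ids):
    """ChainID recurrence from the OCI image spec."""
    chain = []
    for diff_id in diff_ids:
        if not chain:
            chain.append(diff_id)
        else:
            chain.append(sha256_digest(f"{chain[-1]} {diff_id}".encode()))
    return chain


def new_v1_layers(image_id, config, diff_ids):
    """Every real layer is keyed by its ChainID and carries no config.
    A synthetic empty layer on top carries the config and the imageID."""
    layers = [
        {"v1_id": chain_id, "content": diff_id, "config": None}
        for chain_id, diff_id in zip(chain_ids(diff_ids), diff_ids)
    ]
    layers.append({"v1_id": image_id, "content": EMPTY_LAYER, "config": config})
    return layers


# Stand-in layers and configs (assumptions for illustration only).
diff_ids = [sha256_digest(b"busybox layer"), sha256_digest(b"very big file layer")]
foo_cfg = b'{"Cmd": ["/foo"]}'
bar_cfg = b'{"Cmd": ["/bar"]}'

foo = new_v1_layers(sha256_digest(foo_cfg), foo_cfg, diff_ids)
bar = new_v1_layers(sha256_digest(bar_cfg), bar_cfg, diff_ids)

print(foo[1]["v1_id"] == bar[1]["v1_id"])  # big layer now has one shared v1 ID
print(len(foo) - len(diff_ids))            # one extra synthetic layer
```

Only the small synthetic layer differs between the two images, so the big layer is uploaded and downloaded once.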

To prevent new layers being piled on top of each other forever, the code checks if the topmost layer is already an empty layer and in that case it uses that for the image configuration.
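The check can be sketched as below. This is a simplified illustration, not the engine's code: `EMPTY_LAYER` is a placeholder marker, `attach_config` is a hypothetical helper, and a v1 ID of `None` stands for "use the layer's ChainID".

```python
EMPTY_LAYER = "<empty-layer>"  # placeholder for the engine's EmptyLayer


def attach_config(layer_contents, image_id):
    """Return (content, v1_id) pairs for a v1 push.

    If the topmost layer is already empty, it is reused to carry the image
    configuration; otherwise a synthetic empty layer is appended.
    """
    entries = [(content, None) for content in layer_contents]
    if entries and entries[-1][0] == EMPTY_LAYER:
        entries[-1] = (EMPTY_LAYER, image_id)      # reuse the existing empty layer
    else:
        entries.append((EMPTY_LAYER, image_id))    # add one synthetic layer
    return entries


first_push = attach_config(["busybox", "big-file"], "image-foo")
# Pushing an image whose topmost layer is already empty adds nothing further.
second_push = attach_config([content for content, _ in first_push], "image-foo")

print(len(first_push), len(second_push))
```

Without the check, the second call would append another empty layer on every round trip.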

Signed-off-by: Petros Angelatos <petrosagg@gmail.com>
petrosagg requested a review from zozo123 on October 1, 2017 16:50
zozo123 (Contributor) left a comment:

LGTM

petrosagg merged commit 970c454 into 17.06-resin on Oct 1, 2017
petrosagg deleted the fix-registry-v1-push branch on October 1, 2017 18:44