Implement build cache based on history array #26839

Merged
merged 2 commits into from
Sep 26, 2016

@tonistiigi tonistiigi commented Sep 22, 2016

carry #24711
fixes #26065

Adds the capability to specify images used as a cache source on build. These images do not need to have a local parent chain and can be pulled from other registries. Users need to make sure to only use trusted images as sources.

Usage:

docker pull myimage:v1.0
docker build --cache-from myimage:v1.0 -t myimage:v1.1 .

@graingert
Contributor

Is this not going to be plugin based like #24711 (comment) ?

@tonistiigi
Member Author

@graingert If you look at the contents of #24711, it is not about the plugin method. This was changed in favor of the #26065 proposal a long time ago.

@tonistiigi tonistiigi force-pushed the build-cache branch 2 times, most recently from ba1feda to d3222ea Compare September 22, 2016 23:48
@tonistiigi tonistiigi changed the title wip: Implement build cache based on history array Implement build cache based on history array Sep 22, 2016
@icecrime
Contributor

I think the design was already approved, so I'm tentatively moving to code review.

Thanks for the carry @tonistiigi!

var cacheFrom = []string{}
cacheFromJSON := r.FormValue("cachefrom")
if cacheFromJSON != "" {
	if err := json.NewDecoder(strings.NewReader(cacheFromJSON)).Decode(&cacheFrom); err != nil {
Contributor

json.Unmarshal([]byte(cacheFromJSON), &cacheFrom)
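
The suggestion above can be sketched as a small standalone program. This is only an illustration under stated assumptions: the helper name parseCacheFrom and the error message are hypothetical and are not taken from the PR. Note that json.Unmarshal expects a []byte, so the form value has to be converted from a string first.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseCacheFrom decodes the JSON-encoded "cachefrom" form value into a
// slice of image references. An empty value means no cache sources.
// The function name is hypothetical; it does not appear in the PR.
func parseCacheFrom(cacheFromJSON string) ([]string, error) {
	cacheFrom := []string{}
	if cacheFromJSON == "" {
		return cacheFrom, nil
	}
	// json.Unmarshal takes a []byte, so the string is converted first.
	if err := json.Unmarshal([]byte(cacheFromJSON), &cacheFrom); err != nil {
		return nil, fmt.Errorf("invalid cachefrom value: %w", err)
	}
	return cacheFrom, nil
}

func main() {
	refs, err := parseCacheFrom(`["myimage:v1.0","myimage:latest"]`)
	fmt.Println(refs, err)
}
```

For a value that is already fully in memory, json.Unmarshal avoids building a reader and a decoder, which is presumably the point of the nit.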

@stevvooe
Contributor

LGTM after the small nit.

	return false
}
if parent == nil || len(parent.History) == 0 && len(parent.RootFS.DiffIDs) == 0 {
	return true
Contributor

nil is considered a valid parent?

Member Author

nil means FROM scratch, so we ignore the parent and only check that the configuration matches.
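
A minimal sketch of the condition discussed in this thread, assuming simplified stand-in types: image, rootFS, and the helper name parentIsEmpty are hypothetical and only model the fields the condition reads. It shows that a nil parent and a parent with neither history nor layers are treated the same way.

```go
package main

import "fmt"

// rootFS and image are simplified, hypothetical stand-ins for the daemon's
// image types; only the fields used by the check are modeled.
type rootFS struct {
	DiffIDs []string
}

type image struct {
	History []string
	RootFS  rootFS
}

// parentIsEmpty reports whether the parent contributes nothing to compare:
// either there is no parent at all (FROM scratch) or it has no history
// entries and no layers.
func parentIsEmpty(parent *image) bool {
	// && binds tighter than ||, so the parentheses only make the grouping explicit.
	return parent == nil || (len(parent.History) == 0 && len(parent.RootFS.DiffIDs) == 0)
}

func main() {
	fmt.Println(parentIsEmpty(nil))
	fmt.Println(parentIsEmpty(&image{}))
	fmt.Println(parentIsEmpty(&image{History: []string{"RUN true"}}))
}
```

Because the nil test comes first and || short-circuits, the field accesses on parent are never evaluated for a nil pointer.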

	OSVersion: target.OSVersion,
})
if err != nil {
	return "", errors.Wrapf(err, "failed to marshal image config")
Contributor

errors.Wrap


imgID, err := ic.daemon.imageStore.Create(config)
if err != nil {
	return "", errors.Wrapf(err, "failed to create cache image")
Contributor

errors.Wrap

@@ -447,11 +447,11 @@ func (b *Builder) processImageFrom(img builder.Image) error {
// If no image is found, it returns `(false, nil)`.
// If there is any error, it returns `(false, err)`.
func (b *Builder) probeCache() (bool, error) {
Contributor

Comment should be updated. Right now it says "probeCache checks if b.docker implements builder.ImageCache and image-caching is enabled"

@aaronlehmann
Contributor

LGTM after nits

Based on work by KJ Tsanaktsidis

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: KJ Tsanaktsidis <kjtsanaktsidis@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
@tonistiigi
Member Author

@stevvooe @aaronlehmann updated

@stevvooe
Contributor

LGTM

Contributor

@aaronlehmann aaronlehmann left a comment

LGTM

shin- added a commit to docker/docker-py that referenced this pull request Jan 26, 2017
This adds the cache-from build option (moby/moby#26839) and fixes #1382.
@kachkaev

kachkaev commented Feb 3, 2017

Hi guys,

Is docker pull myimage:v1.0 strictly necessary? Won't docker build --cache-from myimage:v1.0 -t myimage:v1.1 . pull the image as well? Unfortunately, neither https://docs.docker.com/engine/reference/builder/ nor the CLI help explains this.

I also struggled to find an explanation of what happens if there are multiple --cache-from. I've got a GitLab CI script that builds an image and I'd like it to reuse previous builds to save time. The idea is to grab a previously built image with the current branch tag, then if this fails pull master and if both pulls fail, just build from scratch.

Should the following work smoothly for all possible states of the "cache"? Or is anything here missing or redundant?

docker pull $CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME || docker pull $CI_REGISTRY_IMAGE:master || true
docker build --tag=$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME --cache-from=$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME --cache-from=$CI_REGISTRY_IMAGE:master .
docker push $CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME

For my-branch this resolves to:

docker pull registry.example.com/group/repo:my-branch || docker pull registry.example.com/group/repo:master || true
docker build --tag=registry.example.com/group/repo:my-branch --cache-from=registry.example.com/group/repo:my-branch --cache-from=registry.example.com/group/repo:master .
docker push registry.example.com/group/repo:my-branch

And for master the script looks like this:

docker pull registry.example.com/group/repo:master || docker pull registry.example.com/group/repo:master || true
docker build --tag=registry.example.com/group/repo:master --cache-from=registry.example.com/group/repo:master --cache-from=registry.example.com/group/repo:master .
docker push registry.example.com/group/repo:master

@gajus

gajus commented Feb 3, 2017

As far as my testing goes, docker build --cache-from is sufficient.

However, I have written a blog post about using --cache-from in my GitLab EE CI setup (https://medium.com/@gajus/making-docker-in-docker-builds-x2-faster-using-docker-cache-from-option-c01febd8ef84#.z3rl5bahm) and someone commented that docker pull is missing.

I am not sure what the relevance of docker pull is when --cache-from should automatically attempt to pull the image. Is that not the case?

@kachkaev

kachkaev commented Feb 3, 2017

Yeah, that's what is unclear to me too. If --cache-from does pull an image, what will happen if it does not exist or if there is a problem with logging in? If I use multiple --cache-from flags and the first image exists, will docker build attempt to pull the second and the third image and reuse as many layers as possible? If at least one of the images fails to pull, will the whole build command fail? Or will it warn? Or will it just silently build from scratch?

I know that these questions can be answered after a few tests, but it'd be great if they were documented so that people could write robust CI scripts without too much iteration.

@graingert
Contributor

I'd like to know so I can add --cache-from to shipwright.

@tonistiigi
Member Author

--cache-from does not pull if an image is not found. There have been discussions about it but the way we would like to enable it is to not pull full images but only pull data that is needed to determine the cache hit and from there decide to actually pull layers.

When using multiple --cache-from flags, the images are checked for a cache hit in the order the user specified. If one of the images produces a cache hit for a command, only that image is used for the rest of the build.
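
The ordering rule described in this comment can be illustrated with a toy model. This is a sketch under loud assumptions, not the daemon's implementation: cacheSource, hit, and probe are hypothetical names, and a "cache hit" is reduced to matching a command string at the same position in a source's recorded steps, whereas the real check compares the parent chain and image configuration.

```go
package main

import "fmt"

// cacheSource is a hypothetical stand-in for an image passed via --cache-from.
// steps lists the build commands recorded in its history, in order.
type cacheSource struct {
	name  string
	steps []string
}

// hit reports whether the source recorded cmd at position i in its history.
func (s cacheSource) hit(i int, cmd string) bool {
	return i < len(s.steps) && s.steps[i] == cmd
}

// probe returns, for each build command, the name of the cache source that
// served it, or "" when the command has to be rebuilt.
func probe(sources []cacheSource, cmds []string) []string {
	result := make([]string, len(cmds))
	pinned := -1 // index of the source that produced the first hit
	for i, cmd := range cmds {
		if pinned >= 0 {
			// Once a source has produced a hit, only that source is consulted.
			if !sources[pinned].hit(i, cmd) {
				break // the remaining commands are rebuilt
			}
			result[i] = sources[pinned].name
			continue
		}
		// No source chosen yet: check the sources in the order given.
		for j, s := range sources {
			if s.hit(i, cmd) {
				pinned = j
				result[i] = s.name
				break
			}
		}
		if pinned < 0 {
			break // a miss on the first command invalidates everything after it
		}
	}
	return result
}

func main() {
	short := cacheSource{name: "branch", steps: []string{"RUN one", "RUN two"}}
	long := cacheSource{name: "master", steps: []string{"RUN one", "RUN two", "RUN three"}}
	cmds := []string{"RUN one", "RUN two", "RUN three"}

	fmt.Printf("%q\n", probe([]cacheSource{short, long}, cmds))
	fmt.Printf("%q\n", probe([]cacheSource{long, short}, cmds))
}
```

In this model, listing the shorter image first pins the build to it, so the third command is rebuilt even though the other image contains it. That is why the order of the --cache-from flags can matter.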

databus23 added a commit to databus23/docker-image-resource that referenced this pull request Feb 13, 2017
With docker 1.13 a new build flag was added: --cache-from
This flag deprecated the buildcache tool that we used with docker 1.12

Details: moby/moby#26839
databus23 added a commit to databus23/docker-image-resource that referenced this pull request Mar 13, 2017
With docker 1.13 a new build flag was added: --cache-from
This flag deprecated the buildcache tool that we used with docker 1.12

Details: moby/moby#26839
dnephin pushed a commit to dnephin/docker that referenced this pull request Apr 17, 2017
Implement build cache based on history array
ggeorgiev pushed a commit to ggeorgiev/docker-image-resource that referenced this pull request Jun 8, 2017
With docker 1.13 a new build flag was added: --cache-from
This flag deprecated the buildcache tool that we used with docker 1.12

Details: moby/moby#26839
@jkp

jkp commented Jan 28, 2019

--cache-from does not pull if an image is not found. There have been discussions about it but the way we would like to enable it is to not pull full images but only pull data that is needed to determine the cache hit and from there decide to actually pull layers.

When using multiple --cache-from flags, the images are checked for a cache hit in the order the user specified. If one of the images produces a cache hit for a command, only that image is used for the rest of the build.

I just ran into this issue, and only by digging deep through the history to find this PR did I find an explanation of the flag's behaviour; it is not documented this way. Suggestion: could the documentation be updated to include this detail?

@garrett-hopper

--cache-from does not pull if an image is not found. There have been discussions about it but the way we would like to enable it is to not pull full images but only pull data that is needed to determine the cache hit and from there decide to actually pull layers.

When using multiple --cache-from flags, the images are checked for a cache hit in the order the user specified. If one of the images produces a cache hit for a command, only that image is used for the rest of the build.

Was this behavior ever implemented? It looks like --cache-from still doesn't work at all unless the full image is pulled first. (I'd like to avoid pulling the entire image and have only the useful layers pulled.)

@tonistiigi
Member Author

@Jumblemuddle Yes, if you use buildkit #34715 (comment)

learnitall added a commit to learnitall/buildah that referenced this pull request Sep 12, 2021
Documented usage of cache-from allows for users
to specify multiple images to search through by
passing the cache-from argument more than
once. Current implementation of cache-from as
a string does not enable this behavior.

See: moby/moby#26839 (comment)

Signed-off-by: Ryan Drew <ryan@thedrews.org>
joverlee521 added a commit to nextstrain/docker-base that referenced this pull request May 16, 2023
It is unclear how docker handles multiple caches under the hood.
I did find an older comment¹ that if multiple `--cache-from` sources are
provided, it will use the first cache hit for the whole run.

This is my attempt to prioritize the ordering of the sources to use the
tagged images from the ghcr.io registry over the latest images from
docker.io.

¹ moby/moby#26839 (comment)
Successfully merging this pull request may close these issues.

Proposal: control cache sources during build with --cache-from