Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incrementally sending build context #231

Merged
merged 4 commits into from
Jun 27, 2017

Conversation

tonistiigi
Copy link
Member

Adds support for incremental context updates to docker build moby/moby#32677 . The feature is behind a --stream=true flag atm, mostly for a testing period and in case there is some reason to use the old method. When we feel confident enough we can just switch the default value and users shouldn't see any changes except things being faster.

I also noticed that it vendors in chrootarchive and reexec package. These are not really needed but require some refactoring. Either splitting the receiver and sender part to a separate package, or passing an archiver instance into the transfer function. I can do that in a follow-up with next update.

@dnephin @vieux

@tonistiigi tonistiigi changed the title Client session fssession Incrementally sending build context Jun 22, 2017
@codecov-io
Copy link

codecov-io commented Jun 22, 2017

Codecov Report

Merging #231 into master will decrease coverage by 0.61%.
The diff coverage is 25.44%.

@@            Coverage Diff             @@
##           master     #231      +/-   ##
==========================================
- Coverage   47.19%   46.58%   -0.62%     
==========================================
  Files         171      172       +1     
  Lines       11556    11631      +75     
==========================================
- Hits         5454     5418      -36     
- Misses       5791     5902     +111     
  Partials      311      311

dnephin and others added 2 commits June 23, 2017 11:34
Signed-off-by: Daniel Nephin <dnephin@docker.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests would be great.

We don't have any in master yet, but I wrote on in #233 which uses a fakeClient. You could grab my changes to the fake from that PR.

Splitting out some functions might make it possible to test some in isolation as well (like the build shared key functionality).

@@ -200,7 +200,7 @@ func (cli *DockerCli) Initialize(opts *cliflags.ClientOptions) error {

// if server version is lower than the current cli, downgrade
if versions.LessThan(ping.APIVersion, cli.client.ClientVersion()) {
cli.client.UpdateClientVersion(ping.APIVersion)
cli.client.NegotiateAPIVersionPing(ping)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NegotiateAPIVersionPing() does the minimum version check as well, so the previous if ping.APIVersion == "" can be removed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it also does this LessThan check, so this if can be removed as well.

@@ -133,6 +146,10 @@ func NewBuildCommand(dockerCli *command.DockerCli) *cobra.Command {
flags.SetAnnotation("squash", "experimental", nil)
flags.SetAnnotation("squash", "version", []string{"1.25"})

flags.BoolVar(&options.stream, "stream", false, "Stream attaches to server to negotiate build context")
flags.SetAnnotation("stream", "experimental", nil)
flags.SetAnnotation("stream", "version", []string{"1.31"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we normally don't set the version until it's out of experimental. I'm not sure how it works if we set both. Maybe it just works?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

squash option above seems to set it

// Dockerfile which uses trusted pulls.
buildCtx = replaceDockerfileTarWrapper(ctx, buildCtx, relDockerfile, translator, &resolvedTags)
} else if dockerfileCtx != nil {
newDockerfile, _, err := rewriteDockerfileFrom(context.Background(), dockerfileCtx, translator)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this use ctx for the context?

@@ -250,16 +272,34 @@ func runBuild(dockerCli *command.DockerCli, options buildOptions) error {
}
}

ctx := context.Background()
if options.stream && dockerfileCtx == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is dockerfileCtx == nil the same as !options.dockerfileFromStdin() ? Could we use the latter so it's more clear what this means?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a comment. In this case they are equal as nothing has modified dockerfileCtx yet but this will change in other comparisons. dockerfileCtx doesn't really mean that it has to be from stdin, it means that custom Dockerfile source has been set.

@@ -250,16 +272,34 @@ func runBuild(dockerCli *command.DockerCli, options buildOptions) error {
}
}

ctx := context.Background()
if options.stream && dockerfileCtx == nil {
f, err := os.Open(relDockerfile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: you can assign directly to dockerfileCtx here, no need to create a temporary variable.

buildCtx = dockerfileCtx
}
var s *session.Session
if isSessionSupported(dockerCli) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is already way too large. There's no way we can probably unit test most of it. We really should not be adding any significant code to it.

Could we make the rest of this (up to line 362, and the if s != nil block) into a few new functions? Even if the arg list is long, at this point I think it's still an improvement. We can work on refactoring args into structs in a follow up.

Maybe even a new file along with sizeProgress and bufferedWriter ? That way all the session handling stuff is together and not mixed up with other build things.

}

response, err := dockerCli.Client().ImageBuild(ctx, body, buildOptions)
if err != nil {
if options.quiet {
fmt.Fprintf(dockerCli.Err(), "%s", progBuff)
}
cancel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this already handled by the defer cancel() ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a defer that makes sure we have always written out the progress before returning from this function. This cancel is for unblocking that. I'll add a comment.

}()
buildBuff = buf

remote = "client-session"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constant?


func getBuildSharedKey(dir string) (string, error) {
dt := []byte(cliconfig.Dir()) // if no access to config dir then only use path
if err := os.MkdirAll(cliconfig.Dir(), 0700); err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't "no access" be an error? This seems like it could result in some pretty surprising errors, and is also likely to be rare, so not really a big inconvenience.

Maybe there should be a flag to set a filepath or directory for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it would still use the configdir+dir combination as sharedKey what should work well for 99% of the cases. And even if there is a collision it just means that performance will not be ideal. Maybe print a warning?

}
}

dt, err = ioutil.ReadFile(sessionFile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function would be easier to read as a struct, something like

type SharedKey struct {
    ...
}

func (...) Read() ([]byte, error) {...}
func (...) create() (error) {...}

Then getBuildSharedKey() could just call Read() and would only be concerned with hashing the value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will never be multiple instances of sharedkey and create is only called if read fails. Maybe helps to split this to getSessionSharedKey(dir) and getNodeIdentifier() as some hashing is always needed. Getting node identifier can be skipped if it has already run once.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
@tonistiigi tonistiigi force-pushed the client-session-fssession branch 2 times, most recently from bf5a8c0 to 9776fb5 Compare June 23, 2017 23:24
@tonistiigi
Copy link
Member Author

@dnephin Updated. Added more comments for the conditions and moved the helper functions to a separate file where possible.

@vieux
Copy link
Contributor

vieux commented Jun 26, 2017

LGTM

if err := ioutil.WriteFile(sessionFile, []byte(hex.EncodeToString(b)), 0600); err != nil {
return "", err
}
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nit: not needed. Don't push a fix only for this small nit.

func getBuildSharedKey(dir string) (string, error) {
// build session is hash of build dir with node based randomness
sessionFile := filepath.Join(cliconfig.Dir(), ".buildsharedkey")
if _, err := os.Lstat(sessionFile); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious: why Lstat and not simply Stat ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particular reason. We are not creating symlinks to this file so I don't think we should handle them as well.

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some questions/comments

@@ -373,6 +414,36 @@ func runBuild(dockerCli *command.DockerCli, options buildOptions) error {
return nil
}

func getBuildSharedKey(dir string) (string, error) {
// build session is hash of build dir with node based randomness
sessionFile := filepath.Join(cliconfig.Dir(), ".buildsharedkey")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be in a tmp-dir? Also, what happens if multiple clients perform a build in parallel?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it needs to persist. This is only written once and correctness of this value only affects performance.

@@ -23,6 +23,7 @@ func TestDiskUsageContextFormatWrite(t *testing.T) {
Images 0 0 0B 0B
Containers 0 0 0B 0B
Local Volumes 0 0 0B 0B
Build Cache 0B 0B
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if we should have something (-) in the first two columns, but we can tweak that later

}

var body io.Reader = progress.NewProgressReader(buildCtx, progressOutput, 0, "", "Sending build context to Docker daemon")
// add context stream to the session
if options.stream && s != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function (runBuild()) is getting awfully long now; we should have a look at refactoring it a bit, and splitting parts to separate functions where possible

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already moved some of it to the session file (and actually think it became less readable). This is synchronized with the build runner so needs to be in the same function. We could, of course, split things up like opts setting of context preparing but this is already large enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely doesn't have to be part of this PR; I opened an issue for it; #242


const clientSessionRemote = "client-session"

func isSessionSupported(dockerCli *command.DockerCli) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit 0c780c8 adds isSessionSupported() and getBuildSharedKey(), which are moved in 9776fb5, perhaps those commits should be squashed, or the change moved from one to the other.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can squash but that merges session+filesync. The move was requested from review. I think splitting up would take too much time.

func tryNodeIdentifier() (out string) {
out = cliconfig.Dir() // return config dir as default on permission error
if err := os.MkdirAll(cliconfig.Dir(), 0700); err == nil {
sessionFile := filepath.Join(cliconfig.Dir(), ".buildsharedkey")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be in a tmp-dir? What happens if multiple clients perform a build in parallel?


// add context stream to the session
if options.stream && s != nil {
syncDone := make(chan error)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comment on why this chan error is not used to retrieve errors but as a syncing mechanism.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>

Add incremental context send support

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
@tonistiigi
Copy link
Member Author

@thaJeztah Updated the filename and squashed.

@tiborvass Added comment.

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tiborvass
Copy link
Collaborator

LGTM

buildCtx = replaceDockerfileTarWrapper(ctx, buildCtx, relDockerfile, translator, &resolvedTags)
} else if dockerfileCtx != nil {
// if there was not archive context still do the possible replacements in Dockerfile
newDockerfile, _, err := rewriteDockerfileFrom(ctx, dockerfileCtx, translator)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tonistiigi I notice that resolvedTags is not being assigned here. Isn't this necessary for content trust?

case <-ctx.Done():
}
}()
buildBuff = buf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If options.quiet is true then buildBuff is a bytes.Buffer. Later on it is expected that this remains a bytes.Buffer() so that its contents can be printed.

I think this breaks with --quiet

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agh, sprintf of io.Writer

imageID = fmt.Sprintf("%s", buildBuff)
🤦‍♂️

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have this fixed in #294, it only sets buf if !quiet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants