Optimize Dockerfile #674
Conversation
Deploy preview ready! Built with commit 721f9d2
@thaJeztah
And then when I visit, I do not get served the docs. :( I get a directory listing that only seems to have the svn/wget downloads showing:
Hm, that's weird, let me try again
For the current version of the docs: in the old situation, Docker Cloud does a checkout of the repository at commit X, then builds the Dockerfile, which does a checkout of the GitHub repository at commit "X + random", so there's no relation whatsoever between the commit being built from and the docs that are being published.
AFAIK, Docker Cloud / Docker Hub automated builds never use caching, so they will pull the latest version of the archives every time.
Caching is available in Docker Cloud builds now, FWIW. :) (Recently, as of the 8th.)
This optimizes the Dockerfile a bit by:

- putting the docs archives at the top to optimize caching and to prevent having to clone the GitHub repository on each build. Note that '--no-cache' is needed to forcefully break the cache, but the archives should not change frequently
- grouping RUN lines to reduce image size
- using a loop for the archived versions to reduce the amount of duplicated code
- using the local files for the *current* version of the docs instead of the git clone from GitHub; this also makes it use the right source instead of "master"
- adding a .dockerignore to prevent busting the cache when not needed, and to prevent uploading the '.git' repository, which is not used for the "current" docs

Difference in size before/after:

REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
docs         latest   36f6ad029e6a   3 minutes ago    1.722 GB
docs-orig    latest   4f1a3e3fda4f   16 minutes ago   3.344 GB

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Force-pushed from 5dd2c11 to 721f9d2
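As a rough illustration of the layout the commit message describes -- a hedged sketch, not the actual Dockerfile from this PR; the base image, version list, paths, and archive URLs are placeholders:

```dockerfile
# Hedged sketch only: base image, version numbers, paths, and URLs are
# placeholders, not the actual values used by this PR.
FROM docs/docs-base:latest

# Fetch the archived versions first, so this layer stays cached and the
# archives are only re-downloaded when the cache is broken explicitly
# (e.g. with `docker build --no-cache`).
RUN for VER in v1.10 v1.11 v1.12; do \
      wget -O "/tmp/docs-${VER}.tar.gz" "https://example.com/docs/${VER}.tar.gz" \
      && mkdir -p "/usr/src/app/${VER}" \
      && tar -xzf "/tmp/docs-${VER}.tar.gz" -C "/usr/src/app/${VER}" \
      && rm "/tmp/docs-${VER}.tar.gz"; \
    done

# Use the local checkout (the build context) for the *current* docs,
# instead of cloning "master" from GitHub at build time.
COPY . /usr/src/app
WORKDIR /usr/src/app
RUN jekyll build --source /usr/src/app --destination /usr/src/app/_site
```

A `.dockerignore` listing entries such as `.git` keeps those files out of the build context, and running `docker build --no-cache` forces the archive layer to be re-downloaded when that is actually wanted.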
@sanscontext ah, thanks! In that case, we need to make sure to disable it (because there's no way to detect changes in the remote repository from the Dockerfile, so we should always assume the cache is invalid).
@johndmulhausen I could not reproduce your issue. I tried building again, and the current docs worked for me; however, the archives redirected to docs.docker.com. I changed the configuration so that building the current docs doesn't remove the archive (which is now in the output directory).
Just adding a +1 for the approach of copying code from the build context rather than pulling master from somewhere remote. This is the standard way of building Docker images. The point of building images like this is so that you can consistently turn some code into an image with that code inside it. If you're pulling from master, you have no control over what code goes inside your image. Docker Cloud will pull the latest code, then build an image with that latest code in the build context. So, when Docker Cloud is building from master, a … However, the advantage of doing …
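A minimal sketch of the contrast being described (the repository URL and destination path are illustrative, not the project's actual values):

```dockerfile
# Build-time clone: the image content depends on whatever "master"
# happens to contain when the build runs, not on the commit being built.
# RUN git clone https://github.com/docker/docker.github.io.git /usr/src/app

# Copy from the build context instead: the image contains exactly the
# code that was checked out for this build, so the result is reproducible.
COPY . /usr/src/app
```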
Okay, gotcha re: using the local files vs re-cloning; this does seem a lot faster if I pull the whole PR down and run it with the docs in place alongside … So how do we ensure that caching is disabled in Hub and Cloud's autobuild? If we're sure that caching won't happen, then I'm reasonably comfortable proceeding. @sanscontext?
+1 to this approach! I forgot that you could use `&&` to chain multiple commands together in a `RUN` invocation. Does this have any performance benefits? What happens if one of the commands fails? I assume it short-circuits the rest of the script and the `docker build` will fail.
I don't know if we document this for loop trick anywhere, but we definitely should!
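To make both points concrete -- a small, hedged sketch (package names, version list, and URL are illustrative):

```dockerfile
# Chaining with && keeps related steps in one layer (smaller image, fewer
# layers). If any command in the chain exits non-zero, the rest of the
# chain is skipped and the RUN step -- and therefore the build -- fails.
RUN apt-get update \
    && apt-get install -y wget \
    && rm -rf /var/lib/apt/lists/*

# The "for loop trick": iterate over the archived versions instead of
# repeating a near-identical block for each one.
RUN for VER in v1.10 v1.11; do \
      wget -O "/docs-${VER}.tar.gz" "https://example.com/docs/${VER}.tar.gz" || exit 1; \
    done
```

One caveat: inside a `for` loop a failing command does not abort the `RUN` step by itself, which is why the sketch adds `|| exit 1`.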
Oh – even better, the Dockerfile could specify a commit to use from docker/docker, and changes can be vendored manually. That way, it'll always bust/use caches correctly, and a particular commit from docker/docker.github.io always builds exactly the same set of documentation. That's rather useful for keeping track of/rolling back deploys, and so on.
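A hedged sketch of the "pin a commit" idea; the repository URL, path, and SHA are placeholders:

```dockerfile
# Changing DOCS_COMMIT busts the cache for the clone layer, so the same
# Dockerfile always produces the same docs for a given pinned SHA.
ARG DOCS_COMMIT=0000000000000000000000000000000000000000
RUN git clone https://github.com/docker/docker.git /usr/src/docker \
    && git -C /usr/src/docker checkout "${DOCS_COMMIT}"
```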
We had the specific commit in the previous docs, so not sure that's what … I think ideally we should have a separate container for each archived version. There is no reason to rebuild each archive version if only "master" changed. Something like:

We can have a … The … We can use a simple proxy to route archive URLs to the correct container.
The reason we don't have a container per Dockerfile is that everyone wanted to be able to have archive folders. Before, we were trying to route them to different containers by port. It was ugly. I don't think we want to go back to that. @johndmulhausen?
A few things:
Change that tag to … However, using these images in production was problematic for various reasons, the most important being that we were specifically told by infra that it was not a good idea to use a proxy to forward to those container endpoints, because each endpoint was the result of a stack running Ruby and Jekyll, creating quite a big RAM footprint after you stack up 8 or 9 instances (each one being about 350 MB at runtime), and there is no end in sight for new versions, so that becomes needlessly over-engineered quickly.

It would be one thing if the end result were discrete services meant to run in parallel and we were a "real application", but we're just a static site trying to have versioned folders, which in infra's opinion isn't the best use case for a distributed-system-style, multi-container architecture. It was their recommendation, which I agreed with, that at runtime we should just have one server instance serving up a concatenated-together site that has everything in it. This also simplified the deployment/image story.

All that said, I think there is probably even more slimming we can do of this image, and I would always be happy to look at a solution from someone willing to step in and implement it. But for now, I think I'm ready to give this a shot and see how it goes. :) Thanks @thaJeztah @bfirsh @mstanleyjones -- let's merge and see how this goes!
Note: The build time before this PR was merged was 38 minutes (!). After it was merged: 14 minutes. A definite huge improvement -- thanks so much to @thaJeztah!
@johndmulhausen About the memory footprint - can't the docs be served by nginx or similar in production, or does it depend on the Jekyll server for some reason?
I was thinking the same yesterday; once the docs are built, it's just static files, so any server can be used (even a simple static server, such as https://github.com/mholt/caddy).
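For what it's worth, a hedged sketch of that idea using a multi-stage build (image tags and paths are assumptions, and multi-stage builds need a reasonably recent Docker):

```dockerfile
# Stage 1: build the static site with Jekyll.
FROM jekyll/jekyll:latest AS build
COPY . /srv/jekyll
RUN jekyll build --source /srv/jekyll --destination /srv/jekyll/_site

# Stage 2: serve the generated HTML with nginx; the Ruby/Jekyll stack is
# not part of the runtime image, so the memory footprint is just nginx.
FROM nginx:alpine
COPY --from=build /srv/jekyll/_site /usr/share/nginx/html
```

The runtime container would then carry only nginx and the generated HTML, which addresses the memory-footprint concern raised above.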
That's right, any server can, though we've stuck with the GitHub Pages gem for compatibility with GitHub Pages and simplicity (bifurcating the build and authoring image structures, just to get the memory footprint down, just to shoehorn in a multi-container architecture for a static site, just to save a few minutes of deploy time -- that wasn't a super-compelling use case for me). Staying within the lanes of a widely adopted authoring package kept things easy when constructing the solution. Optimization, where the devil is in the details, is a whole other matter.

I personally don't have much time for or interest in pursuing a multi-container solution for the archives, but I also always welcome contributors who want to help. If this is something that interests you, ping me privately and I'll get you set up. A big bonus to me in all this is that our build/production systems are auditable enough for more people to pitch in, as you have so wonderfully here. 👍
This optimizes the Dockerfile a bit by:

- putting the docs archives at the top to optimize caching and to prevent having to clone the GitHub repository on each build. Note that '--no-cache' is needed to forcefully break the cache, but the archives should not change frequently
- grouping RUN lines to reduce image size
- using a loop for the archived versions to reduce the amount of duplicated code
- using the local files for the *current* version of the docs instead of the git clone from GitHub; this also makes it use the right source instead of "master"
- adding a .dockerignore to prevent busting the cache when not needed, and to prevent uploading the '.git' repository, which is not used for the "current" docs

Difference in size before/after:

REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
docs         latest   36f6ad029e6a   3 minutes ago    1.722 GB
docs-orig    latest   4f1a3e3fda4f   16 minutes ago   3.344 GB
/cc @johndmulhausen PTAL. I built this and I think everything works as before, but let me know if there are any issues with this change.