Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BuildKite jobs in Docker #4824

Merged
merged 27 commits into from
May 8, 2020
Merged

BuildKite jobs in Docker #4824

merged 27 commits into from
May 8, 2020

Conversation

bkase
Copy link
Member

@bkase bkase commented May 3, 2020

What

All jobs run in containers now. Builds remain quick and short-circuiting.

Screen Shot 2020-05-02 at 10 36 21 PM

This is with a single serial agent running on my laptop. It will be faster with parallel agents. Overhead of checking out with git, submodules etc, cleaning untracked files, and running in docker is ~5s per "command step".

Details

So far all the jobs have minimal shell dependencies. agent.nix is extended to support both building of local shell environments and docker containers.

Why use docker from nix instead of writing a dockerfile

  1. Created images are FROM scratch by default (the output containers are tiny )
  2. They are layered using a fancy algorithm to minimize annoying rebuilds.
  3. The same specification supports both a local virtualization-free environment (nix) and a containerized one (docker).
  4. No extra dockerfile

More details

For more

The details for building and pushing is in a comment in agent.nix.

Future work

  • Version pinning of packages

Pick a git commit hash for nixpkgs and you're done. The nixpkgs public cache stays warm for a few months at least, and we can keep caching ourselves if we're slow to upgrade if we use cachix. I like not pinning for now as it will force us to upgrade our scripts if required if the tip of the world changes.

  • Collapsing docker image rebuilding steps into a simple script invocation

This wouldn't take more than an hour or so, but Brandon doesn't want to spend time on that if we're eventually moving to a bread-and-butter dockerfile.

  • Pinning images with git sha tags

This can be fixed by adding one more file to our make update-deps job, namely something Dhall can read in. Then we can optionally apply this tag with a new field in Docker.Config.Type.

Performance tricks

A few extra steps were taken to increase build performance:

  1. Git diff from a bind-mounted volume is ridiculously slow on my machine (50x performance penalty) -- to fix I pushed the diff computation to a hook that runs post-checkout instead of as a step inside the container.
  2. Dhall's cache has to be mounted in the volume -- once we have artifact up/down (in BuildKite CI: Deployed buildkite agents can download and upload artifacts to gcloud #4804 ) we should swap to that. Since auto-volume creation makes it difficult to specify uid/gid of the mounted directory, the temporary hack is to chmod 777 the cache as root in the post-checkout hook (before the docker container is run with lowered permissions)

@bkase bkase force-pushed the feature/buildkite-jobs-in-docker branch from 7cf1a2e to 5a37b37 Compare May 4, 2020 22:39
@bkase
Copy link
Member Author

bkase commented May 5, 2020

This comment contains a record of what used to be in the description of the PR.

Details

Containers

So far all the jobs have minimal shell dependencies. The following obvious mechanisms to describe these containers seemed wrong to me: (1) Make a single Dockerfile and stick the union of all the deps in here. This seems wasteful for jobs that only need a few specific tools (2) Write Dockerfiles for each job. This seems annoying and tedious.

Nixery seemed like an easy way in the short term to specify exactly what I needed in each job. In this draft PR, I ran a local instance (in a container) pointed to nixpkgs-unstable.

We could host an instance of this on Gcloud, but Nixery is probably too obscure of a thing to depend on for critical infrastructure.
I'm very happy to rip this part out and replace it with "the right" way to describe these containers before we land this PR.

Performance tricks

A few extra steps were taken to increase build performance:

  1. Git diff from a bind-mounted volume is ridiculously slow on my machine (50x performance penalty) -- to fix I pushed the diff computation to a hook that runs post-checkout instead of as a step inside the container.
  2. Dhall's cache has to be bind-mounted in the volume -- once we have artifact up/down (in BuildKite CI: Deployed buildkite agents can download and upload artifacts to gcloud #4804 ) we should swap to that.

Unresolved Questions

  1. Dhall's cache is currently bind-mounted from a hardcoded path on my machine (Docker volume mounting demands an absolute path), I don't want to block this PR on BuildKite CI: Deployed buildkite agents can download and upload artifacts to gcloud #4804 being implemented, but unsure what a reasonable temporary measure should be
  2. Docker containers are currently declared as nixery containers relative to a local nixery server. This needs to be replaced with "the right" way to describe the containers. What is this thing?

@bkase bkase marked this pull request as ready for review May 5, 2020 04:44
@bkase bkase requested a review from a team as a code owner May 5, 2020 04:44
./.buildkite/scripts/generate-diff.sh > $computed_diff.txt

#
# TODO: Remove this when we figure out where to stick this cache long term
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can pre-bake the cache into the container. All you need to do is just do dhall <<< '<external_lib_wrapper_path>' for each external library and the cache will be there for you. You will just need to re-build the container when you update dhall deps, which should be very, very infrequent. We can add a update_deps style step for enforcing that if we need to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, you're right. This is a simple way to do it

if Prelude.List.null (Map.Entry Text Text) agents then None (Map.Type Text Text) else Some agents
if Prelude.List.null (Map.Entry Text Text) agents then None (Map.Type Text Text) else Some agents,
plugins =
[ { mapKey = "docker#v3.5.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toMap { `docker#v3.5.0` = Docker.build c.docker }

label = "Monorepo triage",
key = "cmds",
target = Size.Small
target = Size.Small,
docker = Docker.Config::{ image = "codaprotocol/ci-toolchain-base" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably have a version tag so that it doesn't just pull latest (which is bad since we may have different branches with different image versions for their CI). Also, can you configure this value as a constant coming from some file? For instance, (./Constants/ContainerImages.dhall).toolchainBase. That way it's easy to update it later.

@bkase bkase force-pushed the feature/buildkite-jobs-in-docker branch from e2bd11f to 28eaff9 Compare May 8, 2020 03:19
@bkase bkase requested review from a team as code owners May 8, 2020 03:19
@bkase bkase changed the base branch from feature/buildkite-auto-jobs to develop May 8, 2020 07:46
@bkase bkase added the ready-to-merge Adding this label will trigger mergify and trigger CI to run and merge the PR label May 8, 2020
@mergify mergify bot merged commit 178fa42 into develop May 8, 2020
@mergify mergify bot deleted the feature/buildkite-jobs-in-docker branch May 8, 2020 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge Adding this label will trigger mergify and trigger CI to run and merge the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants