BuildKite jobs in Docker #4824

bkase · 2020-05-03T02:49:05Z

What

All jobs run in containers now. Builds remain quick and short-circuiting.

This is with a single serial agent running on my laptop. It will be faster with parallel agents. Overhead of checking out with git, submodules etc, cleaning untracked files, and running in docker is ~5s per "command step".

Details

So far all the jobs have minimal shell dependencies. agent.nix is extended to support both building of local shell environments and docker containers.

Why use docker from nix instead of writing a dockerfile

Created images are FROM scratch by default (the output containers are tiny )
They are layered using a fancy algorithm to minimize annoying rebuilds.
The same specification supports both a local virtualization-free environment (nix) and a containerized one (docker).
No extra dockerfile

More details

For more

The details for building and pushing is in a comment in agent.nix.

Future work

Version pinning of packages

Pick a git commit hash for nixpkgs and you're done. The nixpkgs public cache stays warm for a few months at least, and we can keep caching ourselves if we're slow to upgrade if we use cachix. I like not pinning for now as it will force us to upgrade our scripts if required if the tip of the world changes.

Collapsing docker image rebuilding steps into a simple script invocation

This wouldn't take more than an hour or so, but Brandon doesn't want to spend time on that if we're eventually moving to a bread-and-butter dockerfile.

Pinning images with git sha tags

This can be fixed by adding one more file to our make update-deps job, namely something Dhall can read in. Then we can optionally apply this tag with a new field in Docker.Config.Type.

Performance tricks

A few extra steps were taken to increase build performance:

Git diff from a bind-mounted volume is ridiculously slow on my machine (50x performance penalty) -- to fix I pushed the diff computation to a hook that runs post-checkout instead of as a step inside the container.
Dhall's cache has to be mounted in the volume -- once we have artifact up/down (in BuildKite CI: Deployed buildkite agents can download and upload artifacts to gcloud #4804 ) we should swap to that. Since auto-volume creation makes it difficult to specify uid/gid of the mounted directory, the temporary hack is to chmod 777 the cache as root in the post-checkout hook (before the docker container is run with lowered permissions)

bkase · 2020-05-05T04:22:21Z

This comment contains a record of what used to be in the description of the PR.

Details

Containers

So far all the jobs have minimal shell dependencies. The following obvious mechanisms to describe these containers seemed wrong to me: (1) Make a single Dockerfile and stick the union of all the deps in here. This seems wasteful for jobs that only need a few specific tools (2) Write Dockerfiles for each job. This seems annoying and tedious.

Nixery seemed like an easy way in the short term to specify exactly what I needed in each job. In this draft PR, I ran a local instance (in a container) pointed to nixpkgs-unstable.

We could host an instance of this on Gcloud, but Nixery is probably too obscure of a thing to depend on for critical infrastructure.
I'm very happy to rip this part out and replace it with "the right" way to describe these containers before we land this PR.

Performance tricks

A few extra steps were taken to increase build performance:

Git diff from a bind-mounted volume is ridiculously slow on my machine (50x performance penalty) -- to fix I pushed the diff computation to a hook that runs post-checkout instead of as a step inside the container.
Dhall's cache has to be bind-mounted in the volume -- once we have artifact up/down (in BuildKite CI: Deployed buildkite agents can download and upload artifacts to gcloud #4804 ) we should swap to that.

Unresolved Questions

Dhall's cache is currently bind-mounted from a hardcoded path on my machine (Docker volume mounting demands an absolute path), I don't want to block this PR on BuildKite CI: Deployed buildkite agents can download and upload artifacts to gcloud #4804 being implemented, but unsure what a reasonable temporary measure should be
Docker containers are currently declared as nixery containers relative to a local nixery server. This needs to be replaced with "the right" way to describe the containers. What is this thing?

nholland94 · 2020-05-05T18:35:16Z

.buildkite/hooks/post-checkout

+./.buildkite/scripts/generate-diff.sh > $computed_diff.txt
+
+#
+# TODO: Remove this when we figure out where to stick this cache long term


You can pre-bake the cache into the container. All you need to do is just do dhall <<< '<external_lib_wrapper_path>' for each external library and the cache will be there for you. You will just need to re-build the container when you update dhall deps, which should be very, very infrequent. We can add a update_deps style step for enforcing that if we need to.

oops, you're right. This is a simple way to do it

nholland94 · 2020-05-05T18:39:25Z

.buildkite/src/Lib/Command.dhall

-      if Prelude.List.null (Map.Entry Text Text) agents then None (Map.Type Text Text) else Some agents
+      if Prelude.List.null (Map.Entry Text Text) agents then None (Map.Type Text Text) else Some agents,
+    plugins =
+      [ { mapKey = "docker#v3.5.0"


toMap { `docker#v3.5.0` = Docker.build c.docker }

nholland94 · 2020-05-05T18:40:56Z

.buildkite/src/Monorepo.dhall

      label = "Monorepo triage",
      key = "cmds",
-      target = Size.Small
+      target = Size.Small,
+      docker = Docker.Config::{ image = "codaprotocol/ci-toolchain-base" }


This should probably have a version tag so that it doesn't just pull latest (which is bad since we may have different branches with different image versions for their CI). Also, can you configure this value as a constant coming from some file? For instance, (./Constants/ContainerImages.dhall).toolchainBase. That way it's easy to update it later.

bkase force-pushed the feature/buildkite-jobs-in-docker branch from 7cf1a2e to 5a37b37 Compare May 4, 2020 22:39

bkase marked this pull request as ready for review May 5, 2020 04:44

bkase requested a review from a team as a code owner May 5, 2020 04:44

nholland94 reviewed May 5, 2020

View reviewed changes

bkase and others added 17 commits May 7, 2020 23:18

Agents with dockers

5fa9946

Opts in for buildkite-agent-access-token passing

f0e42f8

Depends on grep

e04c689

Move git fetch origin to post-checkout; pre docker

5c90829

Includes all tools in containers

3673467

Bind mounts dhall caches

6213b7e

Needs findutils for xargs

2ae997a

Absolute mount path

678dbed

Refers to home dir explicitly

f8f5ba6

Hardcode /Users/bkase

eb8986d

Create computed diff in post-checkout hook

706b99b

Uses a named volume instead of mounting a bkase path

8d2f8de

Generates docker image from agent nix spec

fbfb2ab

Removes rogue comma

299a8fb

Adds cacert package and clarifies usage

de99a04

Explicit rw mount of dhall cache

cdf2580

Accesses volume to fix perms in post-checkout

28eaff9

bkase force-pushed the feature/buildkite-jobs-in-docker branch from e2bd11f to 28eaff9 Compare May 8, 2020 03:19

bkase requested review from a team as code owners May 8, 2020 03:19

bkase added 4 commits May 8, 2020 01:02

Scraps volumes, puts dhall cache into docker

3b5dbfa

Fixes permissions on /.cache

f89cf0f

Moves dhall cache dir before build steps

12b3cd1

Boolean flip by accident

9337108

bkase added 2 commits May 8, 2020 02:46

Tries no uid/gid

85c5f01

Uses toMap everywhere

abe82cb

bkase mentioned this pull request May 8, 2020

BuildKite CI: Push, tag, update references automatically for ci-toolchain-base #4862

Closed

bkase added 2 commits May 8, 2020 03:41

Makes docker image a constant (and tagged)

5500680

Typo on size value

b6a2b40

bkase changed the base branch from feature/buildkite-auto-jobs to develop May 8, 2020 07:46

Removes leftover docker invocation for volumes

ddf7707

nholland94 approved these changes May 8, 2020

View reviewed changes

Merge branch 'develop' into feature/buildkite-jobs-in-docker

cdc3186

bkase added the ready-to-merge Adding this label will trigger mergify and trigger CI to run and merge the PR label May 8, 2020

mergify bot merged commit 178fa42 into develop May 8, 2020

mergify bot deleted the feature/buildkite-jobs-in-docker branch May 8, 2020 19:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BuildKite jobs in Docker #4824

BuildKite jobs in Docker #4824

bkase commented May 3, 2020 •

edited

Loading

bkase commented May 5, 2020

nholland94 May 5, 2020

bkase May 5, 2020

nholland94 May 5, 2020

nholland94 May 5, 2020

BuildKite jobs in Docker #4824

BuildKite jobs in Docker #4824

Conversation

bkase commented May 3, 2020 • edited Loading

What

Details

Why use docker from nix instead of writing a dockerfile

More details

For more

Future work

Performance tricks

bkase commented May 5, 2020

Details

Containers

Performance tricks

Unresolved Questions

nholland94 May 5, 2020

Choose a reason for hiding this comment

bkase May 5, 2020

Choose a reason for hiding this comment

nholland94 May 5, 2020

Choose a reason for hiding this comment

nholland94 May 5, 2020

Choose a reason for hiding this comment

bkase commented May 3, 2020 •

edited

Loading