Improve performance of CI system #1407
Related: #1203
Another idea is to improve PR speed: build ARM images only in master, or when the commit message contains some pre-defined string. Also, we might want to use actions/cache.
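A minimal sketch of what such a condition could look like in a workflow; the job name and the "[build-arm]" marker string are made up for illustration:

```yaml
jobs:
  build-arm64:
    runs-on: ubuntu-latest
    # Only build ARM images on master, or when the commit message opts in
    # via a pre-defined marker string ("[build-arm]" is just an example).
    if: github.ref == 'refs/heads/master' || contains(github.event.head_commit.message, '[build-arm]')
    steps:
      - uses: actions/checkout@v2
      - name: Build arm64 images (under emulation)
        run: echo "build arm64 images here"  # placeholder for the real build step
```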
Could you move the multiarch build into a separate GitHub workflow? You'd then get multiple CI statuses on PRs, and could choose to merge after the amd64 job passes instead of waiting for all jobs?
@manics I agree that is an important optimization. Note that we only need separate jobs, not separate workflows (a single workflow can contain multiple jobs). I suggest we both separate amd64 from arm64 (optimization 2) and separate images from each other (optimization 3).
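A rough sketch of what both splits could look like inside a single workflow; the job names and the make invocations are illustrative, not the repository's actual targets:

```yaml
jobs:
  base-notebook-amd64:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: make build/base-notebook  # hypothetical per-image target
  base-notebook-arm64:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: docker/setup-qemu-action@v1  # emulate arm64 on the amd64 runner
      - run: make build/base-notebook PLATFORM=linux/arm64  # hypothetical variable
  minimal-notebook-amd64:
    needs: base-notebook-amd64  # dependent images wait only for their base image job
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: make build/minimal-notebook
```

With a layout like this, a PR shows one status per job, so a PR could be merged once the amd64 jobs are green without waiting for the arm64 ones.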
I've ordered 7 RPi computers and plan to make them self-hosted arm64-based runners for us in the Jupyter ecosystem where needed.
Wow, nice! :) I also wanted to create some VMs on ARM to use as self-hosted runners, but if you're already on it, that's great 👍
@consideRatio did you have any luck with arm runners?
@mathbunnyru I have a k8s cluster running on 7 Raspberry Pi computers etc., but I've still failed to deploy the GitHub runner software on k8s. I'm left quite clueless about what is going on with that and have failed to debug it.
I see. Unfortunately, I have almost zero experience with k8s and absolutely zero experience with self-hosted runners, so I can't help you right now :(
@consideRatio I noticed that the build times in master are really slow. The latest master build is still running: the build step took 1h 16m 43s, and the push has already been going for almost an hour and is not yet finished.
Hmmm, looking into this a bit, my guess is that the cache for layers grows too large and that causes it to be discarded along the way, forcing a rebuild or similar. This guess is supported by noting that previous builds had successfully been using a cache for at least the base-notebook image, and that this suddenly stopped working when more images were added to the build in recent PRs. I have a few ideas on what we could do:
Practically, it can be done like this:

```yaml
# Without this our cache may get reset.
#
# NOTE: This step needs to run before actions/checkout to not end
# up with an empty workspace folder.
#
- name: Maximize build space
  uses: easimon/maximize-build-space@b4d02c14493a9653fe7af06cc89ca5298071c66e
  with:
    root-reserve-mb: 51200 # 50 GB
    build-mount-path: /var/lib/docker/tmp # remaining space
    remove-dotnet: "true"
    remove-haskell: "true"
    remove-android: "true"
```

To do 2, we would just experiment by removing
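On the earlier actions/cache idea: if the images are built through docker buildx, one option is the GitHub Actions cache backend exposed by docker/build-push-action. This is only a sketch under that assumption; the context path and tag are placeholders, and it needs a buildx version that supports the gha cache exporter:

```yaml
- uses: docker/setup-buildx-action@v1
- uses: docker/build-push-action@v2
  with:
    context: ./base-notebook  # placeholder path
    tags: jupyter/base-notebook:latest
    push: false
    # Store and reuse image layers in the GitHub Actions cache between runs.
    cache-from: type=gha
    cache-to: type=gha,mode=max
```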
Hey folks, I would like to help with optimising the CI 😉 Also, on the matter of
Wow, I didn't know about this project - I will submit a request in a few days, thank you! |
So, I can share my vision of how to make this work.
Note:
@mathbunnyru what does
Thanks, I meant
@manics what do you think about my proposal?
Quick note from mobile: native arm runners, I got them running on a k8s cluster set up with Raspberry Pi computers. The downside: you can't use the same actions etc. you have used in a GitHub workflow. setup-python etc. relies on cached versions in an amd64-maintained GitHub CI environment that won't work on arm64. So, going arm64 native means abandoning typical actions we have relied on. IMO, I'm more positive on having standalone arm64 builds that then get combined in a manifest augmentation step. Anyhow, on mobile, just wanted to warn about the arm64 runner challenges.
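For reference, a sketch of what a native arm64 job might look like while avoiding the toolcache-dependent actions; the labels depend on how the runner was registered, and the build step is a placeholder:

```yaml
jobs:
  build-arm64-native:
    # "self-hosted" plus whatever labels the arm64 runner was registered with.
    runs-on: [self-hosted, linux, ARM64]
    steps:
      - uses: actions/checkout@v2
      # actions/setup-python relies on a prebuilt amd64 toolcache, so install
      # an interpreter from the distribution instead.
      - run: |
          sudo apt-get update
          sudo apt-get install -y python3 python3-pip
      - run: echo "run the arm64 build here"  # placeholder
```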
Thanks, @consideRatio!
Proposal generally sounds good! In step 6, how are the manifests and tags calculated? Do they require both architectures to be built before calculation? If the calculations in step 6 can be run in parallel for the separate architectures, that avoids having to pull the images back down again: instead, you could push directly to Docker Hub and save the tags/manifests as JSON or text files, uploaded as GitHub build artifacts (one set of artifacts for each architecture). The main workflow could then fetch those artifacts, combine them as necessary, and update everything else without even touching the images.
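A sketch of the artifact handoff described above; the file names, artifact names, and the script that writes the JSON are all made up for illustration:

```yaml
# In each per-architecture build job:
- name: Save computed tags/manifests
  run: ./write-tags.sh --arch amd64 --output /tmp/tags-amd64.json  # hypothetical script
- uses: actions/upload-artifact@v2
  with:
    name: tags-amd64
    path: /tmp/tags-amd64.json

# In the main workflow, once both architectures have finished:
- uses: actions/download-artifact@v2
  with:
    name: tags-amd64
- uses: actions/download-artifact@v2
  with:
    name: tags-arm64
```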
I didn't want to push to Docker Hub directly, because we might end up in a situation where the x86 images are fine and already uploaded, but arm doesn't build for some reason. @manics I've updated my proposition to include your suggestions.
Yes, makes sense to me 😄
Hey folks, since my brain works in a very visual way, I went ahead and made a diagram which captures the proposed approach above:
As I mentioned in #1203, I would be happy to start working on a prototype to start parallelising stuff.
@trallard very nice! A few notes:
Please proceed. I won't have much time for a few months, but I'm ready to review and help if needed.
Also, I think this diagram will be very useful in the future if/when we implement this, so it might be worth adding a separate page in
For completeness, I have updated the diagram to reflect @mathbunnyru's comment above.
One more small update - we probably want to "Push tags and manifests" in the main workflow. |
So it's a step right before "Can merge tags?"
I think what we can do for now is to not create multi-platform images, and to make aarch64 tags look like this. Right now, every update is a pain: I have to rebuild 5 times to get to the point where the images on DockerHub are the same as if I had built them from source.
A simple-minded comment I posted under an unrelated issue, together with the response from @mathbunnyru:
This is a good suggestion. But we still have to work with amd64/aarch64 differences (for example, we're not building everything under aarch64).
We have crippled our CI system's performance after introducing support for arm64-based images. A key reason for this is that emulating arm64 images on the amd64-based runners GitHub provides is far slower than building natively, besides the fact that we now end up building base-notebook and minimal-notebook for arm64 in sequence alongside the other images.
I'm not fully sure how we should optimize this in the long run, but let's work under the assumption that we will have high-performance self-hosted arm64-based GitHub Actions runners that can work in parallel with the amd64 runners. Below is an overview of a very optimized system, where several parts can be done separately.
1. Nightly builds
We have nightly builds with :nightly-amd64 and :nightly-arm64 tags.
2. amd64 / arm64 in parallel
All tests for amd64 and arm64 run in parallel, relying on the nightly-amd64 and nightly-arm64 caches.
3. Images in parallel where possible
All tests for individual images are run in a dedicated job that needs its base image's job to complete. Some images can run in parallel:
4. Avoid rebuilds when merging
Tests finish by updating a GitHub container registry associated with the PR. By doing so, our publishing job on merge to master can opt to use the images as they were built during tests, if they are considered fresh enough.
5. Parallel manifest creation
Merge to the default branch triggers manifest creation jobs on both amd64 and arm64. If we opt not to optimize using step 4, then this could also build fresh images using the nightly cache first.
6. Combine manifests into one before pushing to the official registry
Merge to the default branch triggers a job that pulls both the amd64 image and the arm64 image and defines a combined Docker manifest, which is then pushed to our official container registry. I think this could be done with something like docker manifest create <name of combined image> <amd64 only image> <arm64 only image>, but @manics knows more and I lack experience with this.
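To make item 6 a bit more concrete, here is a sketch of the combining job; the image names and per-architecture tags are placeholders, the job assumes it has already logged in to the registry, and docker manifest may require the experimental CLI to be enabled depending on the Docker version on the runner:

```yaml
jobs:
  merge-manifests:
    runs-on: ubuntu-latest
    steps:
      - name: Create and push a combined manifest
        run: |
          # Placeholder tags: one image built per architecture, combined under one name.
          docker manifest create jupyter/base-notebook:latest \
            jupyter/base-notebook:latest-amd64 \
            jupyter/base-notebook:latest-arm64
          docker manifest push jupyter/base-notebook:latest
```

Item 1 would then mostly amount to running the existing build on an on: schedule (cron) trigger and pushing the :nightly-amd64 / :nightly-arm64 tags.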
Standalone performance issue
This standalone issue will go away by using better strategies like the ones above, and it isn't so critical to fix either, I'd say. But currently, we build minimal-notebook again without using the cache during push-multi, assuming push-multi for base-notebook has already run. I think it is because we re-tag jupyter/base-notebook:latest.