Automatically build docker images with CircleCI/GitLab CI #24655

Closed
5 tasks done
saraedum opened this issue Feb 4, 2018 · 468 comments

Comments

@saraedum
Member

saraedum commented Feb 4, 2018

It would be nice to update our docker images automatically through continuous integration services. Of course it's good to have these images up-to-date without manual intervention but this is also convenient as a starting point for people who want to use CI for their own branches of Sage (besides the patchbot.)¹

This ticket proposes recipes for GitLab CI and CircleCI to build our docker images automatically. On the respective websites, the CI can be configured to push automatically to the Docker Hub. A webhook (on github) updates the README on Docker Hub automatically.

I implemented this for both GitLab CI and CircleCI. I think GitLab CI is more relevant in the long run; it is also open source, and people can provision their own machines as runners. CircleCI, on the other hand, works out of the box for Sage without private test runners, and it allows for easier debugging since you can log on to the machine running your tests with SSH. I tried to share most code between the two implementations.

See also sagemath/docker-images#13 and sagemath/sage-binder-env#3 for a followup (automatically provide jupyter notebooks for easier review.)


Here are some numbers and screenshots (click on the screenshots to go to the real pages):

GitLab CI

If I provision my own runner from Google Cloud with two threads, it takes about 5 hours to build Sage from scratch, run rudimentary tests on the docker images, and upload them to Docker Hub and GitLab's registry.

<img src="gitlab.png" width=640, center, link=https://gitlab.com/saraedum/sage/pipelines>

Recycling the build artifacts from the last run on the develop branch brings this down to about ?? minutes (on GitLab's free shared runners with two threads.) This roughly breaks down as:

  • 32 minutes for `build-from-latest`:
    • 10 minutes for the actual build (most of which is spent in the docbuild, caused to some extent by a known Sphinx bug)
    • ?? minutes pulling the sagemath-dev image from Docker Hub (this usually goes away if you provision your own runners and expose the host's docker daemon by setting DOCKER_HOST)
    • a few minutes running through all the fast stages of the Dockerfile
    • a few minutes pushing the resulting images to GitLab's registry (using GitLab's cache, this could probably be improved, at least for runners that we provision ourselves)
  • 5 - 15 minutes for each test (run in parallel); the relevant test is test-dev.sh, which spends 6 minutes in the actual docbuild (just as in build-from-latest) and some 5 minutes pulling the sagemath-dev image from the GitLab registry. (That part should go away with a provisioned runner that exposes the host's docker daemon.)
  • ?? minutes for publishing to Docker Hub, roughly half of which is spent pulling the images from the GitLab registry and the other half pushing them to Docker Hub. (Again, exposing the host's docker daemon would probably cut that time in half.)

With some tricks we could probably bring this down to 25 minutes (see CircleCI below), but we won't get there without giving up on splitting the CI into different stages (as is necessary on CircleCI for technical reasons). To go well below that, we would need to pull binary packages from somewhere; I don't see a sustainable way of doing this with the current SPKG system.

<img src="gitlab-rebuild.png" width=640, center, link=https://gitlab.com/saraedum/sage/pipelines/18026318>

CircleCI

It typically takes almost 5 hours to build Sage from scratch on CircleCI, run rudimentary tests on the docker images, and upload them to Docker Hub.

<img src="circleci.png" width=640, center, link=https://circleci.com/gh/saraedum/workflows/sage>

Recycling the build artifacts from the last run on the develop branch brings this down to about 30 minutes usually. 5 minutes could be saved by not testing the sagemath-dev and probably another minute or two if we do not build it at all. To go significantly below 15 minutes is probably hard with the huge sage-the-distribution (7GB uncompressed/2GB compressed) that we have to pull every time at the moment.

<img src="circleci-rebuild.png" width=640, center, link=https://circleci.com/gh/saraedum/workflows/sage>

Docker Hub

A push to github updates the README on the Docker Hub page. The current sizes are and ; unfortunately MicroBadger is somewhat unstable so these numbers are incorrectly reported as 0 sometimes.

<img src="dockerhub.png" width=640, center, link=https://hub.docker.com/r/sagemath/sagemath>


Here are some things that we need to test before merging this:


After this ticket has been merged, the following steps are necessary:

  • Set up an account for sagemath on CircleCI.
  • Add Docker Hub credentials on CircleCI or GitLab.

To see a demo of what the result looks like, go to https://hub.docker.com/r/sagemath/sagemath/. The CircleCI runs can be seen at https://circleci.com/gh/saraedum/sage, and the GitLab CI runs at https://gitlab.com/saraedum/sage/pipelines.


¹: I want to run unit tests of an external Sage package, https://github.com/swewers/MCLF. Being able to build a custom docker image which contains some not-yet-merged tickets makes this much easier.

PS: Long-term one could imagine this to be the first step to replace the patchbot with a solution that we do not have to maintain so much ourselves, such as gitlab-runners. This is of course outside of the scope of this ticket but having a bunch of working CI files in our repository might inspire people to script some other tasks in a reproducible and standardized way.

Depends on #25161
Depends on #25160

CC: @roed314 @embray @nthiery @mkoeppe @jdemeyer @hivert

Component: distribution

Keywords: CI

Author: Julian Rüth

Branch: ac6201a

Reviewer: Erik Bray

Issue created by migration from https://trac.sagemath.org/ticket/24655

@saraedum saraedum added this to the sage-8.2 milestone Feb 4, 2018
@saraedum
Member Author

saraedum commented Feb 4, 2018

Branch: u/saraedum/gitlabci

@saraedum
Member Author

saraedum commented Feb 4, 2018

New commits:

91e4ed1 Enablec GitLab CI and CircleCI

@saraedum
Member Author

saraedum commented Feb 4, 2018

Commit: 91e4ed1

@nthiery
Contributor

nthiery commented Feb 4, 2018

comment:4

I haven't played much with GitLab CI and CircleCI, so can't really comment on the details for now. But +1 for the general idea. That would be very helpful.

@nthiery
Contributor

nthiery commented Feb 4, 2018

comment:5

Adding Matthias, who set up continuous integration for various Sage packages (including https://github.com/sagemath/sage_sample).

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Feb 4, 2018

Changed commit from 91e4ed1 to 18970ac

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Feb 4, 2018

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

18970ac Enable GitLab CI and CircleCI

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Feb 4, 2018

Changed commit from 18970ac to 3c85557

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Feb 4, 2018

Branch pushed to git repo; I updated commit sha1. New commits:

e6d2703 Make sure we do not docker build from a stale cache
3bbe45f Distribute sagemath-cli from GitLab
3c85557 There is no guarantee that we are going to see the same docker daemon

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Feb 16, 2018

Changed commit from 3c85557 to 5f1a0e6

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Feb 16, 2018

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

5f1a0e6 Build docker images automatically

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Feb 17, 2018

Branch pushed to git repo; I updated commit sha1. New commits:

69752ed Fix script name

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Feb 17, 2018

Changed commit from 5f1a0e6 to 69752ed

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Feb 17, 2018

Changed commit from 69752ed to 6642d99

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Feb 17, 2018

Branch pushed to git repo; I updated commit sha1. New commits:

c0514d5 Comment on limitations on CircleCI
1035022 Set workdir to the Sage directory
4ccc256 Fix entrypoints
84c12ed Simplify push/pull scripts
202273c A single entrypoint does not need a directory on its own
a8fcb94 Add tests
cfe2d14 Fix CircleCI config
6642d99 Speed up my build experiments

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Feb 18, 2018

Branch pushed to git repo; I updated commit sha1. New commits:

4e179cc no TTY/STDIN for automated tests
a5b2749 Fix names of docker containers for GitLab CI
295ffd8 Trying to make tests POSIX compliant
e9bacb8 Fixing test for CI
1b72d5b Update docker/README to incorporate latest changes
062d37b Add debug output

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Feb 18, 2018

Changed commit from 6642d99 to 062d37b

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Feb 26, 2018

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

4c023d5 Enable CI backed by GitLab CI and CircleCI
6ccbaa3 Be more agressive about requiring the latest develop
7f90d86 be more precise about failed latest merge
8efe3a9 more and more specific badges
15f3ae7 fix "build your own" instructions
595be9c POSIX scripts
e4010d8 Fix branding of CI files

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Feb 26, 2018

Changed commit from 062d37b to e4010d8

@saraedum
Member Author

Attachment: gitlab.png

screenshot

@saraedum
Member Author

screenshot

@saraedum
Member Author

Attachment: gitlab-rebuild.png

Attachment: circleci.png

screenshot

@saraedum
Member Author

comment:343

Replying to @embray:

So what I'm proposing is to have one build, when a tag is created, with some <version> then,

  1. Tag the image sagemath:<version>; push
  2. If the version ends with .betaN or .rcN, re-tag the image sagemath:develop; push

That's not correct, all tags should go to sagemath:develop.

  3. Else re-tag the image sagemath:latest; push

One build resulting in 2 docker-tags for the same image, each time.

I think, theoretically, the way you do things currently makes sense, and might make sense to revert to at some point, but practically-speaking, right now, I don't see why we should do build-from-clean for anything but tags.

Is it really an issue? It's a couple of hours of build time every week, but your proposal would hardcode knowledge about our release process, which I don't like. Also, I'd like to add the release manager's integration branch to the list of build-from-clean branches at some point (though those images should maybe not go to Docker Hub), and then we'd need build-from-clean for branches again anyway.

@saraedum
Member Author

comment:344

Also, there are some annoying races here (the following situation has happened to me when we worked with preemptible runners a lot.) With your proposed model the following could happen:

  1. Create git-tag 9.0.
  2. The CI fails for some reason, say a timeout pushing to docker hub.
  3. Create git-tag 9.1.beta0
  4. The CI pushes to sagemath:9.1.beta0 and sagemath:develop
  5. Somebody realizes that the sagemath:9.0 build is missing and presses "retry" in the CI.
  6. The CI pushes to sagemath:9.0, sagemath:latest, and sagemath:develop. Now sagemath:develop is not the latest beta.

This can't happen if you have develop branch → develop docker-tag; git-tag → docker-tag.
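The race can be replayed in a few lines. This is only a sketch: the function names, the in-memory `registry` dict, and the exact tag rules (taken from steps 4 and 6 above for the tag-driven model, and from "develop branch → develop docker-tag; git-tag → docker-tag" for the branch-driven one) are assumptions made for illustration, not the actual CI scripts of this ticket.

```python
import re

# Versions ending in .betaN or .rcN count as pre-releases.
PRERELEASE = re.compile(r"\.(beta|rc)\d+$")


def tags_tag_driven(kind, name):
    """Docker tags pushed when only git tags trigger a build."""
    if kind != "tag":
        return []
    tags = [name, "develop"]  # every git tag also moves `develop`
    if not PRERELEASE.search(name):
        tags.append("latest")  # stable releases also move `latest`
    return tags


def tags_branch_driven(kind, name):
    """Docker tags pushed when `develop` only follows the develop branch."""
    if kind == "branch":
        return ["develop"] if name == "develop" else []
    return [name]  # git-tag -> docker-tag of the same name


def replay(builds, tags_for):
    """Apply successful builds in order; later pushes overwrite earlier ones."""
    registry = {}
    for kind, name, content in builds:
        for tag in tags_for(kind, name):
            registry[tag] = content
    return registry


# Successful builds in the order they finish: the first 9.0 build failed,
# so its retry is the last push to arrive.
builds = [
    ("tag", "9.1.beta0", "9.1.beta0"),
    ("branch", "develop", "9.1.beta0"),
    ("tag", "9.0", "9.0"),  # the "retry" from step 5
]

print(replay(builds, tags_tag_driven)["develop"])     # 9.0
print(replay(builds, tags_branch_driven)["develop"])  # 9.1.beta0
```

In the tag-driven replay the retried 9.0 build is the last writer of `develop`, whereas in the branch-driven replay a git tag never touches `develop` at all.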

@vbraun
Member

vbraun commented Aug 25, 2018

Changed branch from u/saraedum/24655 to ac6201a

@timokau
Contributor

timokau commented Aug 29, 2018

Changed commit from ac6201a to none

@timokau
Contributor

timokau commented Aug 29, 2018

comment:346

Somehow 7d85dc796 ("Something related to the sphinxbuild seems to be leaking memory") causes the docbuild to segfault on nix. Since the docbuild is running on 4 cores, as far as I can see the commit shouldn't actually change anything. Any idea?

[algebras ] loading pickled environment... not yet created
[algebras ] building [inventory]: targets for 67 source files that are out of date
[algebras ] updating environment: 67 added, 0 changed, 0 removed
[algebras ] The HTML pages are in share/doc/sage/inventory/en/reference/algebras.
Build finished. The built documents can be found in /build/share/doc/sage/inventory/en/reference/algebras
[combinat ] loading pickled environment... not yet created
[combinat ] building [inventory]: targets for 367 source files that are out of date
[combinat ] updating environment: 367 added, 0 changed, 0 removed
Error building the documentation.
Traceback (most recent call last):
  File "/nix/store/9lvgqr20r7j7b6a4fmhw6n82spd9rafq-python-2.7.15-env/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/nix/store/9lvgqr20r7j7b6a4fmhw6n82spd9rafq-python-2.7.15-env/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/nix/store/yyarqmdxj1dmp65qv7mwm3sfrdlwmg8v-python2.7-sagelib-8.4.beta2/lib/python2.7/site-packages/sage_setup/docbuild/__main__.py", line 2, in <module>
    main()
  File "/nix/store/yyarqmdxj1dmp65qv7mwm3sfrdlwmg8v-python2.7-sagelib-8.4.beta2/lib/python2.7/site-packages/sage_setup/docbuild/__init__.py", line 1712, in main
    builder()
  File "/nix/store/yyarqmdxj1dmp65qv7mwm3sfrdlwmg8v-python2.7-sagelib-8.4.beta2/lib/python2.7/site-packages/sage_setup/docbuild/__init__.py", line 334, in _wrapper
    getattr(get_builder(document), 'inventory')(*args, **kwds)
  File "/nix/store/yyarqmdxj1dmp65qv7mwm3sfrdlwmg8v-python2.7-sagelib-8.4.beta2/lib/python2.7/site-packages/sage_setup/docbuild/__init__.py", line 529, in _wrapper
    build_many(build_ref_doc, L)
  File "/nix/store/yyarqmdxj1dmp65qv7mwm3sfrdlwmg8v-python2.7-sagelib-8.4.beta2/lib/python2.7/site-packages/sage_setup/docbuild/__init__.py", line 283, in build_many
    ret = x.get(99999)
  File "/nix/store/9lvgqr20r7j7b6a4fmhw6n82spd9rafq-python-2.7.15-env/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
Exception: ('Non-exception during docbuild: Segmentation fault', SignalError('Segmentation fault',))

@saraedum
Member Author

comment:347

I agree that it shouldn't change anything. Can you get more details on the segfault in sphinx?

@timokau
Contributor

timokau commented Aug 29, 2018

comment:348

Replying to @saraedum:

I agree that it shouldn't change anything. Can you get more details on the segfault in sphinx?

Yet for some reason the documentation built fine on 8.2.beta1 and still builds if I revert that commit. Really weird.

I don't know, what kind of detail are you looking for? Do you happen to know how I can convince python/sphinx to give me a useful stacktrace?
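One possible way to get a Python-level traceback out of a segfaulting interpreter is `faulthandler`. It is part of the standard library from Python 3.3 on; the traceback above comes from Python 2.7, where, as far as I know, a backport of the same name has to be installed from PyPI. The sketch below assumes Python 3 on a POSIX system and uses a self-inflicted SIGSEGV as a stand-in for the real crash. The helper name `run_with_faulthandler` is made up, and nothing here is specific to Sage's docbuild.

```python
import subprocess
import sys


def run_with_faulthandler(code):
    """Run `code` in a child interpreter with faulthandler enabled.

    Returns the child's (returncode, stderr).
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


# Stand-in for a crashing extension module: send SIGSEGV to ourselves.
CRASH = "import os, signal\nos.kill(os.getpid(), signal.SIGSEGV)\n"

returncode, stderr = run_with_faulthandler(CRASH)
print("child exit status:", returncode)
print(stderr)  # faulthandler's dump of the Python stack at the crash
```

On Python 3, setting the environment variable `PYTHONFAULTHANDLER=1` has the same effect as `-X faulthandler`, and it is inherited by child processes such as multiprocessing workers.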

@jdemeyer

comment:349

What's the motivation for all these changes to the docbuilder?

Please see https://groups.google.com/forum/#!topic/sage-packaging/VU4h8IWGFLA

@saraedum
Member Author

comment:350

We were running out of RAM on CI machines while building the docs IIRC.

@jdemeyer

jdemeyer commented Nov 9, 2018

comment:351

Replying to @saraedum:

We were running out of RAM on CI machines while building the docs IIRC.

Is that related to logging somehow? See also #26667.

@jdemeyer

jdemeyer commented Nov 9, 2018

comment:352

Seriously, yet another thing in Sage broken by this ticket...

@embray
Contributor

embray commented Nov 9, 2018

comment:353

That's not really helpful.

@saraedum
Member Author

saraedum commented Nov 9, 2018

comment:354

Replying to @jdemeyer:

Replying to @saraedum:

We were running out of RAM on CI machines while building the docs IIRC.

Is that related to logging somehow? See also #26667.

No, the logging issues made us exceed stdout limits on CI systems. We are now truncating the output there. It would be nice if we could try to keep the output to stdout as small as reasonably possible as this makes the CI output much easier to work with but I see that this one was too extreme.
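To make "truncating the output" concrete, here is a minimal sketch of one common approach: keep the first and last few lines and replace the middle with a marker. The function name, the default limits, and the marker text are assumptions for illustration; this is not the code the CI scripts actually use.

```python
def truncate_output(lines, head=20, tail=20):
    """Keep the first `head` and last `tail` lines, dropping the middle."""
    lines = list(lines)
    if len(lines) <= head + tail:
        return lines  # short enough, nothing to drop
    skipped = len(lines) - head - tail
    marker = "[... %d lines truncated ...]" % skipped
    # Slice from an explicit index so that tail=0 keeps no trailing lines.
    return lines[:head] + [marker] + lines[len(lines) - tail:]


log = ["line %d" % i for i in range(1, 101)]
for line in truncate_output(log, head=2, tail=2):
    print(line)
```

Keeping the tail matters most in practice, because the error that stopped a build is usually among the last lines printed.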

@embray
Contributor

embray commented Nov 9, 2018

comment:355

I don't think it was "too extreme". I think, perhaps more reasonably, there should be an option for verbosity level. My general preference is somewhere between zero output and a small progress message until and unless something goes wrong. Detailed logs can go to files, which are more useful for debugging anyway (on pipelines the only inconvenience is that I can't see the log "at-a-glance" in the web UI, but we have artifact downloads for that...)

@timokau
Contributor

timokau commented Nov 9, 2018

comment:356

I like the idea of just logging the more verbose stuff into a file. It should be printed to stdout on failure though, otherwise debugging a CI failure is annoying.
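As a rough sketch of that idea: send everything a command prints to a log file, stay silent on success, and replay the log on stdout only when the command fails. The helper name `run_quietly` and the log location are hypothetical; a real CI script would more likely do this in shell around the build command.

```python
import os
import subprocess
import sys
import tempfile


def run_quietly(cmd, log_path):
    """Run `cmd` with all output sent to `log_path`; replay the log on failure."""
    with open(log_path, "wb") as log:
        returncode = subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)
    if returncode != 0:
        # Only a failing command gets its verbose output shown.
        with open(log_path, "rb") as log:
            sys.stdout.write(log.read().decode("utf-8", errors="replace"))
    return returncode


log_path = os.path.join(tempfile.mkdtemp(), "build.log")
failing = [sys.executable, "-c", "print('compiling...'); raise SystemExit(1)"]
status = run_quietly(failing, log_path)
print("exit status:", status)
```

The log file stays on disk either way, so it can still be collected as a CI artifact when the run succeeds.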

@mkoeppe
Contributor

mkoeppe commented Nov 9, 2018

comment:357

You probably know about this, but try "make V=0" in the sage build. Everything will go into logs only and you can print them on error.
