Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preliminary steps to save the CI infrastructure #39009

Open
wants to merge 48 commits into
base: develop
Choose a base branch
from

Conversation

kwankyu
Copy link
Collaborator

@kwankyu kwankyu commented Nov 20, 2024

To improve the situation with the CI infrastructure, this PR:

  • added comments untangling obscure code in CI-related files, for those poor guys who ever attempt to read the files for whatever reasons.

  • while doing the cosmetic changes, a bug (about -uninstall targets) was found build/make/Makefile.in, which is fixed here.

    to test, do

    $ ./configure --enable-dot2tex | grep dot2tex
    $ make build | grep dot2tex
    $ ./configure --disable-dot2tex | grep dot2tex
    $ make build | grep dot2tex
    
  • fixed some jobs in the CI-linux workflow that fail because of duplicate artifact names.

  • removed ubuntu-lunar, ubuntu-mantic, conda-forge-python3.11, ubuntu-bionic-gcc_8-i386, debian-bullseye-i386 from the list of the default systems that CI runs for. This is how to properly modify the list:

    • first edit tox.ini (find DEFAULT_SYSTEM_FACTORS)
    • run tox -e update_docker_platforms
    • commit the changes
  • removed old versions of linuxmint and added new versions.

  • removed old versions of fedora distributions.

  • "optional" and "experimental" jobs now run upon "standard" docker images, instead of "maximal" ones, to avoid "out of runner space" error.

  • renamed "Reusable workflow for Docker-based portability CI" to "Workflow for Linux portability CI" for short name and made it runnable through github interface to facilitate testing specific platform by adding "workflow-dispatch" calling docker.yml.

    test: https://github.com/kwankyu/sage/actions/workflows/docker.yml

  • added helpful comments and updated the developer doc

  • reimplemented .ci/write-dockerfile.sh so that simplified Dockerfile is generated for present and future stability

  • turned off failing jobs in "CI Linux incremental"

  • removed seemingly useless subprojects/factory directory to eliminate certain git warnings.

  • turned off "standard-sitepackegs" and "standard-constraints_pkgs-norequirements" jobs as they fail on (almost) all platforms.

test CI run (as of 10.6.beta8): https://github.com/kwankyu/sage/actions/runs/13676372856
compare with the status quo: https://github.com/sagemath/sage/actions/runs/13596179432

test CI with a PR: kwankyu#82

The main objective of this PR is to solve issues with the workflow "CI Linux" such that a failure on a platform reveals solely some problem of sage built on the platform, but not a problem of the CI infrastructure. After this PR, hopefully, each of failing platforms should be tackled individually. If a platform fails, perhaps we should

  1. decide first whether to support the platform or not.
  2. if the platform is supported, open a github issue for it.
  3. if the platform is not supported, then remove it from the "master list of supported linux platforms" in tox.ini.
  4. if a supported platform constantly fails but no PR for the issue is present, then we may turn it off (by commenting it out) until fixed.

I suggest discontinuing support (at least in CI) for Linux releases that have been past their EOL (end of life or end of support by the distributor) for more than 2 years.

Only decent platforms according to the CI results should be listed in https://github.com/sagemath/sage/wiki/Sage-10.6-Release-Tour#availability-and-installation-help.

The following diagram shows how packages are installed for each of CI jobs:

                _prereq | standard package | optional package | experimental package
-------------------------------------------------------------------------------------
"minimal"        SSSSSS | ---------------- |
"standard"       SSSSSS | SSSSSSSSSSS----- |
"maximal"        SSSSSS | SSSSSSSSSSS----- | SSSS------------ | 
"optional"       SSSSSS | SSSSSSSSSSS----- | ---------------- | 
"experimental"   SSSSSS | SSSSSSSSSSS----- |                  | ------------------ |

where "S" represents system package and dash "-" represents Sage package. Hence

  • In the test results of "minimal" job, we can examine how well standard sage packages behave with sage.
  • In the test results of "standard" job, we can examine how well standard system packages behave with sage.
  • In the test results of "maximal" job, we can examine how well optional system packages behave with sage.
  • In the test results of "optional" job, we can examine how well optional sage packages behave with sage.
  • In the test results of "experimental" job, we can examine how well experimental sage packages behave with sage.

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation and checked the documentation preview.

⌛ Dependencies

@kwankyu kwankyu changed the title Add comments untangling complicated code Add comments untangling obscure code in a few build-related files Nov 20, 2024
Copy link

github-actions bot commented Nov 20, 2024

Documentation preview for this PR (built with commit 527a5ae; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

@kwankyu kwankyu force-pushed the p/add-comments-to-scripts branch from ff6bdcd to 7da5efe Compare November 20, 2024 13:47
@kwankyu kwankyu changed the title Add comments untangling obscure code in a few build-related files Add comments untangling obscure code in a few CI-related files Nov 21, 2024
@kwankyu kwankyu force-pushed the p/add-comments-to-scripts branch from db666ee to 6df7a6a Compare February 13, 2025 08:35
@kwankyu kwankyu changed the title Add comments untangling obscure code in a few CI-related files Add comments untangling obscure code in CI-related files Feb 13, 2025
@kwankyu kwankyu changed the title Add comments untangling obscure code in CI-related files Preliminary steps to save the CI infrastructure Feb 13, 2025
@kwankyu kwankyu marked this pull request as ready for review February 13, 2025 10:12
@kwankyu kwankyu mentioned this pull request Feb 13, 2025
5 tasks
@dimpase dimpase added the disputed PR is waiting for community vote, see https://groups.google.com/g/sage-devel/c/IgBYUJl33SQ label Feb 13, 2025
@dimpase
Copy link
Member

dimpase commented Feb 13, 2025

before proceeding it's good to decide whether we rather go with the other PR

@kwankyu kwankyu marked this pull request as draft February 15, 2025 00:39
@kwankyu kwankyu force-pushed the p/add-comments-to-scripts branch from d63c7e7 to a83f7c3 Compare March 3, 2025 08:32
@tobiasdiez
Copy link
Contributor

... Previously, maximum-pre was only building the image with the system packages

Right. There was an error in the diagram in #39009 (comment):

Corrected diagram:

                    _prereq | standard package | optional package | experimental package
-------------------------------------------------------------------------------------
"minimal"            SSSSSS | ---------------- |
"standard"           SSSSSS | SSSSSSSSSSS----- |
"maximal-pre"        SSSSSS | SSSSSSSSSSS      | SSSS             | 
old "optional"       SSSSSS | SSSSSSSSSSS----- | ~~~~------------ | 
old "experimental"   SSSSSS | SSSSSSSSSSS----- | SSSS             | ------------------ |

where "~" means "S" + "-".

Do you agree?

I don't think it is doing ~ in "optional". It should be just "SSSS" there as well. (Of course it builds the optional package if the system one is not suitable.)

For comparison, this is what the CI from this PR does:

                _prereq | standard package | optional package | experimental package
-------------------------------------------------------------------------------------
"minimal"        SSSSSS | ---------------- |
"standard"       SSSSSS | SSSSSSSSSSS----- |
"maximal"        SSSSSS | SSSSSSSSSSS----- | SSSS------------ | 
"optional"       SSSSSS | SSSSSSSSSSS----- | ---------------- | 
"experimental"   SSSSSS | SSSSSSSSSSS----- |                  | ------------------ |

With your changes, maximum now corresponds to the old optional.

"correspond" yes. But not exactly the same because equivalent system packages are not in the image. So the image of the old "optional" would be bigger than the image of the new "optional".

... make experimental last even longer.

No. The old "experimental" installs more packages, and its image is bigger.

"bigger image" implies longer build time and higher likelihood of "out of runner space" error.

why you think it's a good idea to expand the CI tests to cover the case where all optional packages are installed via sage-the-distro?

The old "optional" already does that, and the case is useful to test optional sage packages. No?

I agree with you that your diagram is the most complete and most logical one; I just very much doubt that we have the resources to have all these test pass. For this reason, I would start with "standard" and the old "optional", and once they are green most of the times and someone still has resources we can activate "minimal" and your "optional".

Side note: This is a good discussion. No one is always right. That is why we discuss, to learn from each other. Just do not suspect that fellow developers are not as serious as you are.

@kwankyu
Copy link
Collaborator Author

kwankyu commented Mar 3, 2025

The old "optional" already does that, and the case is useful to test optional sage packages. No?

I agree with you that your diagram is the most complete and most logical one; I just very much doubt that we have the resources to have all these test pass. For this reason, I would start with "standard" and the old "optional", and once they are green most of the times and someone still has resources we can activate "minimal" and your "optional".

In my view, "minimal" is the most important job because in it, all standard sage packages are built. How can they be tested on a platform without actually installing them on the platform?

A reasonable way to proceed is this (sorry for repeating):

  1. merge this PR, which fixes the broken CI infrastructure.
  2. observe how it works with the next release.
  3. if we are out of resources to run all CI jobs with respect to time and electricity, then your PR solves the problem.

Some reasonable ways (I may accept) to reduce resource usage include:

  1. reduce the list of the default tested platforms (please, through tox -e update_docker_platforms).
  2. turn off "experimental" job if we don't have enough resources (please, do not remove).

Putting more packages into _prereq may also help. But such change is beyond the scope of this PR and perhaps your PR as well.

@kwankyu
Copy link
Collaborator Author

kwankyu commented Mar 3, 2025

I don't think it is doing ~ in "optional". It should be just "SSSS" there as well. (Of course it builds the optional package if the system one is not suitable.)

No, it does not work as you think. In "maximal pre", all system packages corresponding to optional packages are installed. For example, find

bliss                                     x86_64 0.77-9.fc41                        fedora                            24.6 KiB

in https://github.com/sagemath/sage/actions/runs/13596179432/job/38030950449, and in old "optional", find

make --no-print-directory bliss-SAGE_LOCAL-no-deps
[bliss-0.77] installing. Log file: /sage/logs/pkgs/bliss-0.77.log
  [bliss-0.77] successfully installed.

in https://github.com/sagemath/sage/actions/runs/13596179432/job/38036933209.

"bliss" is an optional package.

@dimpase
Copy link
Member

dimpase commented Mar 3, 2025

there are plenty of packages in standard which are never built on most platforms. If you build gcc on a platform where another version of gcc is always used, your test is meaningless and might be misleading.
Or you fail to build gcc, and this gives you a reason to say that the platform is broken - whereas its native gcc is good enough to build Sage.

when we at last adjust _prereqs to meaningful contents, it might start making sense to talk about running "minimal" tests - although they won't be much different from standard then.

@kwankyu
Copy link
Collaborator Author

kwankyu commented Mar 4, 2025

when we at last adjust _prereqs to meaningful contents, it might start making sense to talk about running "minimal" tests

Please separate the issues for constructive discussion.

This PR fixes the CI so that "switches and knobs" work for the components of the CI machine.

We may discuss in the other PR about which switches to turn on and off, considering resources ("time" and "electricity") and utility. The situation (about resources and utility) may change in time.

I created this PR because removing those "switches and knobs" is not a proper way to maintain the CI machine.

@user202729
Copy link
Contributor

I don't think it is doing ~ in "optional". It should be just "SSSS" there as well. (Of course it builds the optional package if the system one is not suitable.)

No, it does not work as you think. In "maximal pre", all system packages corresponding to optional packages are installed. For example, find

bliss                                     x86_64 0.77-9.fc41                        fedora                            24.6 KiB

in https://github.com/sagemath/sage/actions/runs/13596179432/job/38030950449, and in old "optional", find

make --no-print-directory bliss-SAGE_LOCAL-no-deps
[bliss-0.77] installing. Log file: /sage/logs/pkgs/bliss-0.77.log
  [bliss-0.77] successfully installed.

in https://github.com/sagemath/sage/actions/runs/13596179432/job/38036933209.

"bliss" is an optional package.

Is it? Then what does that part mean?

2025-03-01T14:31:07.6437098Z #21 [configured 2/2] RUN ./configure  --enable-option-checking --with-system-4ti2=yes --with-system-bliss=yes --with-system-boost_cropped=yes --with-system-brial=yes --with-system-bzip2=yes --with-system-cbc=yes --with-system-cddlib=yes --with-system-cliquer=yes --with-system-cmake=yes --with-system-coxeter3=yes --with-system-curl=yes --with-system-ecl=yes --with-system-eclib=yes --with-system-ecm=yes --with-system-fflas_ffpack=yes --with-system-ffmpeg=yes --with-system-flint=yes --with-system-fplll=yes --with-system-free_fonts=yes --with-system-gap=yes --with-system-gc=yes --with-system-gcc=yes --with-system-gengetopt=yes --with-system-gf2x=yes --with-system-gfan=yes --with-system-gfortran=yes --with-system-giac=yes --with-system-git=yes --with-system-givaro=yes --with-system-glpk=yes --with-system-gmp=yes --with-system-gp2c=yes --with-system-graphviz=yes --with-system-gsl=yes --with-system-igraph=yes --with-system-imagemagick=yes --with-system-iml=yes --with-system-info=yes --with-system-isl=yes --with-system-lcalc=yes --with-system-libatomic_ops=yes --with-system-libbraiding=yes --with-system-libffi=yes --with-system-libgd=yes --with-system-libgraphviz=yes --with-system-libhomfly=yes --with-system-libjpeg=yes --with-system-liblzma=yes --with-system-libnauty=yes --with-system-libsemigroups=yes --with-system-libxml2=yes --with-system-linbox=yes --with-system-llvm=yes --with-system-lrcalc=yes --with-system-lrslib=yes --with-system-m4ri=yes --with-system-m4rie=yes --with-system-mathjax=yes --with-system-maxima=yes --with-system-meson=yes --with-system-mpc=yes --with-system-mpfr=yes --with-system-nauty=yes --with-system-ncurses=yes --with-system-ninja_build=yes --with-system-ntl=yes --with-system-onetbb=yes --with-system-openblas=yes --with-system-openssl=yes --with-system-palp=yes --with-system-pandoc=yes --with-system-pari=yes --with-system-pari_galdata=yes --with-system-pari_galpol=yes --with-system-pari_seadata=yes --with-system-patch=yes --with-system-patchelf=yes --with-system-pdf2svg=yes --with-system-perl_cpan_polymake_prereq=yes --with-system-perl_mongodb=yes --with-system-perl_term_readline_gnu=yes --with-system-pkgconf=yes --with-system-planarity=yes --with-system-polymake=yes --with-system-ppl=yes --with-system-primecount=yes --with-system-primesieve=yes --with-system-python3=yes --with-system-qhull=yes --with-system-readline=yes --with-system-rw=yes --with-system-sbcl=yes --with-system-singular=yes --with-system-sqlite=yes --with-system-suitesparse=yes --with-system-symmetrica=yes --with-system-sympow=yes --with-system-tachyon=yes --with-system-texlive=yes --with-system-texlive_luatex=yes --with-system-tox=yes --with-system-virtualenv=yes --with-system-xindy=yes --with-system-xz=yes --with-system-zeromq=yes --with-system-zlib=yes --with-system-git=yes --enable-experimental-packages --enable-download-from-upstream-url --enable-build-as-root  --with-system-python3=yes  --enable-fat-binary || (echo "::group::config.log"; cat config.log; echo "::endgroup::"; exit 1)

2025-03-01T14:31:11.2647684Z -----------------------------------------------------------------------------
2025-03-01T14:31:11.2648184Z Checking whether SageMath should install SPKG bliss...
2025-03-01T14:31:11.2648759Z checking for bliss/bliss_C.h... yes
2025-03-01T14:31:11.2649066Z checking for library containing bliss_new... -lbliss
2025-03-01T14:31:11.3652638Z checking checking bliss version directly... Good.
2025-03-01T14:31:11.3655076Z configure: will use system package and not install SPKG bliss
2025-03-01T14:31:11.3657204Z -----------------------------------------------------------------------------

2025-03-01T14:31:32.6696818Z bliss:                          using system package; SPKG will not be installed

@kwankyu
Copy link
Collaborator Author

kwankyu commented Mar 4, 2025

That shows the decision of ./configure. However later in the build process, make --no-print-directory bliss-SAGE_LOCAL-no-deps installs the sage package bliss, regardless of that decision.

@dimpase
Copy link
Member

dimpase commented Mar 4, 2025

That shows the decision of ./configure. However later in the build process, make --no-print-directory bliss-SAGE_LOCAL-no-deps installs the sage package bliss, regardless of that decision.

and then you never really know which bliss is used in this test.

if you must test bliss as installed by sage, then you must make sure there is no other bliss around.

Note that even if configure rejected system bliss, it would still possible for some other optional package to use the system bliss, in principle.
E.g. bliss is used in gap_packages.

@kwankyu
Copy link
Collaborator Author

kwankyu commented Mar 4, 2025

That shows the decision of ./configure. However later in the build process, make --no-print-directory bliss-SAGE_LOCAL-no-deps installs the sage package bliss, regardless of that decision.

and then you never really know which bliss is used in this test.

Note that this is the situation of old "optional" based on "maximal-pre".

if you must test bliss as installed by sage, then you must make sure there is no other bliss around.

Then the new "optional" based on "standard" is even better in that regard since there is no other bliss around.

@kwankyu
Copy link
Collaborator Author

kwankyu commented Mar 5, 2025

I removed some linux releases (ubuntu and fedora releases) from CI, according to

  • discontinue support in CI for Linux releases that have been past their EOL (end of life or end of support by the distributor) for more than 2 years.

I suggest that as a guideline to decide which Linux releases we should run tests for in CI. I don't know if there is already a similar guideline in our documentation (or in sage-devel).

@tobiasdiez
Copy link
Contributor

The old "optional" already does that, and the case is useful to test optional sage packages. No?

I agree with you that your diagram is the most complete and most logical one; I just very much doubt that we have the resources to have all these test pass. For this reason, I would start with "standard" and the old "optional", and once they are green most of the times and someone still has resources we can activate "minimal" and your "optional".

In my view, "minimal" is the most important job because in it, all standard sage packages are built. How can they be tested on a platform without actually installing them on the platform?

I don't think we necessarily need to test this. In case of errors during the compilation of sage packages, people seem to be pragmatic and recommend to install the system package (latest example).

I removed some linux releases (ubuntu and fedora releases) from CI, according to

* discontinue support in CI for Linux releases that have been past their EOL (end of life or end of support by the distributor) for more than 2 years.

I suggest that as a guideline to decide which Linux releases we should run tests for in CI. I don't know if there is already a similar guideline in our documentation (or in sage-devel).

I agree we should not test systems past their EOL (but this should not be the only criteria for dropping support).

@kwankyu
Copy link
Collaborator Author

kwankyu commented Mar 5, 2025

In my view, "minimal" is the most important job because in it, all standard sage packages are built. How can they be tested on a platform without actually installing them on the platform?

I don't think we necessarily need to test this.

We are testing sage packages on multiple levels:

  • the author of a PR modifying the sage package on his own machine
  • the PR branch is tested incrementally on selected machines (CI incremental)
  • the release to which the PR branch is merged is tested on buildbots run by the release manager
  • the release is tested on CI Linux and CI macOS on wider selection of machines, to give info to developers

for the best stability of the release, when built on the user's machine.

In case of errors during the compilation of sage packages, people seem to be pragmatic and recommend to install the system package (latest example).

This is a failure of the multiple-level testing, since the testing is not perfect. Reducing such instances of build failure is exactly the purpose of our CI infrastructure. That helps people, including Dima, live better life.

@kwankyu
Copy link
Collaborator Author

kwankyu commented Mar 5, 2025

Do we still support python 3.9? Where is documented the oldest python we support?

We may remove more platforms if we do not support python 3.9...

@kwankyu
Copy link
Collaborator Author

kwankyu commented Mar 6, 2025

Do we still support python 3.9? Where is documented the oldest python we support?

We still support python 3.9 according to https://doc-release--sagemath.netlify.app/html/en/reference/spkg/python3

@user202729
Copy link
Contributor

user202729 commented Mar 6, 2025

I don't think we necessarily need to test this.

In case of errors during the compilation of sage packages, people seem to be pragmatic and recommend to install the system package (latest example).

Reducing such instances of build failure is exactly the purpose of our CI infrastructure […]

I think we went through this argument: yes, the CI did its job (with occasional fix needed such as this one?), but we don't have enough resources to fix failures anyway and/or people who could fix it doesn't see the necessity because workaround is available.

(Your argument against it, people could fix it doesn't because testing is not perfect instead, yes it's true that testing is not perfect, but the implication wouldn't hold if people getting build issues still are redirected to install system package even before CI failing.)

Yes, I also agree that it's better to disable than delete the code so (hypothetically) if we get more resources in the future it can simply be reenabled.

We still support python 3.9 according to https://doc-release--sagemath.netlify.app/html/en/reference/spkg/python3

has been dropped since #39251 (unfortunately there isn't enough links between relevant parts in source code/documentation so people can forget to update one when another is updated)

@dimpase
Copy link
Member

dimpase commented Mar 6, 2025

failures anyway and/or people who could fix it doesn't see the necessity because workaround is available.

Exactly. Each vendored package is more work down the road, not less.
E.g. I couldn't care less if we have a failure in some jupyter test on a particular platform.
Because pip install notebook is more robust than our lame efforts to vendor hundreds of packages it needs.
Why do we have these hundreds in Sage in the 1st place? Inertia, this is why. No other reason.

@kwankyu
Copy link
Collaborator Author

kwankyu commented Mar 6, 2025

... but we don't have enough resources to fix failures anyway and/or people who could fix it doesn't see the necessity because workaround is available.

I think you mean people or developers by "resources".

We cannot say definitely "people will fix it", "people won't fix it", "we have enough resource", or "we don't have enough resources". We don't know who will be interested in fixing some sage package for some platform.

So what is the implication of your argument? Stop running CI (testing sage packages)?

Your argument against it, people could fix it doesn't because testing is not perfect instead

I didn't argue that "people could fix it doesn't, because testing is not perfect".

I said "build failures on user machines happen because testing (through CI) is not perfect".

We still support python 3.9 according to https://doc-release--sagemath.netlify.app/html/en/reference/spkg/python3

has been dropped since #39251 (unfortunately there isn't enough links between relevant parts in source code/documentation so people can forget to update one when another is updated)

OK. Then I will just leave the task of dropping other old platforms to the other PR.

@dimpase
Copy link
Member

dimpase commented Mar 6, 2025

We cannot say definitely "people will fix it", "people won't fix it", "we have enough resource", or "we don't have enough resources". We don't know who will be interested in fixing some sage package for some platform.

people should not be interested in fixing vendored packages which can be perfectly replaced by what's provided by systems/distros.

We have a lot of really messy old broken code in sagelib, big chunks should be redone, for a variety of reasons - this is where the time should go into.

Not elsewhere

@kwankyu
Copy link
Collaborator Author

kwankyu commented Mar 6, 2025

people should not be interested in fixing vendored packages which can be perfectly replaced by what's provided by systems/distros.

off topic: then put them into _prereq. Please do that on your own PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disputed PR is waiting for community vote, see https://groups.google.com/g/sage-devel/c/IgBYUJl33SQ p: critical / 2 s: needs review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants