Skip to content

A couple of CI improvements #1865

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from
Closed

A couple of CI improvements #1865

wants to merge 11 commits into from

Conversation

pks-t
Copy link
Contributor

@pks-t pks-t commented Jan 3, 2025

Hi,

this patch series addresses a couple of issues I've found while
investigating flaky CI jobs. Besides two more fixes for flaky jobs it
also removes some stale code and simplifies the setup on GitHub Actions
to always use containerized jobs on Linux.

Test runs can be found for GitLab 1 and GitHub 2.

Thanks!

Patrick

To: git@vger.kernel.org

--- b4-submit-tracking ---

This section is used internally by b4 prep for tracking purposes.

{
"series": {
"revision": 1,
"change-id": "20250103-b4-pks-ci-fixes-2d0a23fb5c78",
"prefixes": []
}
}

Hi,

this patch series addresses a couple of issues I've found while
investigating flaky CI jobs. Besides two more fixes for flaky jobs it
also removes some stale code and simplifies the setup on GitHub Actions
to always use containerized jobs on Linux.

Test runs can be found for GitLab [1] and GitHub [2].

Changes in v2:

  - Expand a bit on the reasoning behind the conversion to use
    containerized jobs.
  - Fix commit message typo.
  - Properly fix the race in t7422 via pipe stuffing, as proposed by
    Peff.
  - Link to v1: https://lore.kernel.org/r/20250103-b4-pks-ci-fixes-v1-0-a9bb95dff833@pks.im

Changes in v3:

  - Another iteration on the SIGPIPE test, which should now finally plug
    the race.
  - Link to v2: https://lore.kernel.org/r/20250106-b4-pks-ci-fixes-v2-0-06ae540771b7@pks.im

Thanks!

Patrick

[1]: https://gitlab.com/gitlab-org/git/-/merge_requests/277
[2]: git#1865

To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>
Cc: Phillip Wood <phillip.wood@dunelm.org.uk>

--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
  "series": {
    "revision": 3,
    "change-id": "20250103-b4-pks-ci-fixes-2d0a23fb5c78",
    "prefixes": [],
    "history": {
      "v1": [
        "20250103-b4-pks-ci-fixes-v1-0-a9bb95dff833@pks.im"
      ],
      "v2": [
        "20250106-b4-pks-ci-fixes-v2-0-06ae540771b7@pks.im"
      ]
    }
  }
}
Copy link

There are issues in commit 7e98d02:
A couple of CI improvements
Commit not signed off

pks-t added a commit to pks-t/git that referenced this pull request Jan 3, 2025
Hi,

this patch series addresses a couple of issues I've found while
investigating flaky CI jobs. Besides two more fixes for flaky jobs it
also removes some stale code and simplifies the setup on GitHub Actions
to always use containerized jobs on Linux.

Test runs can be found for GitLab [1] and GitHub [2].

Thanks!

Patrick

[1]: https://gitlab.com/gitlab-org/git/-/merge_requests/277
[2]: git#1865

To: git@vger.kernel.org

--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
  "series": {
    "revision": 1,
    "change-id": "20250103-b4-pks-ci-fixes-2d0a23fb5c78",
    "prefixes": []
  }
}
Copy link

There are issues in commit 8cf0ea0:
A couple of CI improvements
Commit not signed off

Copy link

There are issues in commit 8cf0ea0:
A couple of CI improvements
Commit not signed off

1 similar comment
Copy link

There are issues in commit 8cf0ea0:
A couple of CI improvements
Commit not signed off

Copy link

There are issues in commit ca0f7c9:
rootless
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 8cf0ea0:
A couple of CI improvements
Commit not signed off

Copy link

There are issues in commit a28df16:
rootless
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 8cf0ea0:
A couple of CI improvements
Commit not signed off

Copy link

There are issues in commit 3f9bc22:
rootless
Commit checks stopped - the message is too short
Commit not signed off

pks-t added a commit to pks-t/git that referenced this pull request Jan 6, 2025
Hi,

this patch series addresses a couple of issues I've found while
investigating flaky CI jobs. Besides two more fixes for flaky jobs it
also removes some stale code and simplifies the setup on GitHub Actions
to always use containerized jobs on Linux.

Test runs can be found for GitLab [1] and GitHub [2].

Changes in v2:
  - Expand a bit on the reasoning behind the conversion to use
    containerized jobs.
  - Fix commit message typo.
  - Properly fix the race in t7422 via pipe stuffing, as proposed by
    Peff.
  - Link to v1: https://lore.kernel.org/r/20250103-b4-pks-ci-fixes-v1-0-a9bb95dff833@pks.im

Thanks!

Patrick

[1]: https://gitlab.com/gitlab-org/git/-/merge_requests/277
[2]: git#1865

To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>

--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
  "series": {
    "revision": 2,
    "change-id": "20250103-b4-pks-ci-fixes-2d0a23fb5c78",
    "prefixes": [],
    "history": {
      "v1": [
        "20250103-b4-pks-ci-fixes-v1-0-a9bb95dff833@pks.im"
      ]
    }
  }
}
Copy link

There are issues in commit d4284f5:
A couple of CI improvements
Commit not signed off

pks-t added 10 commits January 7, 2025 13:28
Two of our tests in t0060 verify that the runtime prefix functionality
works as expected by creating a separate directory hierarchy, copying
the Git executable in there and then creating scripts relative to that
executable.

These tests fail quite regularly in GitLab CI with the following error:

    expecting success of 0060.218 '%(prefix)/ works':
            mkdir -p pretend/bin &&
            cp "$GIT_EXEC_PATH"/git$X pretend/bin/ &&
            git config yes.path "%(prefix)/yes" &&
            GIT_EXEC_PATH= ./pretend/bin/git config --path yes.path >actual &&
            echo "$(pwd)/pretend/yes" >expect &&
            test_cmp expect actual
    ++ mkdir -p pretend/bin
    ++ cp /c/GitLab-Runner/builds/gitlab-org/git/git.exe pretend/bin/
    cp: cannot create regular file 'pretend/bin/git.exe': Device or resource busy
    error: last command exited with $?=1
    not ok 218 - %(prefix)/ works

Seemingly, the "git.exe" binary we are trying to overwrite is still
being held open. It is somewhat puzzling why exactly that is: while the
preceding test _does_ write to and execute the same path, it should have
exited and shouldn't keep any backgrounded processes around. So it must
be held open by something else, either in MinGW or in Windows itself.

While the root cause is puzzling, the workaround is trivial enough:
instead of writing the file twice we simply pull the common setup into a
separate test case so that we won't observe EBUSY in the first place.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
One test in t7422 asserts that `git submodule status --recursive`
properly handles SIGPIPE. This test is flaky though and may sometimes
not see a SIGPIPE at all:

    expecting success of 7422.18 'git submodule status --recursive propagates SIGPIPE':
            { git submodule status --recursive 2>err; echo $?>status; } |
                    grep -q X/S &&
            test_must_be_empty err &&
            test_match_signal 13 "$(cat status)"
    ++ git submodule status --recursive
    ++ grep -q X/S
    ++ echo 0
    ++ test_must_be_empty err
    ++ test 1 -ne 1
    ++ test_path_is_file err
    ++ test 1 -ne 1
    ++ test -f err
    ++ test -s err
    +++ cat status
    ++ test_match_signal 13 0
    ++ test 0 = 141
    ++ test 0 = 269
    ++ return 1
    error: last command exited with $?=1
    not ok 18 - git submodule status --recursive propagates SIGPIPE

The issue is caused by us using grep(1) to terminate the pipe on the
first matching line in the recursing git-submodule(1) process. Standard
streams are typically buffered though, so this condition is racy and may
cause us to terminate the pipe after git-submodule(1) has already
exited, and in that case we wouldn't see the expected signal.

Fix the issue by generating a couple thousand nested submodules and
matching on the first nested submodule. This ensures that the recursive
git-submodule(1) process completely fills its stdout buffer, which makes
subsequent writes block until the downstream consumer of the pipe either
fully drains it or closes it.

To verify that this works as expected one can apply the following patch
to the preimage of this commit, which used to reliably trigger the race:

    diff --git a/t/t7422-submodule-output.sh b/t/t7422-submodule-output.sh
    index 3c5177cc30..df6001f8a0 100755
    --- a/t/t7422-submodule-output.sh
    +++ b/t/t7422-submodule-output.sh
    @@ -202,7 +202,7 @@ test_expect_success !MINGW 'git submodule status --recursive propagates SIGPIPE'
     		cd repo &&
     		GIT_ALLOW_PROTOCOL=file git submodule add "$(pwd)"/../submodule &&
     		{ git submodule status --recursive 2>err; echo $?>status; } |
    -			grep -q recursive-submodule-path-1 &&
    +			{ sleep 1 && grep -q recursive-submodule-path-1 && sleep 1; } &&
     		test_must_be_empty err &&
     		test_match_signal 13 "$(cat status)"
     	)

With the pipe-stuffing workaround the test runs successfully.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
The containerized jobs in GitHub Actions run as root, giving them
special permissions to for example delete files even when the user
shouldn't be able to due to file permissions. This limitation keeps us
from using containerized jobs for most of our Ubuntu-based jobs as it
causes a number of tests to fail.

Adapt the jobs to create a separate user that executes the test suite.
This follows similar infrastructure that we already have in GitLab CI.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
We have split the CI jobs in GitHub Workflows into two categories:

  - Those running on a machine pool directly.

  - Those running in a container on the machine pool.

The latter is more flexible because it allows us to freely pick whatever
container image we want to use for a specific job, while the former only
allows us to pick from a handful of different distros. The containerized
jobs do not have any significant downsides to the best of my knowledge:

  - They aren't significantly slower to start up. A quick comparison by
    Peff shows that the difference is mostly lost in the noise:

            job             |  old | new
        --------------------|------|------
        linux-TEST-vars      11m30s 10m54s
        linux-asan-ubsan     30m26s 31m14s
        linux-gcc             9m47s 10m6s
        linux-gcc-default     9m47s  9m41s
        linux-leaks          25m50s 25m21s
        linux-meson          10m36s 10m41s
        linux-reftable       10m25s 10m23s
        linux-reftable-leaks 27m18s 27m28s
        linux-sha256          9m54s 10m31s

    Some jobs are a bit faster, some are a bit slower, but there does
    not seem to be any significant change.

  - Containerized jobs run as root, which keeps a couple of tests from
    running. This has been addressed in the preceding commit though,
    where we now use setpriv(1) to run tests as a separate user.

  - GitHub injects a Node binary into containerized jobs, which is
    dynamically linked. This has led to some issues in the past [1], but
    only for our 32 bit jobs. The issues have since been resolved.

Overall there seem to be no downsides, but the upside is that we have
more control over the exact image that these jobs use. Convert the Linux
jobs accordingly.

[1]: https://lore.kernel.org/git/20240912094841.GD589828@coredump.intra.peff.net/

Signed-off-by: Patrick Steinhardt <ps@pks.im>
We explicitly list the distro of Linux-based jobs, but it is equivalent
to the name of the image in almost all cases, except that colons are
replaced with dashes. Drop the redundant information and massage it in
our CI scripts, which is equivalent to how we do it in GitLab CI.

There are a couple of exceptions:

  - The "linux32" job, whose distro name is different than the image
    name. This is handled by adapting all sites to use the new name.

  - The "alpine" and "fedora" jobs, neither of which specify a tag for
    their image. This is handled by adding the "latest" tag.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
The "linux-old" job was historically testing against the oldest
supported LTS release of Ubuntu. But with c85bcb5 (gitlab-ci: switch
from Ubuntu 16.04 to 20.04, 2024-10-31) it has been converted to test
against Ubuntu 20.04, which already gets exercised in a couple of other
CI jobs. It's thus not adding any significant test coverage.

Drop the job.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Add another job to GitLab CI that tests against the i386 architecture.
This job is equivalent to the same job in GitHub Workflows.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
With c85bcb5 (gitlab-ci: switch from Ubuntu 16.04 to 20.04,
2024-10-31) we have adapted the last CI job to stop using Ubuntu 16.04
in favor of Ubuntu 20.04. Remove the special-casing we still have in our
CI scripts.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Both GitHub Actions and GitLab CI use the "ubuntu:latest" tag as the
default image for most jobs. This tag is somewhat misleading though, as
it does not refer to the latest release of Ubuntu, but to the latest LTS
release thereof. But as we already have a couple of jobs exercising the
oldest LTS release of Ubuntu that Git still supports, it would make more
sense to test the oldest and youngest versions of Ubuntu.

Adapt these jobs to instead use the "ubuntu:rolling" tag, which refers
to the actual latest release, which currently is Ubuntu 24.10.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Support for Azure Pipelines has been retired in 6081d38 (ci: retire
the Azure Pipelines definition, 2020-04-11) in favor of GitHub Actions.
Our CI library still has some infrastructure left for Azure though that
is now unused. Remove it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Copy link

There are issues in commit 24e0510:
A couple of CI improvements
Commit not signed off

@pks-t pks-t closed this May 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant