-
Notifications
You must be signed in to change notification settings - Fork 26.4k
A couple of CI improvements #1865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi, this patch series addresses a couple of issues I've found while investigating flaky CI jobs. Besides two more fixes for flaky jobs it also removes some stale code and simplifies the setup on GitHub Actions to always use containerized jobs on Linux. Test runs can be found for GitLab [1] and GitHub [2]. Changes in v2: - Expand a bit on the reasoning behind the conversion to use containerized jobs. - Fix commit message typo. - Properly fix the race in t7422 via pipe stuffing, as proposed by Peff. - Link to v1: https://lore.kernel.org/r/20250103-b4-pks-ci-fixes-v1-0-a9bb95dff833@pks.im Changes in v3: - Another iteration on the SIGPIPE test, which should now finally plug the race. - Link to v2: https://lore.kernel.org/r/20250106-b4-pks-ci-fixes-v2-0-06ae540771b7@pks.im Thanks! Patrick [1]: https://gitlab.com/gitlab-org/git/-/merge_requests/277 [2]: git#1865 To: git@vger.kernel.org Cc: Jeff King <peff@peff.net> Cc: Junio C Hamano <gitster@pobox.com> Cc: Phillip Wood <phillip.wood@dunelm.org.uk> --- b4-submit-tracking --- # This section is used internally by b4 prep for tracking purposes. { "series": { "revision": 3, "change-id": "20250103-b4-pks-ci-fixes-2d0a23fb5c78", "prefixes": [], "history": { "v1": [ "20250103-b4-pks-ci-fixes-v1-0-a9bb95dff833@pks.im" ], "v2": [ "20250106-b4-pks-ci-fixes-v2-0-06ae540771b7@pks.im" ] } } }
There are issues in commit 7e98d02: |
Hi, this patch series addresses a couple of issues I've found while investigating flaky CI jobs. Besides two more fixes for flaky jobs it also removes some stale code and simplifies the setup on GitHub Actions to always use containerized jobs on Linux. Test runs can be found for GitLab [1] and GitHub [2]. Thanks! Patrick [1]: https://gitlab.com/gitlab-org/git/-/merge_requests/277 [2]: git#1865 To: git@vger.kernel.org --- b4-submit-tracking --- # This section is used internally by b4 prep for tracking purposes. { "series": { "revision": 1, "change-id": "20250103-b4-pks-ci-fixes-2d0a23fb5c78", "prefixes": [] } }
There are issues in commit 8cf0ea0: |
There are issues in commit 8cf0ea0: |
1 similar comment
There are issues in commit 8cf0ea0: |
There are issues in commit ca0f7c9: |
There are issues in commit 8cf0ea0: |
There are issues in commit a28df16: |
There are issues in commit 8cf0ea0: |
There are issues in commit 3f9bc22: |
Hi, this patch series addresses a couple of issues I've found while investigating flaky CI jobs. Besides two more fixes for flaky jobs it also removes some stale code and simplifies the setup on GitHub Actions to always use containerized jobs on Linux. Test runs can be found for GitLab [1] and GitHub [2]. Changes in v2: - Expand a bit on the reasoning behind the conversion to use containerized jobs. - Fix commit message typo. - Properly fix the race in t7422 via pipe stuffing, as proposed by Peff. - Link to v1: https://lore.kernel.org/r/20250103-b4-pks-ci-fixes-v1-0-a9bb95dff833@pks.im Thanks! Patrick [1]: https://gitlab.com/gitlab-org/git/-/merge_requests/277 [2]: git#1865 To: git@vger.kernel.org Cc: Jeff King <peff@peff.net> Cc: Junio C Hamano <gitster@pobox.com> --- b4-submit-tracking --- # This section is used internally by b4 prep for tracking purposes. { "series": { "revision": 2, "change-id": "20250103-b4-pks-ci-fixes-2d0a23fb5c78", "prefixes": [], "history": { "v1": [ "20250103-b4-pks-ci-fixes-v1-0-a9bb95dff833@pks.im" ] } } }
There are issues in commit d4284f5: |
Two of our tests in t0060 verify that the runtime prefix functionality works as expected by creating a separate directory hierarchy, copying the Git executable in there and then creating scripts relative to that executable. These tests fail quite regularly in GitLab CI with the following error: expecting success of 0060.218 '%(prefix)/ works': mkdir -p pretend/bin && cp "$GIT_EXEC_PATH"/git$X pretend/bin/ && git config yes.path "%(prefix)/yes" && GIT_EXEC_PATH= ./pretend/bin/git config --path yes.path >actual && echo "$(pwd)/pretend/yes" >expect && test_cmp expect actual ++ mkdir -p pretend/bin ++ cp /c/GitLab-Runner/builds/gitlab-org/git/git.exe pretend/bin/ cp: cannot create regular file 'pretend/bin/git.exe': Device or resource busy error: last command exited with $?=1 not ok 218 - %(prefix)/ works Seemingly, the "git.exe" binary we are trying to overwrite is still being held open. It is somewhat puzzling why exactly that is: while the preceding test _does_ write to and execute the same path, it should have exited and shouldn't keep any backgrounded processes around. So it must be held open by something else, either in MinGW or in Windows itself. While the root cause is puzzling, the workaround is trivial enough: instead of writing the file twice we simply pull the common setup into a separate test case so that we won't observe EBUSY in the first place. Signed-off-by: Patrick Steinhardt <ps@pks.im>
One test in t7422 asserts that `git submodule status --recursive` properly handles SIGPIPE. This test is flaky though and may sometimes not see a SIGPIPE at all: expecting success of 7422.18 'git submodule status --recursive propagates SIGPIPE': { git submodule status --recursive 2>err; echo $?>status; } | grep -q X/S && test_must_be_empty err && test_match_signal 13 "$(cat status)" ++ git submodule status --recursive ++ grep -q X/S ++ echo 0 ++ test_must_be_empty err ++ test 1 -ne 1 ++ test_path_is_file err ++ test 1 -ne 1 ++ test -f err ++ test -s err +++ cat status ++ test_match_signal 13 0 ++ test 0 = 141 ++ test 0 = 269 ++ return 1 error: last command exited with $?=1 not ok 18 - git submodule status --recursive propagates SIGPIPE The issue is caused by us using grep(1) to terminate the pipe on the first matching line in the recursing git-submodule(1) process. Standard streams are typically buffered though, so this condition is racy and may cause us to terminate the pipe after git-submodule(1) has already exited, and in that case we wouldn't see the expected signal. Fix the issue by generating a couple thousand nested submodules and matching on the first nested submodule. This ensures that the recursive git-submodule(1) process completely fills its stdout buffer, which makes subsequent writes block until the downstream consumer of the pipe either fully drains it or closes it. To verify that this works as expected one can apply the following patch to the preimage of this commit, which used to reliably trigger the race: diff --git a/t/t7422-submodule-output.sh b/t/t7422-submodule-output.sh index 3c5177cc30..df6001f8a0 100755 --- a/t/t7422-submodule-output.sh +++ b/t/t7422-submodule-output.sh @@ -202,7 +202,7 @@ test_expect_success !MINGW 'git submodule status --recursive propagates SIGPIPE' cd repo && GIT_ALLOW_PROTOCOL=file git submodule add "$(pwd)"/../submodule && { git submodule status --recursive 2>err; echo $?>status; } | - grep -q recursive-submodule-path-1 && + { sleep 1 && grep -q recursive-submodule-path-1 && sleep 1; } && test_must_be_empty err && test_match_signal 13 "$(cat status)" ) With the pipe-stuffing workaround the test runs successfully. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im>
The containerized jobs in GitHub Actions run as root, giving them special permissions to for example delete files even when the user shouldn't be able to due to file permissions. This limitation keeps us from using containerized jobs for most of our Ubuntu-based jobs as it causes a number of tests to fail. Adapt the jobs to create a separate user that executes the test suite. This follows similar infrastructure that we already have in GitLab CI. Signed-off-by: Patrick Steinhardt <ps@pks.im>
We have split the CI jobs in GitHub Workflows into two categories: - Those running on a machine pool directly. - Those running in a container on the machine pool. The latter is more flexible because it allows us to freely pick whatever container image we want to use for a specific job, while the former only allows us to pick from a handful of different distros. The containerized jobs do not have any significant downsides to the best of my knowledge: - They aren't significantly slower to start up. A quick comparison by Peff shows that the difference is mostly lost in the noise: job | old | new --------------------|------|------ linux-TEST-vars 11m30s 10m54s linux-asan-ubsan 30m26s 31m14s linux-gcc 9m47s 10m6s linux-gcc-default 9m47s 9m41s linux-leaks 25m50s 25m21s linux-meson 10m36s 10m41s linux-reftable 10m25s 10m23s linux-reftable-leaks 27m18s 27m28s linux-sha256 9m54s 10m31s Some jobs are a bit faster, some are a bit slower, but there does not seem to be any significant change. - Containerized jobs run as root, which keeps a couple of tests from running. This has been addressed in the preceding commit though, where we now use setpriv(1) to run tests as a separate user. - GitHub injects a Node binary into containerized jobs, which is dynamically linked. This has led to some issues in the past [1], but only for our 32 bit jobs. The issues have since been resolved. Overall there seem to be no downsides, but the upside is that we have more control over the exact image that these jobs use. Convert the Linux jobs accordingly. [1]: https://lore.kernel.org/git/20240912094841.GD589828@coredump.intra.peff.net/ Signed-off-by: Patrick Steinhardt <ps@pks.im>
We explicitly list the distro of Linux-based jobs, but it is equivalent to the name of the image in almost all cases, except that colons are replaced with dashes. Drop the redundant information and massage it in our CI scripts, which is equivalent to how we do it in GitLab CI. There are a couple of exceptions: - The "linux32" job, whose distro name is different than the image name. This is handled by adapting all sites to use the new name. - The "alpine" and "fedora" jobs, neither of which specify a tag for their image. This is handled by adding the "latest" tag. Signed-off-by: Patrick Steinhardt <ps@pks.im>
The "linux-old" job was historically testing against the oldest supported LTS release of Ubuntu. But with c85bcb5 (gitlab-ci: switch from Ubuntu 16.04 to 20.04, 2024-10-31) it has been converted to test against Ubuntu 20.04, which already gets exercised in a couple of other CI jobs. It's thus not adding any significant test coverage. Drop the job. Signed-off-by: Patrick Steinhardt <ps@pks.im>
Add another job to GitLab CI that tests against the i386 architecture. This job is equivalent to the same job in GitHub Workflows. Signed-off-by: Patrick Steinhardt <ps@pks.im>
With c85bcb5 (gitlab-ci: switch from Ubuntu 16.04 to 20.04, 2024-10-31) we have adapted the last CI job to stop using Ubuntu 16.04 in favor of Ubuntu 20.04. Remove the special-casing we still have in our CI scripts. Signed-off-by: Patrick Steinhardt <ps@pks.im>
Both GitHub Actions and GitLab CI use the "ubuntu:latest" tag as the default image for most jobs. This tag is somewhat misleading though, as it does not refer to the latest release of Ubuntu, but to the latest LTS release thereof. But as we already have a couple of jobs exercising the oldest LTS release of Ubuntu that Git still supports, it would make more sense to test the oldest and youngest versions of Ubuntu. Adapt these jobs to instead use the "ubuntu:rolling" tag, which refers to the actual latest release, which currently is Ubuntu 24.10. Signed-off-by: Patrick Steinhardt <ps@pks.im>
Support for Azure Pipelines has been retired in 6081d38 (ci: retire the Azure Pipelines definition, 2020-04-11) in favor of GitHub Actions. Our CI library still has some infrastructure left for Azure though that is now unused. Remove it. Signed-off-by: Patrick Steinhardt <ps@pks.im>
There are issues in commit 24e0510: |
Hi,
this patch series addresses a couple of issues I've found while
investigating flaky CI jobs. Besides two more fixes for flaky jobs it
also removes some stale code and simplifies the setup on GitHub Actions
to always use containerized jobs on Linux.
Test runs can be found for GitLab 1 and GitHub 2.
Thanks!
Patrick
To: git@vger.kernel.org
--- b4-submit-tracking ---
This section is used internally by b4 prep for tracking purposes.
{
"series": {
"revision": 1,
"change-id": "20250103-b4-pks-ci-fixes-2d0a23fb5c78",
"prefixes": []
}
}