Possible change to ci.jenkins.io's linux agents causes core test failure #3890
Comments
I have just increased the timeout from 300s to 500s (!!!) because build 5736 reports that the test takes 350s to complete on a Linux agent with Java 21. |
Git plugin builds are failing due to 60 minute test timeout limit on the Windows agents. Inside the failed build logs is this message:
Immediately after that message is logged, there is a multi-minute delay in log entries. When I replay the same job with the added Pipeline parameter |
Likely related: #3889 (comment) |
I've replayed a PR with the artifact opt-out, and the test still failed for the same reason. |
I believe @lemeurherve's comment was aimed at @MarkEWaite (e.g. the message About the problem mentioned in this issue on Jenkins Core (test timeouts), there isn't anything obvious in the image changelog: https://github.com/jenkins-infra/packer-images/releases/tag/1.45.0 . It might be related to updated packages in the Ubuntu distribution for these images, or to something else. We are diagnosing, but the only sure thing is that it appeared on the 5th of January; we don't have any other evidence for now. |
We've deployed a new image version (1.46.0) in jenkins-infra/jenkins-infra#3246 with a bunch of upgraded Ubuntu packages (including kernel). @NotMyFault Can you retry a build to see the behavior? (what changed since yesterday: ACP fully cleaned up + new template image). |
Additionally, this could be linked with #3874 (but I see that JDK11 and JDK17 are also having timeouts). What is the test doing exactly? Does it require network operations "en masse"? |
I'll give it a shot later today. But I can build it normally, with the ACP enabled? |
I'm not sure I fully understand your question, could you clarify? |
Removing the timeout increase results in the same outcome as before: https://ci.jenkins.io/job/Core/job/jenkins/job/PR-8842/1/ The test case takes 230s still. |
A first check of the resource usage centered on the JDK21 case: [graph] vs. [graph] |
=> Not much difference in resource usage. It's really weird 🤔 |
@NotMyFault I'm going to roll back to 1.44.0 (the image used before the 5th of January) to see if we can return to the previous behavior |
Launched https://ci.jenkins.io/job/Core/job/jenkins/job/PR-8842/2/ with the 1.44.0 version. Let's see |
https://ci.jenkins.io/job/Core/job/jenkins/job/PR-8842/2/testReport/ :( |
Ok so it is not the image. Gotta check other things then |
) Infrastructure issue that we don't want to allow to delay the release candidate of 2.426.3. jenkins-infra/helpdesk#3890 is the issue (cherry picked from commit 3dbbf26)
I backported the |
@dduportal Following up on the chat in Matrix, I've replayed the PR with the reverted image and the test is no longer failing on any JDK. |
@NotMyFault can you confirm in the next 48h it is still ok by closing the issue with a comment? |
Wouldn't bumping the template again in the future result in a failing test, as before? |
I don’t see the causality here. It happened on the same day (i.e. an apparent correlation), but using the previous 1.44.0 template also triggered the timeout issue, which shows it is unrelated. Then switching back to 1.45.0 showed the issue fixed, and 1.46 as well. It means we don’t really know what caused it, but it is clearly not the template. However, we can keep searching for what changed. The network is a good lead, as we changed a lot of things these past days. |
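The reasoning in this comment can be checked mechanically. Here is a minimal sketch in Python, assuming the runs discussed in this thread are encoded as (template image version, timed out?) pairs. The `observations` list and the `versions_with_mixed_outcomes` helper are illustrative names, not part of any Jenkins tooling, and the encoding is one reading of the comments rather than data pulled from ci.jenkins.io.

```python
from collections import defaultdict

# (template image version, did the test time out?) as read from this thread.
# This encoding is an assumption made for illustration only.
observations = [
    ("1.44.0", False),  # image in use before the 5th of January, test fast
    ("1.45.0", True),   # first slow runs, around the 5th of January
    ("1.46.0", True),   # new image deployed, test still takes ~230s
    ("1.44.0", True),   # rollback to the previous image, still timing out
    ("1.45.0", False),  # later replay, issue gone
    ("1.46.0", False),  # later replay, issue gone
]


def versions_with_mixed_outcomes(obs):
    """Return the versions that were seen both passing and timing out."""
    outcomes = defaultdict(set)
    for version, timed_out in obs:
        outcomes[version].add(timed_out)
    return sorted(v for v, seen in outcomes.items() if len(seen) > 1)


mixed = versions_with_mixed_outcomes(observations)
print(mixed)
```

If every version shows up in `mixed`, the image version alone cannot explain the slowdown, which matches the conclusion that something else changed.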
I guess we can close the issue for now. The timeout is set to 500s and thanks to @MarkEWaite this change is present on the current LTS line too, preventing future issues. |
There should be an issue tracking the removal of the workaround. |
I have filed JENKINS-72577 to track the removal of the workaround added in jenkinsci/jenkins#8840. That JENKINS-72577 in turn depends on the resolution of the operational issue that led to the workaround, tracked in this (now reopened) ticket. |
The issue is marked as related to ACP, but I still haven't found any clue that it is (or it is not). Did you see anything pointing in its direction? |
The very slow test runtime is only happening on CI infrastructure, as the test takes only 5 seconds for me locally, even when running in a clean Docker container without a warm Maven cache. Whether the operational issue is due to ACP or some other area of infrastructure I cannot be sure, but I suspect ACP because the test is invoking a second Maven process, and we have seen various ACP complications in the past running Maven within Maven such as jenkinsci/maven-hpi-plugin#541. I have updated the wording of JENKINS-72577 to clarify the above. |
No problem: I did not have enough time (nor did the rest of the team) to check in detail. Reopening the issue is the correct move (thanks for doing it!) as the problem is present on the ci.jenkins.io infrastructure: we acknowledge it. My question was really in case you had already seen things that we missed. Thanks for the explanation |
Opened jenkinsci/jenkins#9374 as the tests now take around 5s on Linux with JDK21, following 4 months of infrastructure changes (see PR message). |
Closing the infra issue as we confirmed the problem is now gone. Besides, jenkinsci/jenkins#9374 has been approved and will be merged, closing https://issues.jenkins.io/browse/JENKINS-72577 |
Service(s)
ci.jenkins.io
Summary
Hey,
It appears that a change to the Linux agents with Java 17 and 21 caused a test failure in Jenkins core. Around January the 5th, MavenTest#sensitiveParameters() started to take much longer than usual.
Starting with 5730, the single test takes 229s (!!), far exceeding our test timeout of 180s and causing builds to fail. 5728, built on January the 3rd, notes ~39s for the entire MavenTest.
Locally, the entire MavenTest takes less than 30s to complete for me using mvn clean verify -Dtest=hudson.tasks.MavenTest.
Interestingly, this affects Linux agents with Java 17 and 21 only. The test passes in its usual time on Windows agents with Java 17 and Linux agents with Java 11: https://ci.jenkins.io/job/Core/job/jenkins/job/master/5730/pipeline-graph/
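To compare local and CI timings in a repeatable way, a small harness can time a command against a budget. The following is only a sketch in Python: `run_with_budget` is a hypothetical helper, and the 180s budget simply mirrors the per-test timeout quoted above, even though that limit applies to a single test rather than to a whole Maven run. A trivial child process stands in for Maven so the sketch runs anywhere.

```python
import subprocess
import sys
import time


def run_with_budget(cmd, budget_s):
    """Run cmd and return (elapsed seconds, finished within budget?)."""
    start = time.monotonic()
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # once budget_s seconds have passed.
        subprocess.run(cmd, check=False, timeout=budget_s,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        within = True
    except subprocess.TimeoutExpired:
        within = False
    return time.monotonic() - start, within


# Stand-in child process; for the real case one would pass the Maven
# invocation quoted in this issue, for example:
#   ["mvn", "clean", "verify", "-Dtest=hudson.tasks.MavenTest"]
elapsed, ok = run_with_budget([sys.executable, "-c", "pass"], budget_s=180)
print(f"{elapsed:.1f}s, within budget: {ok}")
```

Running the same harness locally and on an agent would give directly comparable numbers, independent of how the test report rounds durations.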
Reproduction steps
No response