Seeing persistent/repeated 'persistentvolumeclaim "jenkins-home" not found' errors after OSIO env reset #4121
Comments
@ldimaggi we are trying to understand the root cause of this issue. @aslakknutsen would you like to shed some light here? |
@ldimaggi Just to correct you: the build team has been facing this issue for the past 5-6 days, and Service Delivery was also informed about it. I discussed this issue with @mmclanerh. It is not happening after or because of #3934; it is a different issue that was happening before that fix as well. The fix we just provided in Jenkins version 4.0.97 is related to #3956 and affects #3934 in that Jenkins will now come up fast and the 503 issue will not block the user; as you said, the build is taking 15-20 minutes in #3934 (comment). We are still working on #3934. |
I would suggest looking at the PVC and clearing off any cruft / old files that aren't relevant. I wonder if the number of files present has ballooned due to stale/failed workspace cleanups? |
I have not investigated this specifically, but it looks very similar to another issue from Friday. Essentially this can happen due to a Reset (Clean & Apply). Clean returns 'deleted ok', but when Apply happens multiple seconds later, the PVC is still not deleted from OpenShift, so Apply ends up updating the PVC instead of recreating it. OpenShift then later comes around and deletes the PVC. A second Reset fixes it after the PVC has been deleted. |
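The race described above (Apply running while the PVC is still being deleted) could in principle be avoided by polling until the PVC is really gone before applying. The following is only a minimal sketch of that idea, not the actual fabric8-tenant fix: `clean`, `apply`, and `pvc_exists` are hypothetical callables standing in for whatever the tenant service uses to talk to OpenShift, and the timeout values are assumptions.

```python
import time
from typing import Callable


def wait_until_deleted(
    exists: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `exists` until it returns False or the timeout expires.

    Returns True if the resource disappeared in time, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if not exists():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def reset(
    clean: Callable[[], None],
    apply: Callable[[], None],
    pvc_exists: Callable[[], bool],
    **wait_kwargs,
) -> None:
    """Hypothetical Reset: Clean, wait for the PVC to vanish, then Apply."""
    clean()  # may report 'deleted ok' before the PVC is actually gone
    if not wait_until_deleted(pvc_exists, **wait_kwargs):
        # Applying now would update the doomed PVC instead of recreating it.
        raise TimeoutError("PVC still present after Clean; refusing to Apply")
    apply()
```

With this shape, Apply can only run once the PVC is gone, so it cannot update a PVC that OpenShift is about to delete. The cost is that a slow deletion turns into an explicit timeout error instead of a silently broken environment.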
Still seeing this: It looks like Aslak is correct - a 2nd reset is needed. |
@ldimaggi is this still happening? |
I think we can close this one - it's been almost 2 months and I do not think the issue is happening now. |
Happening right now with account |
|
Seeing this problem again on October 17 - this is causing automated tests running on the starter-us-east-2 cluster to fail as the Jenkins pod fails to start after a reset. |
@aslakknutsen regarding your comment from Aug 6, did we try to fix it? |
This seems like a reset issue, assigning to platform team. |
Still facing this issue on us-east-2 cluster. @aslakknutsen @jmelis @mmclanerh do we have any update on this? |
@chmouel thinks that this is the reason why pipelines fail, can we prioritize this? It causes e2e tests to fail in prod-preview, which in turn blocks deployments to production. |
related investigation #4598 (comment) |
@MatousJobanek please take a look. |
I tried to reconstruct what happened; this was useful when we were debugging different Jenkins issues in the past. All times are UTC.
|
We have two jobs that failed with a similar error but contain different messages in OpenShift events:
|
@ppitonak any chance you can do a |
@chmouel done, it will be available in the following test runs |
Just giving a link to my proposed solution in another issue: #4598 (comment) |
Our accounts are set to either |
Raising severity to level "2" based on investigation that this issue is the root cause of issue #4598, which results after a user resets their user environment. |
Just a small update - the fix is done in fabric8-services/fabric8-tenant#714 - now I'm just waiting until the quay database is fixed, so I can merge it and deploy it to prod-preview. |
It's in prod-preview now. #4598 (comment) |
I haven't seen this issue for a long time. Closing. |
Unfortunately I found it in our 5 hours old logs :( |
This failure seems to be caused by something else. See #4598 (comment). I'm assigning it to the build team to investigate the new failures. |
Those are indeed other issues, but they are issues with the OpenShift platform. There is nothing we can 'fix' there; we just need to accept that it is unstable. We could perhaps retry all the time, but that is just going to amplify the issue, |
For a user (my user account) provisioned on starter-us-east-2 - after an OSIO env reset, the Jenkins pod fails to start. Event log includes:
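On the concern that retrying all the time amplifies the problem: one common mitigation is to cap the number of attempts and back off exponentially with jitter, so that retries thin out instead of piling onto an unstable platform. This is only a sketch of that general technique, not something proposed in this thread; `retry_with_backoff` and its parameters are hypothetical names and values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call `op` up to `max_attempts` times, sleeping between failures.

    The delay ceiling doubles after each failure, capped at `max_delay_s`,
    and is scaled by a random factor in [0, 1) ("full jitter") so that
    many clients do not retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except Exception:
            if attempt == max_attempts:
                raise  # give up: surface the last error to the caller
            ceiling = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(ceiling * rng())
    raise ValueError("max_attempts must be at least 1")
```

A bounded retry like this still fails when the platform stays down, but it limits the extra load to a fixed number of requests per operation.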
This issue appeared today after the resolution of: #3934