Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle IOException in onOnline when failing to delete workspace, to prevent a node disconnection #142

Conversation

giacomolozito
Copy link
Contributor

Hello,
currently a failed workspace deletion attempt within onOnline will result in node disconnection, because the exception is not handled.

Unfortunately workspace deletion issues are getting more common as users run jobs inside docker containers that might run with a different user than jenkins, thus resulting in files left around in the workspace that jenkins can't delete. While this is a user problem and should be dealt with by ensuring jobs clean up their own files, what would normally be just an annoyance (the workspace files left around) can turn into an outage as the node fails to re-connect if it can't delete a workspace.

So this patch mitigates the issue by preventing onOnline from failing and by warning the operator that there are workspaces requiring manual cleanup (or jobs requiring a review).

Please note that https://issues.jenkins-ci.org/browse/JENKINS-55597 might be related to this or it might be another plugin - in my case it was branch-api not handling the exception so I patched it.

Kind regards

Copy link
Contributor

@bitwiseman bitwiseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@giacomolozito
Throwing an exception out of this method should not take the node offline:

https://github.com/jenkinsci/jenkins/blob/2767b00146ce2ff2738b7fd7c6db95a26b8f9f39/core/src/main/java/hudson/slaves/ComputerListener.java#L135-L164

If an exception here does take a node offline, we can patch that here but the that is also a bug in Jenkins core.

Have you tested this locally?
What version of Jenkins are you using?

@bitwiseman
Copy link
Contributor

bitwiseman commented Apr 18, 2019

@giacomolozito
Verified - this is a broader issue. Opened https://issues.jenkins-ci.org/browse/JENKINS-57111.

But this looks like a reasonable way to address the problem locally - giving meaningful/helpful output when this happens.

Could you add a test for this to WorkspaceLocatorImplTest.java?

Then we can merge.
Thanks!

@bitwiseman
Copy link
Contributor

I'm going to go ahead and merge this for inclusion in the next release. It is a safe change.

@bitwiseman bitwiseman merged commit dc69676 into jenkinsci:master May 8, 2019
@bitwiseman
Copy link
Contributor

@giacomolozito
FYI, the underlying fix for this was released in v2.177.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants