[JENKINS-40825] move transients into DecoratedLauncher #180

Conversation

marvinthepa
Contributor

As `decorate` will be called multiple times, we don't want to share
these transients among calls - especially as they will not be properly
cleaned up.

Makes JENKINS-40825 much less severe, as "Pipe not connected" now
seems, most of the time, to come from [DurableTaskStep][1], which will
just retry instead of failing the job.

More importantly, it fixes the resource leak mentioned in
this comment and the following comments.

Note: this is just a first stab and probably needs polishing, especially
the new ContainerExecWatcher.close implementation, which
might not even be necessary.

[1]: https://github.com/jenkinsci/workflow-durable-task-step-plugin/blob/ae18393/src/main/java/org/jenkinsci/plugins/workflow/steps/durable_task/DurableTaskStep.java#L332
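
For illustration only, here is a minimal sketch of the idea behind the change. The class name, the `lastProc` field, and the surrounding structure are made up for this example and are not the plugin's actual `ContainerExecDecorator` code: per-launch state moves out of the decorator, which is shared across `decorate` calls, and into the `DecoratedLauncher` that each call returns.

```java
import hudson.Launcher;
import hudson.LauncherDecorator;
import hudson.Proc;
import hudson.model.Node;

import java.io.IOException;

// Sketch only: not the plugin's actual ContainerExecDecorator.
public class PerCallStateDecorator extends LauncherDecorator {

    // Before: per-launch state lived on the decorator itself as transient
    // fields (an exec watch, the launched process, ...). decorate() is
    // called multiple times, so those fields were shared between calls and
    // never cleaned up reliably.

    @Override
    public Launcher decorate(final Launcher launcher, final Node node) {
        // After: every decorate() call returns its own launcher instance,
        // and that instance owns whatever it starts.
        return new Launcher.DecoratedLauncher(launcher) {
            private Proc lastProc; // per-call state, not shared across calls

            @Override
            public Proc launch(ProcStarter starter) throws IOException {
                // In the real plugin this is where the container exec watch
                // would be opened; here we only keep the resulting Proc.
                lastProc = super.launch(starter);
                return lastProc;
            }
        };
    }
}
```

Because each build gets its own decorated launcher, nothing started for one `decorate` call can be overwritten or leaked by another.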
@marvinthepa marvinthepa changed the title move transients into DecoratedLauncher [JENKINS-40825] move transients into DecoratedLauncher Aug 2, 2017
@marvinthepa
Contributor Author

Whoops, forgot the tests. Will fix them right now.

@marvinthepa
Contributor Author

marvinthepa commented Aug 2, 2017

Tests pass on my system, but I needed a small "hack" to make the slaves (running on VirtualBox) able to connect to my Jenkins; see #181.

@marvinthepa
Contributor Author

Looks like this is not enough. Please don't merge yet.

Contributor

@iocanel iocanel left a comment


For starters, kudos for this. I think that it will definitely help.

Having code and formatting changes in the same commit makes it a bit harder to review.

I don't have a change to request, but I'd welcome limiting the scope of the changes (e.g. the formatting, and maybe removing the extra class ClosableLauncher, as I think we could have the same functionality without it).

"[" + containerName + "] of pod [" + podName + "]." +
" Timed out waiting for container to become ready!");
}
private class ClosableLauncher extends Launcher.DecoratedLauncher implements Closeable {
Contributor


Do we really need to create a new class? Can't we just have the existing class add things to a list of closeables?

Contributor Author


Right, see #180 (comment)

@marvinthepa
Contributor Author

marvinthepa commented Aug 2, 2017

@iocanel:

First of all, thanks for the review.

The new class was there to implement the close method, but I now realize it could have been a List<Closeable> instead, so yes, I could have done without it.

In any case, I realized that this approach is not enough: even with this pull request, we are overwriting started, finished, and alive from different threads, so I am currently trying to move them into local variables instead of members.

Closing this pull request now, expect a new one tomorrow.

The formatting was only different because I extracted the class, but you are right that formatting changes should not go into the same commit as functionality changes; I will take care not to do that.
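
As an illustration of the `List<Closeable>` alternative discussed above, here is a sketch with made-up names (it is not code from this pull request): the existing launcher or decorator simply records whatever it opens and closes everything at the end, instead of introducing a separate `ClosableLauncher` class.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch of the reviewer's suggestion: keep a list of closeables on the
// existing class and close them when the launch is done. Names are made up.
class CloseableTracker implements Closeable {

    private final List<Closeable> closeables = new ArrayList<>();

    void register(Closeable c) {
        closeables.add(c);
    }

    @Override
    public void close() {
        // Close in reverse registration order and keep going even if one
        // close() fails, so a single bad resource cannot leak the rest.
        for (int i = closeables.size() - 1; i >= 0; i--) {
            try {
                closeables.get(i).close();
            } catch (IOException e) {
                // log and continue; nothing useful to do here in a sketch
            }
        }
        closeables.clear();
    }
}
```

Whether that is enough is exactly what the follow-up comments are about: if started, finished, and alive remain fields written from several threads, the sharing problem is still there.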

@marvinthepa marvinthepa closed this Aug 2, 2017
@iocanel
Contributor

iocanel commented Aug 2, 2017

@marvinthepa: If by "this is not enough" you mean that there are still stuck threads, I noticed in the client that the InputStreamPumper.close() method didn't properly break the pumping loop (in some cases), causing the thread to remain stuck. It seems to behave better with this fix: fabric8io/kubernetes-client@61ae6ed, which is now available in https://github.com/fabric8io/kubernetes-client/releases/tag/v2.5.8 (already in Central). So you might want to try that version out.
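
For readers unfamiliar with the failure mode, the sketch below is a generic illustration of a pumping loop whose close() has to actually break the loop; it is not the fabric8 InputStreamPumper source nor the linked fix, just the pattern.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Generic illustration: if close() only flips a flag while the loop is
// parked in a blocking read(), the pumping thread can stay stuck. Closing
// the underlying stream (and interrupting the thread) breaks it out.
class SimplePumper implements Runnable, AutoCloseable {
    private final InputStream in;
    private final OutputStream out;
    private volatile boolean keepPumping = true;
    private volatile Thread pumpingThread;

    SimplePumper(InputStream in, OutputStream out) {
        this.in = in;
        this.out = out;
    }

    @Override
    public void run() {
        pumpingThread = Thread.currentThread();
        byte[] buffer = new byte[1024];
        try {
            int read;
            // The loop re-checks keepPumping on every iteration, but that
            // alone is not enough while read() is blocking.
            while (keepPumping && (read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            // Expected when the stream is closed while reading.
        }
    }

    @Override
    public void close() {
        keepPumping = false;
        // Interrupt the pumping thread and close the stream so a blocking
        // read() cannot keep the thread stuck forever.
        Thread t = pumpingThread;
        if (t != null) {
            t.interrupt();
        }
        try {
            in.close();
        } catch (IOException ignored) {
        }
    }
}
```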

@marvinthepa
Contributor Author

marvinthepa commented Aug 3, 2017

I actually meant that we might still end up not even calling the close method of some of the resources that we use - so even if the close method works correctly, we might leave resources around.

See this branch for what I think works better.

I will try to verify a few of my assumptions (especially about proc being killed) and might polish it a bit more, and then create a new pull request.

@iocanel
Contributor

iocanel commented Aug 3, 2017

@marvinthepa: If we know for sure that the proc is being killed by the shell step, then things get simplified a lot, as it frees us completely from having to babysit and hold refs to the watch and proc.

@marvinthepa
Contributor Author

It is not the shell step that kills the proc, but actually ContainerExecDecorator, so we do need to keep a reference around for every proc that we start. I don't really like this, but I don't see an alternative.

@michaelajr

michaelajr commented Aug 3, 2017

Am I understanding this correctly - that with this pull request, the "Pipe not connected" issue will be seen (logged) but will no longer "freeze" the build for 5 mins and then fail it?

@marvinthepa
Contributor Author

marvinthepa commented Aug 3, 2017

@michaelajr:
Not exactly, but I am working on it. See the new branch for progress, as the approach in this branch was flawed.

If you have a test system, feel free to build from that branch and give it a spin; I would be interested in feedback on whether it works for you.

You might still have issues when it takes more than 10 seconds to start processes in Kubernetes (i.e. likely when your Kubernetes is overwhelmed), but the general situation should improve.

I managed to get rid of the "Pipe not connected" errors, but I still need to get the proc cleanup right; that is why I haven't created a pull request yet.

@michaelajr

@marvinthepa OK. We'll take a look. Thanks!

@marvinthepa
Contributor Author

Please see #182 instead.

@marvinthepa marvinthepa deleted the bugfix/JENKINS-40825-broken-pipe branch August 21, 2017 15:54