libcontainer: read init-p: connection reset by peer #1547

Closed
sboeuf opened this issue Aug 4, 2017 · 9 comments

Comments

@sboeuf
Contributor

sboeuf commented Aug 4, 2017

I have a piece of code at https://github.com/clearcontainers/agent that relies heavily on libcontainer to spawn new containers/processes inside a VM.
I started some long-running tests to validate that it works, but I ran into an issue.
Here is what I do:

docker run -td --name test ubuntu sh
i=0; while :; do ((i++)); echo -e "loop $i"; docker exec test bash -c "echo hello"; done

After 15930 iterations, I always get the same error:

WARN: os: process already finished
ERROR: Could not run process 31865: container_linux.go:265: starting container process caused "read init-p: connection reset by peer"

After that, I cannot run any more exec commands.

Any ideas?

@sboeuf
Contributor Author

sboeuf commented Aug 4, 2017

@mrunalp

@crosbymichael
Member

You need to import _ "github.com/opencontainers/runc/libcontainer/nsenter" if you are using libcontainer directly

@sboeuf
Contributor Author

sboeuf commented Aug 4, 2017

@sboeuf
Contributor Author

sboeuf commented Aug 9, 2017

After some investigation, I found and fixed my issue. Every time we call container.Run() or container.Start(), the calling process has to wait for one child so that it gets reaped. This should be clearly stated in the libcontainer code comments: if we don't do that, we leave one zombie behind every time a new process is started.
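
For readers who want to see the underlying behaviour in isolation, here is a minimal sketch. It does not use libcontainer at all: it uses the standard os/exec package to show the general Linux rule described above, namely that a child which has exited stays around as a zombie until its parent waits for it. The helper names procState, waitForState and startAndReap are made up for this illustration, and the code assumes a Linux system with /proc mounted and a `true` binary on the PATH.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

// procState returns the one-letter process state from /proc/<pid>/stat
// ("Z" means zombie), or "" if the process no longer exists.
func procState(pid int) string {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", pid))
	if err != nil {
		return ""
	}
	// Layout is "pid (comm) state ...". comm may contain spaces or
	// parentheses, so look for the state after the last ')'.
	s := string(data)
	i := strings.LastIndex(s, ")")
	if i < 0 {
		return ""
	}
	fields := strings.Fields(s[i+1:])
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

// waitForState polls until pid is in the wanted state or the timeout expires.
func waitForState(pid int, want string, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if procState(pid) == want {
			return true
		}
		time.Sleep(10 * time.Millisecond)
	}
	return procState(pid) == want
}

// startAndReap starts a short-lived child and reports whether it was a
// zombie before Wait was called, and whether it was gone afterwards.
func startAndReap() (bool, bool, error) {
	cmd := exec.Command("true")
	if err := cmd.Start(); err != nil {
		return false, false, err
	}
	pid := cmd.Process.Pid

	// The child exits almost immediately, but nobody has waited for it yet.
	zombieBefore := waitForState(pid, "Z", 2*time.Second)

	// Wait reaps the child; its /proc entry disappears.
	err := cmd.Wait()
	goneAfter := procState(pid) == ""
	return zombieBefore, goneAfter, err
}

func main() {
	zombie, gone, err := startAndReap()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("zombie before Wait:", zombie)
	fmt.Println("gone after Wait:", gone)
}
```

Skipping the Wait call in a long-running loop like the one in the original report would leave one such entry behind per iteration, which is consistent with the failure only showing up after thousands of exec calls.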

@sboeuf sboeuf closed this as completed Aug 9, 2017
@cyphar
Member

cyphar commented Aug 16, 2017

@sboeuf The zombie generation has been fixed in #1506, it was not intentional.

@sboeuf
Contributor Author

sboeuf commented Aug 17, 2017

@cyphar Cool, so you think I can safely remove the reaper from my code? Here: clearcontainers/agent@56aca91

Also, what do you mean by "it was not intentional"? That PR #1506 did not intentionally fix this issue, or that the issue itself was not intentional, i.e. libcontainer should have reaped this zombie child from the beginning but nobody noticed?

@cyphar
Member

cyphar commented Aug 17, 2017

runc should've been reaping the zombie created during the runc create phase, and not doing that was not intentional (and #1506 fixed that oversight). It is still likely that you need to have a reaper for other reasons (collecting exit codes and a few other cases), but YMMV.

@sboeuf
Contributor Author

sboeuf commented Aug 17, 2017

@cyphar Well, of course I still have to wait for the exit code, but if I understand #1506 correctly, no runc:[1:CHILD] zombie process is left behind after we call container.Start(process) or container.Run(process), right?
You're talking about runc, but what this PR really does is reap the process from libcontainer, so that it is transparent for any consumer of libcontainer, right?

@cyphar
Member

cyphar commented Aug 17, 2017

Yeah, it fixes the issue for libcontainer users (runc already handled zombies), and it now reaps the runc:[1:CHILD] zombies.
