docker-proxy cannot exit after send SIGINT #2421

Open
wants to merge 1 commit into
base: master
@mingfukuang mingfukuang commented Jul 19, 2019

Signed-off-by: mingfukuang kuang.mingfu@zte.com.cn

- What I did
When a container that uses docker-proxy is stopped or deleted, the stop path calls Unmap() to release the port.
Unmap() first acquires pm.lock.Lock(), then stops docker-proxy, and then calls pm.lock.Unlock(). In some situations (which I cannot reproduce), stopping docker-proxy hangs at:

goroutine [78795220]
 0  0x000000000065e5c5 in syscall.Syscall6
    at /usr/local/go/src/syscall/asm_linux_amd64.s:45
 1  0x00000000004b914c in os.(*Process).blockUntilWaitable
    at /usr/local/go/src/os/wait_waitid.go:28
 2  0x00000000004b28eb in os.(*Process).wait
    at /usr/local/go/src/os/exec_unix.go:22
 3  0x00000000004b0e6b in os.(*Process).Wait
    at /usr/local/go/src/os/doc.go:49
 4  0x000000000094710d in os/exec.(*Cmd).Wait
    at /usr/local/go/src/os/exec/exec.go:434
 5  0x0000000000daadd6 in github.com/docker/libnetwork/portmapper.(*proxyCommand).Stop
    at /usr/src/debug/docker-engine/vendor/src/github.com/docker/libnetwork/portmapper/proxy.go:98
 6  0x0000000000da98a3 in github.com/docker/libnetwork/portmapper.(*PortMapper).Unmap
    at /usr/src/debug/docker-engine/vendor/src/github.com/docker/libnetwork/portmapper/mapper.go:185
 7  0x000000000093144f in github.com/docker/libnetwork/drivers/bridge.(*bridgeNetwork).releasePort
    at /usr/src/debug/docker-engine/vendor/src/github.com/docker/libnetwork/drivers/bridge/port_mapping.go:133
 8  0x0000000000930e91 in github.com/docker/libnetwork/drivers/bridge.(*bridgeNetwork).releasePortsInternal
    at /usr/src/debug/docker-engine/vendor/src/github.com/docker/libnetwork/drivers/bridge/port_mapping.go:113
 9  0x0000000000930d1e in github.com/docker/libnetwork/drivers/bridge.(*bridgeNetwork).releasePorts
    at /usr/src/debug/docker-engine/vendor/src/github.com/docker/libnetwork/drivers/bridge/port_mapping.go:105
10  0x0000000000926907 in github.com/docker/libnetwork/drivers/bridge.(*driver).RevokeExternalConnectivity
    at /usr/src/debug/docker-engine/vendor/src/github.com/docker/libnetwork/drivers/bridge/bridge.go:1288
11  0x0000000000805bbc in github.com/docker/libnetwork.(*endpoint).sbLeave
    at /usr/src/debug/docker-engine/vendor/src/github.com/docker/libnetwork/endpoint.go:688
12  0x00000000008049eb in github.com/docker/libnetwork.(*endpoint).Leave
    at /usr/src/debug/docker-engine/vendor/src/github.com/docker/libnetwork/endpoint.go:644
13  0x00000000008250c2 in github.com/docker/libnetwork.(*sandbox).delete
    at /usr/src/debug/docker-engine/vendor/src/github.com/docker/libnetwork/sandbox.go:227
14  0x0000000000824d20 in github.com/docker/libnetwork.(*sandbox).Delete
    at /usr/src/debug/docker-engine/vendor/src/github.com/docker/libnetwork/sandbox.go:188
15  0x000000000053d872 in github.com/docker/docker/daemon.(*Daemon).releaseNetwork
    at /usr/src/debug/docker-engine/.gopath/src/github.com/docker/docker/daemon/container_operations.go:808
16  0x000000000058cb6a in github.com/docker/docker/daemon.(*Daemon).Cleanup
    at /usr/src/debug/docker-engine/.gopath/src/github.com/docker/docker/daemon/start.go:197
17  0x000000000057b147 in github.com/docker/docker/daemon.(*Daemon).StateChanged
    at /usr/src/debug/docker-engine/.gopath/src/github.com/docker/docker/daemon/monitor.go:64
18  0x00000000005c5886 in github.com/docker/docker/libcontainerd.(*container).handleEvent.func1
    at /usr/src/debug/docker-engine/.gopath/src/github.com/docker/docker/libcontainerd/container_linux.go:224
19  0x00000000005c5da0 in github.com/docker/docker/libcontainerd.(*queue).append.func1
    at /usr/src/debug/docker-engine/.gopath/src/github.com/docker/docker/libcontainerd/queue_linux.go:26  

I have the full stack dump, but it is too big (89 MB) to attach here.

The corresponding code is as follows:

func (p *proxyCommand) Stop() error {
	if p.cmd.Process != nil {
		if err := p.cmd.Process.Signal(os.Interrupt); err != nil {
			return err
		}
		return p.cmd.Wait() // in some situations docker-proxy never exits, and this call hangs
	}
	return nil
}

Once docker-proxy cannot be stopped for one container, pm.lock is never released, so that container cannot be stopped and every operation on it may hang.
Furthermore, when other containers enter the stop or delete path, they also need the global pm.lock and get stuck. As a result, other operations on those containers, such as docker inspect and docker exec, hang as well.

- How I did it
When the situation above occurs, protective measures ensure that the global lock (pm.lock) is still released.

@GordonTheTurtle

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "master" git@github.com:mingfukuang/libnetwork.git somewhere
$ cd somewhere
$ git commit --amend -s --no-edit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

@mingfukuang mingfukuang force-pushed the master branch 5 times, most recently from 5aa615a to db56873 on July 20, 2019 02:10
@Marshalzxy

Why does libnetwork use the os.Interrupt signal to stop docker-proxy? Is there a special reason?

@mingfukuang
Author

ping @selansen @thaJeztah, please help.

@thaJeztah
Member

@mingfukuang looks like the commit message itself is missing a DCO sign-off; could you amend your commit to have a sign-off? (not just the PR description on GitHub)?

Also make sure that the sign-off uses your real name (not your GitHub username)

@thaJeztah
Member

/cc @euanh @kolyshkin PTAL

@mingfukuang
Author

mingfukuang commented Aug 5, 2019

@mingfukuang looks like the commit message itself is missing a DCO sign-off; could you amend your commit to have a sign-off? (not just the PR description on GitHub)?

I signed it just a moment ago. Please review again; I appreciate your quick reply.

@mingfukuang mingfukuang closed this Aug 5, 2019
@mingfukuang mingfukuang reopened this Aug 5, 2019
Collaborator

@selansen selansen left a comment

LGTM

@thaJeztah
Member

ping @euanh ptal

select {
case result := <-waitChan:
	return result
case <-time.After(15 * time.Second):
Contributor

This leads to a stray timer. It is better to create a timer and properly stop it.

t := time.NewTimer(15 * time.Second)
defer t.Stop() // release the timer as soon as Stop() returns
select {
case result := <-waitChan:
	return result
case <-t.C:
	if err := p.cmd.Process.Signal(os.Kill); err != nil {
		return err
	}
}

Author

Agreed, it is better to stop the timer ourselves than to leave it to the GC. I will update the code.

@@ -65,10 +66,23 @@ func (p *proxyCommand) Start() error {

 func (p *proxyCommand) Stop() error {
 	if p.cmd.Process != nil {
-		if err := p.cmd.Process.Signal(os.Interrupt); err != nil {
+		if err := p.cmd.Process.Signal(syscall.SIGTERM); err != nil {
Contributor

What is the reason to replace SIGINT with SIGTERM?

Member

I was wondering as well; should we handle both (for backward compatibility)?

Author

@mingfukuang mingfukuang Sep 4, 2019

To be honest, I am also not sure why SIGINT is sent rather than SIGTERM. The proxy handles both of them. Judging from the symptoms, SIGINT appears to get no response. Besides, semantically speaking, SIGTERM seems more suitable, so I made this change.
Looking forward to your further suggestions :-)

Contributor

@kolyshkin kolyshkin left a comment

Please take a look at my comments.

Also, it would be nice if the commit message contained some info about why this is done (I guess you can just reuse the PR description).

Author

@mingfukuang mingfukuang left a comment

Copy, thanks for your time. I will add the commit message later.

Signed-off-by: mingfukuang <kuang.mingfu@zte.com.cn>

When a container that uses docker-proxy is stopped or deleted, the stop path calls Unmap() to release the port.
Unmap() first acquires pm.lock.Lock(), then stops docker-proxy, and then calls pm.lock.Unlock().
In some situations (which cannot be reproduced), stopping docker-proxy hangs.

Once docker-proxy cannot be stopped for one container, pm.lock is never released, so that container cannot be stopped,
and every operation on it may hang. Furthermore, when other containers enter the stop or delete path,
they also need the global pm.lock and get stuck.

As a result, other operations on those containers, such as docker inspect and docker exec, also hang.

To fix this, add protective measures so that the global lock (pm.lock) is always released.

7 participants