Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix signal terminated processes exit code #60

Merged

Conversation

chriskuehl
Copy link
Contributor

Previously, dumb-init handled exit codes wrong when the child was terminated via signal (rather than exiting normally), as reported by @msavage20 in #59.

Broken behavior from before:

ckuehl@neon:~$ sh -c 'kill -TERM $$'
ckuehl@neon:~$ echo $?
143
ckuehl@neon:~$ dumb-init sh -c 'kill -TERM $$'
ckuehl@neon:~$ echo $?
0

@chriskuehl chriskuehl added the bug label Mar 11, 2016
@asottile
Copy link
Contributor

:shipit:

chriskuehl added a commit that referenced this pull request Mar 11, 2016
…s-exit-code

Fix signal terminated processes exit code
@chriskuehl chriskuehl merged commit b0ee633 into Yelp:master Mar 11, 2016
@ailispaw
Copy link

This change makes exit status of containers 143 in most of cases with docker ps -a.
Is there any way to get the actual exit status of containers (child process)?
Sometimes I had used docker ps -aq -f exited=0 to get normal stopped containers.

@chriskuehl
Copy link
Contributor Author

@ailispaw it seems to be working as-expected for me:

ckuehl@raptors:~/dumb-init$ docker run dumb-init-test sh -c 'exit 0'
ckuehl@raptors:~/dumb-init$ docker ps -a | sed -n 2p | awk '{print $1}' | xargs docker inspect | grep ExitCode
        "ExitCode": 0,
ckuehl@raptors:~/dumb-init$ docker run dumb-init-test sh -c 'exit 1'
ckuehl@raptors:~/dumb-init$ docker ps -a | sed -n 2p | awk '{print $1}' | xargs docker inspect | grep ExitCode
        "ExitCode": 1,
ckuehl@raptors:~/dumb-init$ docker run dumb-init-test sh -c 'exit 55'
ckuehl@raptors:~/dumb-init$ docker ps -a | sed -n 2p | awk '{print $1}' | xargs docker inspect | grep ExitCode
        "ExitCode": 55,

Can you provide a bit more information or (ideally) a way to reproduce the issue?

@ailispaw
Copy link

@chriskuehl Thanks for the replay so promptly.

How about docker run dumb-init-test sh -c 'kill -TERM $$'?
docker stop kills containers with SIGTERM. So your examples are not in this case.

kill -TERM $$ itself should return 0 exit status, shouldn't it?

@ailispaw
Copy link

Oh, sh -c 'kill -TERM $$' returns exit status 143, but I would like to know the actual exit status instead of the number of the signal.

@chriskuehl
Copy link
Contributor Author

I think the expected behavior is that if you kill your program with SIGTERM, you get an exit code of 128+15, since this is consistent with how the process would behave outside of a Docker container.

In this case it probably indicates that your program is being killed by the signal directly (as opposed to the program handling the signal and exiting normally), or that it chooses to exit with 128+15 as the exit code after handling the signal. Probably the former in this case if it wasn't happening with dumb-init prior to 1.0.1.

For example, if you add a signal handler for TERM that exits with zero, you can avoid this while still handling TERM.

ckuehl@raptors:~/dumb-init$ docker run dumb-init-test dumb-init sh -c 'kill -TERM $$'                         
ckuehl@raptors:~/dumb-init$ docker ps -a | sed -n 2p | awk '{print $1}' | xargs docker inspect | grep ExitCode
        "ExitCode": 143,                                                                                        
ckuehl@raptors:~/dumb-init$ docker run dumb-init-test dumb-init sh -c 'trap "exit 0" TERM; kill -TERM $$'     
ckuehl@raptors:~/dumb-init$ docker ps -a | sed -n 2p | awk '{print $1}' | xargs docker inspect | grep ExitCode
        "ExitCode": 0,                                                                                          

I think we want to keep this behavior, since otherwise there's no way to distinguish between a process being killed by a signal, and a process that exits normally. If you want the process to exit zero when it receives TERM, that change should probably happen in the process.

In general our policy has been to make dumb-init as transparent as possible, so that it's like you're not using it at all (except for the desired behavior of proper signal forwarding). I think this behavior is consistent with that.

@chriskuehl
Copy link
Contributor Author

@ailispaw I don't think your process has a regular exit code, since it was killed by the kernel without handling the signal. Since WIFEXITED is false, we can't actually check the status code.

From man 2 waitpid:

WEXITSTATUS(status)
       returns  the  exit  status  of  the child.  This consists of the least significant 8 bits of the status
       argument that the child specified in a call to exit(3) or _exit(2) or as  the  argument  for  a  return
       statement in main().  This macro should be employed only if WIFEXITED returned true.

Since the behavior only changed when WEXITSTATUS is false, I don't think your process has a regular exit status that we could check. The old behavior of just returning zero was wrong and wasn't actually returning the exit status of the program.

@ailispaw
Copy link

Will trap "exit 0" TERM pass the signal to the child processes?
I have to modify each containers with trap even if the containers(images) are not mine?

I understand your point, but Docker has the only way to stop containers by signal.
I wish I could know if the container exited normally or not.

@ailispaw
Copy link

Oh!!
To achieve this, the containers(images) should handle SIGTERM to exit the process, shouldn't it?

@chriskuehl
Copy link
Contributor Author

The problem is that your processes don't really have an exit status since they're never exiting normally. Previously dumb-init was basically lying by returning the bottom 8 bits which were coincidentally always set to zero, but that's not because your process exited with code zero.

For example, here's a (slightly modified for readability) snippet from dash (sh) which does almost the same thing as dumb-init in this case:

STATIC int
getstatus(struct job *job) {
    int status;
    int retval;

    status = job->ps[job->nprocs - 1].status;
    retval = WEXITSTATUS(status);
    if (!WIFEXITED(status)) {
            /* XXX: limits number of signals */
            retval = WTERMSIG(status);
        retval += 128;
    }
    return retval;
}

If we revert to the old behavior, then all we're doing is lying about the process's exit code and making it seem like things succeed when they might have received a fatal signal instead.

@ailispaw
Copy link

I'm sorry I mistook it was caused by this change, but it's sort of limitation.
I'm not saying it should revert this PR, just wishing and would like to know a way to get that.
Thank you so much.

@ailispaw
Copy link

Just in case, FYI: I use the image which has supervisord as CMD, and try to run with dumb-init as ENTRYPOINT.

@ailispaw
Copy link

I learned about SIGTERM. I'm sorry for my stupid question. I had totally misunderstood.
As your explanation, it's not on dumb-init side (outside of program), but handling SIGTERM inside program.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants