Nomad stop exits before allocations have exited #1225

Closed · ghost opened this issue Jun 1, 2016 · 7 comments
Labels: theme/docs (Documentation issues and enhancements), type/bug

ghost commented Jun 1, 2016

Nomad version

Nomad v0.3.2

Operating system and Environment details

Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4 (2016-02-29) x86_64 GNU/Linux

Issue

I'm experiencing issues where nomad stop exits before the Docker container has exited. This means that when we then run nomad run, the Nomad client experiences a port collision because the old container is still running. This appears to be related to containers that cannot or do not exit on SIGTERM and require a subsequent SIGKILL after Nomad's kill timeout (as shown in the logs below).
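
In the meantime I could imagine working around this by polling the job's allocations until none of them report a running client status before submitting the new job. A minimal sketch (not something we run today; it assumes jq is available, the agent listens on the default address, and it uses the same /v1/job/... allocations endpoint that shows up in the logs below):

NOMAD_ADDR=${NOMAD_ADDR:-http://127.0.0.1:4646}

nomad stop example

# Wait until no allocation of the job reports ClientStatus "running",
# i.e. the old container has actually been killed and its ports are free.
while curl -s "$NOMAD_ADDR/v1/job/example/allocations" \
    | jq -e '.[] | select(.ClientStatus == "running")' > /dev/null; do
  sleep 1
done

nomad run example.nomad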

Nomad Client logs (if appropriate)

Jun  1 14:15:19 nomad[1335]: 2016/06/01 14:15:19 [DEBUG] http: Request /v1/jobs?prefix=example (2.572682ms)
Jun  1 14:15:19 nomad[1335]: 2016/06/01 14:15:19 [DEBUG] http: Request /v1/job/example (2.090988ms)
Jun  1 14:15:19 nomad[1335]: 2016/06/01 14:15:19 [DEBUG] http: Request /v1/job/example (15.035108ms)
Jun  1 14:15:19 nomad[1335]: 2016/06/01 14:15:19 [DEBUG] http: Request /v1/evaluation/7db80b44-dcc9-79d6-d4ee-1c36832de9c0 (1.28188ms)
Jun  1 14:15:19 nomad[1335]: 2016/06/01 14:15:19 [DEBUG] http: Request /v1/evaluation/7db80b44-dcc9-79d6-d4ee-1c36832de9c0/allocations (920.807µs)
Jun  1 14:15:19 nomad[1335]: 2016/06/01 14:15:19 [DEBUG] client: updated allocations at index 10059 (pulled 1) (filtered 19)
Jun  1 14:15:19 nomad[1335]: 2016/06/01 14:15:19 [DEBUG] client: allocs: (added 0) (removed 0) (updated 1) (ignore 19)
Jun  1 14:15:19 nomad[1335]: 2016/06/01 14:15:19 [DEBUG] client: updated allocations at index 10061 (pulled 0) (filtered 20)
Jun  1 14:15:20 nomad[1335]: 2016/06/01 14:15:20 [DEBUG] client: updated allocations at index 10062 (pulled 0) (filtered 20)
Jun  1 14:15:20 nomad[1335]: 2016/06/01 14:15:20 [DEBUG] http: Request /v1/evaluation/7db80b44-dcc9-79d6-d4ee-1c36832de9c0 (1.850422ms)
Jun  1 14:15:20 nomad[1335]: 2016/06/01 14:15:20 [DEBUG] http: Request /v1/evaluation/7db80b44-dcc9-79d6-d4ee-1c36832de9c0/allocations (938.222µs)


// Nomad stop returns here, but container is still running


Jun  1 14:15:24 docker[1264]: time="2016-06-01T14:15:24.739442325Z" level=info msg="Container cfd81a52b31dbf6d8dc9068e6cc8da4f5fda015db5e9995c9c84d1df64fb5023 failed to exit within 5 seconds of signal 15 - using the force"
Jun  1 14:15:24 docker[1264]: time="2016-06-01T14:15:24.799329877Z" level=error msg="restartmanager canceled"
Jun  1 14:15:24 kernel: [ 5528.132996] docker0: port 1(veth75e91bf) entered disabled state
Jun  1 14:15:24 systemd-sysctl[5158]: Overwriting earlier assignment of net/ipv4/ip_forward in file '/etc/sysctl.d/99-salt.conf'.
Jun  1 14:15:24 kernel: [ 5528.145357] docker0: port 1(veth75e91bf) entered disabled state
Jun  1 14:15:24 kernel: [ 5528.153813] device veth75e91bf left promiscuous mode
Jun  1 14:15:24 kernel: [ 5528.159586] docker0: port 1(veth75e91bf) entered disabled state
Jun  1 14:15:24 nomad[1335]: 2016/06/01 14:15:24 [INFO] driver.docker: stopped container cfd81a52b31dbf6d8dc9068e6cc8da4f5fda015db5e9995c9c84d1df64fb5023
Jun  1 14:15:24 nomad[1335]: 2016/06/01 14:15:24 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 20)
Jun  1 14:15:24 consul[662]: 2016/06/01 14:15:24 [INFO] agent: Deregistered service 'nomad-registered-service-b7f6b211-9b7b-926c-0f2c-6c0dfee51a33-example-669b212cd4a8b30f21f6558ee09b91d5546c64cd'
Jun  1 14:15:24 nomad[1335]: 2016/06/01 14:15:24 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 20)
Jun  1 14:15:24 nomad[1335]: 2016/06/01 14:15:24 [DEBUG] plugin: /usr/local/bin/nomad: plugin process exited
Jun  1 14:15:24 docker[1264]: time="2016-06-01T14:15:24.915901330Z" level=error msg="Handler for POST /containers/cfd81a52b31dbf6d8dc9068e6cc8da4f5fda015db5e9995c9c84d1df64fb5023/stop returned error: Container cfd81a52b31dbf6d8dc9068e6cc8da4f5fda015db5e9995c9c84d1df64fb5023 is already stopped"
Jun  1 14:15:25 nomad[1335]: 2016/06/01 14:15:25 [DEBUG] client: updated allocations at index 10063 (pulled 0) (filtered 20)
Jun  1 14:15:25 nomad[1335]: 2016/06/01 14:15:25 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 20)
diptanu (Contributor) commented Jun 3, 2016

@grobinson-blockchain That's how the Docker driver works: first it sends SIGTERM, and after the timeout it sends SIGKILL to the container if it's still running.

nomad stop is not supposed to block until the container exits; it simply changes the desired state of the job, and the client acts on that soon after. The scheduler doesn't consider the resources used by the old allocations freed until the client informs it that the task has been killed successfully.
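
(For reference, the 5-second window in your log is the task's kill_timeout, which defaults to 5s.) You can watch the gap between the two states directly; a rough sketch against the allocations endpoint from your logs (assuming the default agent address, with field names as returned by the allocation list API):

curl -s http://127.0.0.1:4646/v1/job/example/allocations \
  | jq '.[] | {ID, DesiredStatus, ClientStatus}'

DesiredStatus flips to "stop" as soon as the scheduler processes the evaluation, which is when nomad stop returns; ClientStatus stays "running" until the client has actually killed the container.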

ghost (Author) commented Jun 3, 2016

nomad stop is not supposed to block until the container exits; it simply changes the desired state of the job, and the client acts on that soon after.

@diptanu It's just that the documentation says otherwise (https://www.nomadproject.io/docs/commands/stop.html):

Upon successful deregistration, an interactive monitor session will start to display log lines as the job unwinds its allocations and completes shutting down. The monitor will exit once all allocations are stopped and the job has reached a terminal state.

diptanu (Contributor) commented Jun 3, 2016

@grobinson-blockchain Sorry about that, we will update the docs.

What actually happens here is that Nomad creates an evaluation when the stop API is called and waits until the scheduler processes it. Once the evaluation has been processed and the allocations' desired status has been updated, nomad stop returns.
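
You can see that boundary with the evaluation endpoint from your logs; a quick sketch using the eval ID captured there (assuming the default agent address):

curl -s http://127.0.0.1:4646/v1/evaluation/7db80b44-dcc9-79d6-d4ee-1c36832de9c0 \
  | jq '{ID, Status}'

Once Status reads "complete", the CLI returns, regardless of whether the old container has exited yet.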

diptanu added the theme/docs (Documentation issues and enhancements) and type/bug labels Jun 3, 2016
camerondavison (Contributor) commented

As far as "experiences a port collision because the old container is still running" goes, that seems like a bug, assuming the port was allocated as part of the Nomad task. I guess it is possible that it was not. That bug, though, seems related to #1191.

ghost (Author) commented Jun 8, 2016

@a86c6f7964 Yes, the port allocation was from a previous allocation of the same Nomad job. I'm simply updating the Docker image to run a later build.

dadgar (Contributor) commented Nov 4, 2016

@grobinson-blockchain Thank you!

github-actions (bot) commented

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Dec 18, 2022