Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Services] Restart DHCP-Relay service upon unexpected critical process exit. #3667

Merged
merged 7 commits into from
Nov 6, 2019
2 changes: 2 additions & 0 deletions dockers/docker-dhcp-relay/Dockerfile.j2
Original file line number Diff line number Diff line change
Expand Up @@ -26,5 +26,7 @@ RUN apt-get clean -y && \

COPY ["docker_init.sh", "start.sh", "/usr/bin/"]
COPY ["docker-dhcp-relay.supervisord.conf.j2", "wait_for_intf.sh.j2", "/usr/share/sonic/templates/"]
COPY ["files/supervisor-proc-exit-listener", "/usr/bin"]
COPY ["critical_processes", "/etc/supervisor"]

ENTRYPOINT ["/usr/bin/docker_init.sh"]
1 change: 1 addition & 0 deletions dockers/docker-dhcp-relay/critical_processes
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
isc-dhcp-relay
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,12 @@ logfile_maxbytes=1MB
logfile_backups=2
nodaemon=true

[eventlistener:supervisor-proc-exit-listener]
command=/usr/bin/supervisor-proc-exit-listener
events=PROCESS_STATE_EXITED
autostart=true
autorestart=unexpected

[program:start.sh]
command=/usr/bin/start.sh
priority=1
Expand Down
4 changes: 4 additions & 0 deletions files/build_templates/dhcp_relay.service.j2
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,16 @@ Description=DHCP relay container
Requires=updategraph.service swss.service teamd.service
After=updategraph.service swss.service teamd.service
Before=ntp-config.service
StartLimitIntervalSec=1200
StartLimitBurst=3

[Service]
User={{ sonicadmin_user }}
ExecStartPre=/usr/bin/{{ docker_container_name }}.sh start
ExecStart=/usr/bin/{{ docker_container_name }}.sh wait
ExecStop=/usr/bin/{{ docker_container_name }}.sh stop
Restart=always
RestartSec=30

[Install]
WantedBy=multi-user.target swss.service teamd.service
3 changes: 2 additions & 1 deletion files/scripts/supervisor-proc-exit-listener
Original file line number Diff line number Diff line change
Expand Up @@ -33,9 +33,10 @@ def main():

expected = int(payload_headers['expected'])
processname = payload_headers['processname']
groupname = payload_headers['groupname']

# If a critical process exited unexpectedly, terminate supervisor
if expected == 0 and processname in critical_processes:
if expected == 0 and processname in critical_processes or groupname in critical_processes:
MSG_FORMAT_STR = "Process {} exited unxepectedly. Terminating supervisor..."
msg = MSG_FORMAT_STR.format(payload_headers['processname'])
syslog.syslog(syslog.LOG_INFO, msg)
Expand Down
1 change: 1 addition & 0 deletions rules/docker-dhcp-relay.mk
Original file line number Diff line number Diff line change
Expand Up @@ -25,3 +25,4 @@ SONIC_STRETCH_DBG_DOCKERS += $(DOCKER_DHCP_RELAY_DBG)
$(DOCKER_DHCP_RELAY)_CONTAINER_NAME = dhcp_relay
$(DOCKER_DHCP_RELAY)_RUN_OPT += --net=host --privileged -t
$(DOCKER_DHCP_RELAY)_RUN_OPT += -v /etc/sonic:/etc/sonic:ro
$(DOCKER_DHCP_RELAY)_FILES += $(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,12 @@ logfile_maxbytes=1MB
logfile_backups=2
nodaemon=true

[eventlistener:supervisor-proc-exit-listener]
command=/usr/bin/supervisor-proc-exit-listener
events=PROCESS_STATE_EXITED
autostart=true
autorestart=unexpected

[program:start.sh]
command=/usr/bin/start.sh
priority=1
Expand Down