[NAT] Exit from natmgrd correctly after receiving SIGTERM signal. (so…
…nic-net#2232)

**What I did**

Fix the issue where *natmgrd* remains running after receiving the SIGTERM signal and performing clean-up.

**Why I did it**

Supervisord sends SIGTERM to all applications that it controls. After receiving the signal, an application may optionally run a clean-up routine and should then exit. natmgrd runs its clean-up from the signal handler but doesn't exit afterwards.

After supervisord sends the signal, it waits 10 seconds (the default timeout) to allow the application to exit correctly. If the application has not exited within that timeout, supervisord kills it. As a result, natmgrd is always killed by supervisord, which increases the container restart time.

Before the fix, shutting down the NAT container takes ~10 seconds. Docker kills the application with SIGKILL after the timeout:

```
# time /usr/bin/nat.sh stop

real 0m10.420s
user 0m0.227s
sys 0m0.046s

Apr 18 10:46:16.881285 r-leopard-32 INFO nat#supervisord 2022-04-18 10:46:16,879 WARN received SIGTERM indicating exit request
Apr 18 10:46:16.881285 r-leopard-32 INFO nat#supervisord 2022-04-18 10:46:16,880 INFO waiting for supervisor-proc-exit-listener, rsyslogd, natmgrd, natsyncd to die
Apr 18 10:46:17.883936 r-leopard-32 INFO nat#supervisord 2022-04-18 10:46:17,883 INFO stopped: natsyncd (terminated by SIGTERM)
Apr 18 10:46:17.883936 r-leopard-32 NOTICE nat#natmgrd: :- sigterm_handler: Got SIGTERM
Apr 18 10:46:17.891989 r-leopard-32 INFO nat#/supervisord: natmgrd conntrack v1.4.5 (conntrack-tools): connection tracking table has been emptied.
Apr 18 10:46:17.891989 r-leopard-32 NOTICE nat#natmgrd: :- sigterm_handler: Sending notification to orchagent to cleanup NAT entries in REDIS/ASIC
Apr 18 10:46:19.895501 r-leopard-32 INFO nat#supervisord 2022-04-18 10:46:19,894 INFO waiting for supervisor-proc-exit-listener, rsyslogd, natmgrd to die
Apr 18 10:46:22.898947 r-leopard-32 INFO nat#supervisord 2022-04-18 10:46:22,898 INFO waiting for supervisor-proc-exit-listener, rsyslogd, natmgrd to die
Apr 18 10:46:25.903148 r-leopard-32 INFO nat#supervisord 2022-04-18 10:46:25,902 INFO waiting for supervisor-proc-exit-listener, rsyslogd, natmgrd to die
Apr 18 10:46:26.115147 r-leopard-32 INFO dockerd[737]: time="2022-04-18T10:46:26.114201315Z" level=info msg="Container ec5804a8ccd413786392c27ac3e61d4dfe67c8e5558c91b6c6bf0712cf85d07a failed to exit within 10 seconds of signal 15 - using the force"
```

After the fix, shutting down the NAT container takes ~4 seconds. The application exits correctly after receiving the SIGTERM signal:

```
# time /usr/bin/nat.sh stop

real 0m4.166s
user 0m0.219s
sys 0m0.036s

Apr 18 10:52:23.611991 r-leopard-32 INFO nat#supervisord 2022-04-18 10:52:23,610 WARN received SIGTERM indicating exit request
Apr 18 10:52:23.611991 r-leopard-32 INFO nat#supervisord 2022-04-18 10:52:23,610 INFO waiting for supervisor-proc-exit-listener, rsyslogd, natmgrd, natsyncd to die
Apr 18 10:52:23.613338 r-leopard-32 INFO nat#supervisord 2022-04-18 10:52:23,612 INFO stopped: natsyncd (terminated by SIGTERM)
Apr 18 10:52:24.620815 r-leopard-32 INFO nat#/supervisord: natmgrd conntrack v1.4.5 (conntrack-tools): connection tracking table has been emptied.
Apr 18 10:52:24.620815 r-leopard-32 NOTICE nat#natmgrd: :- cleanup: Sending notification to orchagent to cleanup NAT entries in REDIS/ASIC
Apr 18 10:52:25.407307 r-leopard-32 INFO nat#supervisord 2022-04-18 10:52:25,406 INFO stopped: natmgrd (exit status 0)
```

**How I verified it**

Stop the NAT container.
Check syslog to confirm that the natmgrd application exited correctly with return code '0'.
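
For illustration only, one common way to get the "clean up, then exit with status 0" behaviour is to have the signal handler set a flag and let the main loop do the clean-up and return. The sketch below is not natmgrd's actual code: `on_sigterm`, `cleanup`, `run_daemon`, and both globals are hypothetical names, and `std::raise` merely stands in for supervisord delivering SIGTERM.

```cpp
#include <cassert>
#include <csignal>

// Hypothetical names throughout; none are taken from natmgrd's source.
// The handler only sets a flag, which keeps it async-signal-safe.
static volatile std::sig_atomic_t g_stop_requested = 0;
static int g_cleanup_runs = 0;

extern "C" void on_sigterm(int) { g_stop_requested = 1; }

// Placeholder for the real clean-up work
// (e.g. flushing conntrack entries, notifying orchagent).
void cleanup() { ++g_cleanup_runs; }

// Loops until SIGTERM has been seen, cleans up once, and returns the
// status that main() would pass back to the supervisor.
int run_daemon()
{
    std::signal(SIGTERM, on_sigterm);

    while (!g_stop_requested) {
        // A real daemon would block on its event sources here.
        // raise() simulates the supervisor sending SIGTERM.
        std::raise(SIGTERM);
    }

    assert(g_stop_requested);  // clean-up runs only after a stop request
    cleanup();
    return 0;  // the process exits instead of waiting to be killed
}
```

Calling `return run_daemon();` from `main()` would let the process terminate on its own, so the supervisor has no reason to wait for the stop timeout.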