Fix issue: restarting thermalctld too quickly causes supervisord to never restart it again #29

Closed

Junchao-Mellanox

- Why I did it

Found error logs in syslog:

Oct 10 12:41:09.148655 arc-switch1029 INFO pmon#supervisord 2020-10-10 12:41:07,689 INFO gave up: thermalctld entered FATAL state, too many start retries too quickly

The issue is related to the "startsecs" configuration of thermalctld in /etc/supervisor/conf.d/supervisord.conf. The current configuration sets "startsecs" to 10, which means the thermalctld process must stay running for at least 10 seconds after each start for supervisord to consider the start successful. If thermalctld exits sooner, even with an expected exit code, the start is counted as a failure, and after too many such failures supervisord gives up and never restarts it.

See the official documentation for "startsecs" at http://supervisord.org/configuration.html:

startsecs

The total number of seconds which the program needs to stay running after a startup to consider the start successful. If the program does not stay up for this many seconds after it has started, even if it exits with an “expected” exit code (see exitcodes), the startup will be considered a failure. Set to 0 to indicate that the program needn’t stay running for any particular amount of time.
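
To make the failure mode concrete, here is a minimal sketch of the behavior described above. It is a simplified model written for illustration, not supervisord's actual code: the function name `simulate`, the state labels, and the exact retry counting are assumptions, and `startretries=3` is used as the documented default.

```python
def simulate(startsecs, run_durations, startretries=3):
    """Simplified model of how startsecs affects restart decisions.

    run_durations lists how long (in seconds) the process stayed up on
    each consecutive start. Assumption: a start that lasts less than
    startsecs counts as a failed start, and supervisord gives up once
    the number of consecutive failed starts exceeds startretries.
    """
    failed_starts = 0
    for ran_for in run_durations:
        if ran_for >= startsecs:
            # The start is considered successful; the failure counter
            # resets and the normal restart policy applies on exit.
            failed_starts = 0
            continue
        failed_starts += 1
        if failed_starts > startretries:
            return "FATAL"
    return "RESTARTABLE"


# thermalctld exits after about 2 seconds on every start.
quick_exits = [2, 2, 2, 2]
print(simulate(10, quick_exits))
print(simulate(0, quick_exits))
```

In this model, `startsecs=10` drives the process into the FATAL state after repeated quick exits, while `startsecs=0` treats every start as successful, so the process stays eligible for restart.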

- How I did it

The fix is to change the "startsecs" configuration of thermalctld from 10 to 0.
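
As a rough illustration of what the changed setting looks like and how it could be checked, the sketch below parses a supervisord-style fragment with Python's `configparser`. The section name `[program:thermalctld]` follows supervisord's usual convention but is an assumption about this file, the `command` path is hypothetical, and `get_startsecs` is a helper invented for this example.

```python
import configparser

# Hypothetical fragment; the real supervisord.conf has more settings.
SAMPLE_CONF = """
[program:thermalctld]
command=/usr/local/bin/thermalctld
startsecs=0
"""


def get_startsecs(conf_text, program):
    """Return the startsecs value configured for the given program."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # supervisord documents a default of 1 when startsecs is not set.
    return parser.getint(f"program:{program}", "startsecs", fallback=1)


print(get_startsecs(SAMPLE_CONF, "thermalctld"))
```

On a device, the same helper could be pointed at the contents of /etc/supervisor/conf.d/supervisord.conf inside the pmon container to confirm that the value is 0 after the fix.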

- How to verify it

Manual test

- Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@Junchao-Mellanox

sonic-net#5633

Junchao-Mellanox pushed a commit that referenced this pull request Jan 22, 2021
* 3b330db4a 2021-01-18 | [build]: Fix build error when compiling for armhf (32-bit) (#30) (HEAD, origin/master, origin/HEAD, master) [dflynn-Nokia]
* 56aaa225b 2021-01-16 | [ci]: add pipeline for armhf and arm64 (#29) [lguohan]
* 90da6141c 2021-01-12 | [ci]: propagate the correct error code the next step (#27) [lguohan]

Signed-off-by: Guohan Lu <lguohan@gmail.com>
Junchao-Mellanox pushed a commit that referenced this pull request Jul 1, 2021
Advance submodule update with the following changes:
4475750 Config reload fix (#29)
cf60d5e [ci]: add proper azp (#26)
f0fbfe7 [CI] Set up CI with Azure Pipelines (#25)
879d7bd Include port default fec configuration to be included in ZTP configuration (#24)
a6ae955 Add a pre-defined plugin to download a list of files (#23)
6f0305b [MultiDB] Add multidb support to sonic-ztp (#16)
Junchao-Mellanox pushed a commit that referenced this pull request Jan 9, 2023
Update dhcprelay submodule to include the following commits
4bf1868 fix relay-reply dhcpv6 packet counter issue (#29)
Junchao-Mellanox pushed a commit that referenced this pull request Jan 29, 2023


advance dhcp relay for 202211

4bf1868 - (HEAD, origin/master, origin/HEAD, master) fix relay-reply dhcpv6 packet counter issue (#29) (2 weeks ago) [jcaiMR]
9b30690 - fix handleSwssNotification crash in dhcp6relay (#28) (4 weeks ago) [jcaiMR]
047afb7 - Fix multiple vlan issue (#27) (4 weeks ago) [jcaiMR]
ff6bec3 - Made the Error log informative (#22) (5 weeks ago) [Vivek]
2fbe729 - disable cfg dynamic change (#25) (6 weeks ago) [jcaiMR]
13d0805 - Use github code scanning instead of LGTM (#26) (6 weeks ago) [Liu Shilong]
1e846f6 - Fix packet range check for relay-reply packets (#21) (7 weeks ago) [kellyyeh]
4d19e13 - Add unittest infrastructure (#5) (8 weeks ago) [kellyyeh]
7f4fdab - fix packet range check issue (#20) (9 weeks ago) [jcaiMR]
257ecdf - Add client packet UDP header length check (#19) (2 months ago) [kellyyeh]