After "reboot -f", the fancontrol service fails to start in the platform monitor docker. #975

Closed
wadelnn opened this issue Sep 21, 2017 · 2 comments

wadelnn commented Sep 21, 2017

Description

After a forced reboot with "reboot -f", the fancontrol service fails to start in the platform monitor docker.

Steps to reproduce the issue:

  1. Check the fancontrol status (the service is running)
    root@sonic:/# docker exec -it pmon service fancontrol status
  2. Force a reboot
    root@sonic:/# reboot -f
  3. Check the fancontrol status again (the service has failed)
    root@sonic:/# docker exec -it pmon service fancontrol status

Describe the results you received:
The fancontrol service is in the failed state and has to be restarted.

Describe the results you expected:
The fancontrol service should still be in the running state.

Additional information you deem important (e.g. issue happens only occasionally):

Output of show version:

```
root@sonic:/home/admin# show version
SONiC Software Version: 201709.01
Distribution: Debian 8.9
Kernel: 3.16.0-4-amd64
Build commit: 48fd6fb
Build date: Wed Sep 20 08:35:00 UTC 2017
Built by: wadelnn@sonic

Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-syncd-brcm latest 8376adf32a72 318.4 MB
docker-orchagent-brcm latest daf63e4b4ce9 258.4 MB
docker-lldp-sv2 latest 9221ad7a1769 256.5 MB
docker-dhcp-relay latest c2bde5d68369 253.5 MB
docker-database latest b75f64ffd3f6 251.7 MB
docker-snmp-sv2 latest 74fd7c82e8de 291.3 MB
docker-teamd latest a0277c3a87fb 255.4 MB
docker-platform-monitor latest 8b964742f161 271 MB
docker-fpm-quagga latest f177ce9a9578 262 MB
```

wadelnn commented Sep 21, 2017

The root cause is as follows.
When the fancontrol service starts, the fancontrol process checks whether /var/run/fancontrol.pid exists.
If fancontrol.pid exists, the service exits to avoid running a duplicate process.
On the host, /var/run is a tmpfs. Files under this directory must be cleared (removed or truncated as appropriate) at the beginning of the boot process.
In the docker, however, /var/run is mounted in root-aufs, so it is not cleared when the platform-monitor docker starts. The fancontrol.pid left behind by the forced reboot is still there, and fancontrol exits.
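
To confirm this diagnosis, one can check whether the PID recorded in the pid file still belongs to a live process. The following is only a sketch: `is_stale_pidfile` is a hypothetical helper name, not part of fancontrol or SONiC, and it assumes the pid file contains a single numeric PID.

```shell
#!/bin/sh
# Hypothetical helper (not part of fancontrol or SONiC).
# Returns 0 when the pid file exists but no live process has the recorded PID.
# Returns 1 when the file is missing or the recorded process is alive.
is_stale_pidfile() {
    pidfile="$1"
    [ -f "$pidfile" ] || return 1        # no pid file, so nothing is stale
    pid=$(cat "$pidfile" 2>/dev/null)
    case "$pid" in
        ''|*[!0-9]*) return 0 ;;         # empty or non-numeric content: treat as stale
    esac
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    # Assumption: the caller runs as root (as inside the container), so a
    # permission error is not mistaken for a dead process.
    if kill -0 "$pid" 2>/dev/null; then
        return 1                         # a process with this PID is alive
    fi
    return 0
}
```

For example, `is_stale_pidfile /var/run/fancontrol.pid && echo "stale pid file"` could be run inside the pmon container after the forced reboot. Note that after a reboot the recorded PID may have been reused by an unrelated process, so a "live" result does not prove that fancontrol itself is running.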

stcheng commented Oct 1, 2017

I wonder whether we should intentionally clean the /var/run folder in each docker at docker start-up time.
We previously hit the rsyslog pid issue in the same way.
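
A minimal sketch of what such a start-up cleanup could look like is shown below. It is not the fix that was actually merged; `clean_run_dir` is a hypothetical function name, and the assumption is that it runs in the container's start script before any service is launched, so every `*.pid` file found at that point is left over from the previous boot.

```shell
#!/bin/sh
# Hypothetical start-up hook (an illustration, not the actual SONiC fix).
# Removes leftover *.pid files from a run directory before services start.
clean_run_dir() {
    run_dir="${1:-/var/run}"             # default to /var/run when no argument is given
    for f in "$run_dir"/*.pid; do
        [ -e "$f" ] || continue          # the glob matched nothing
        rm -f "$f"
    done
}
```

Calling `clean_run_dir` with no argument targets /var/run; passing a directory makes it easy to try out on a scratch directory first. Only top-level `*.pid` files are removed, so sockets and subdirectories are left alone.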

stcheng pushed a commit to stcheng/sonic-buildimage that referenced this issue Jul 19, 2019
swss:
[vxlanorch]: Allow ipv6 src ip for Vxlan tunnel creation (sonic-net#896)
[aclorch]: Allow DTEL drop actions in DTEL flow watchlist (sonic-net#915)
Fix typo in orchagent_restart_check from fasle to false. (sonic-net#923)
[sonic-swss]: Fix for FPM accept call failure in ARM arch (sonic-net#925)
Add retryCount option for orchagent_restart_check program. (sonic-net#833)
[vlan] Add pytest cases to validate nonexistent vlan behavior. (sonic-net#874)
[intfsorch] Wait for interface prior to prefix (sonic-net#796)
Set timer only when interval changes. Not in each firing of the timer. (sonic-net#945)
[test]: Fix set interface in configuration database (sonic-net#956)
[copporch]: Fix polymorphic type error (sonic-net#946)
[AclOrch]: Fix the acl mirror counter doubled by inactive mirror and active again (sonic-net#952)
[MirrorOrch]: Init the next hop ip with 0 instead of default constructor (sonic-net#953)
[portsorch]: Add reference count of port (sonic-net#962)
[mock_test]: Move mock tests into a separate folder to separate them from vs tests (sonic-net#950)
remove crm acl_counters when acl_table removed (sonic-net#918)
[aclorch]: Fix matching MIRROR_DSCP throws unnecessary errors (sonic-net#966)
[policerorch]: Fix return code comparison error (sonic-net#968)
[gitignore]: Add swss-dbg related files (sonic-net#967)
[vxlanmgrd]: Fix for vxlanmgrd cannot correctly work after config reload (sonic-net#934)
[vxlanorch]: Add extra info into NOTICE logs (sonic-net#891)
[test]: Add a neighbor entry with BCAST MAC and verify its ignored (sonic-net#955)
[copporch]: Fix copporch in DEL command (sonic-net#972)
[orchagent]: Fix crash during orchagent process exit (sonic-net#974)
[vnetorch]: Fix VNET orchagents order for warm-reboot flow (sonic-net#958)
[test]: Skip unstable test test_vnet_orch_1 (sonic-net#976)
[intfsorch]: Fix rif flex counter removal error (sonic-net#975)
Update tests README.md file
[aclorch]: Change CFG_ACL_TABLE_NAME to CFG_ACL_TABLE_TABLE_NAME (sonic-net#978)
[test]: Skip test_watermark.py::TestWatermark::test_lua_plugins (sonic-net#981)
[teamsyncd]: Add information for LAG membership changes (sonic-net#982)

common:
Add an assert to logger, which will log a message and abort. (sonic-net#286)
[test]: Add IpAddress::isZero() unit test (sonic-net#289)
do not abort when read timerfd return 0 and errno = 0 (sonic-net#291)
Add BGP_STATE_TABLE in stateDB (sonic-net#273)
[IpAddress]: add mcast scope on address and isFullMask method on prefix (sonic-net#285)
Add ignore Wshadow pragma to json.hpp (sonic-net#292)
[executor]: Fix Executor does not get correct priority saved in m_selectable (sonic-net#290)
[schema]: Remove duplicate STATE_MIRROR_SESSION_TABLE_NAME (sonic-net#294)
timerfd:read failure - Record in logs as error. (sonic-net#295)
[schema]: Change CFG_ACL_TABLE_NAME to CFG_ACL_TABLE_TABLE_NAME (sonic-net#296)
[schema]: Add PASS_THROUGH_ROUTE_TABLE to config and application db (sonic-net#297)

sairedis:
ARM32 bit fixes, for 64bit printf format specifier (sonic-net#468)
Reduce the timeout (GET_RESPONSE_TIMEOUT) from 6 minutes to 1 minute. (sonic-net#472)
Fixed config_syncd_barefoot function (sonic-net#474)
[syncd_init_common.sh] fix fast reboot backwards compatibility (sonic-net#480)
Add default bridge id for bridge port id of type PORT in virtual switch (sonic-net#473)
Fix a bug in parsing kernel argument of fast-reboot (sonic-net#482)
Add TimerWatchdog for monitoring long execution apis (sonic-net#469)
Add specific comparison logic for tunnel map (sonic-net#475)
[vslib] add ACL action capabilities support (sonic-net#481)
Per buffer pool watermark polling mode (sonic-net#485)
Add specific comparison logic for ACL counter (sonic-net#484)
Process flex counters requests in separate thread (sonic-net#483)
Make sairedis/syncd synchronous (sonic-net#476)
Fixed conditional operator. (sonic-net#487)

Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
lguohan pushed a commit that referenced this issue Jul 20, 2019
madhanmellanox pushed a commit to madhanmellanox/sonic-buildimage that referenced this issue Mar 23, 2020
rifFlexCounter is not removed when the rif is removed, so a syncd runtime error occurs.
abdosi added a commit that referenced this issue Aug 9, 2020
As part of this commit and previous commit ff6cb6c
sonic-utilities submodule for 201911 has been updated to take following
changes:

Add support for QSFP-DD cables on 'show' command (#989)
[show] Fix for 'trunk' PortChannel reported as 'routed' port (#1002)
Enable HW watchdog before fast-reboot (#977)
[filter-fdb] Check VLAN Presence When Filter FDB (#957) (#975)
[filter-fdb] Fix For Vlan Defined With No CIDR (#976)
[show/config]: combine feature and container feature cli (#1015)
Pterosaur added a commit that referenced this issue Feb 23, 2022
Signed-off-by: Ze Gan <ganze718@gmail.com>

b9337dc (HEAD, origin/master, origin/HEAD) [vslib]: Fix MACsec bug in SCI and XPN (#1003)
edbceb9 [syncd][vslib] Keep new warm boot discovered SERDES objects (#985)
af5c156 Fix build issues on gcc-10 (#999)
1445cd5 update SAI submoule (#1001)
48fe704 [ci] pipeline fixes for VS test (#1002)
f484cf9 Enable SAI_SWITCH_ATTR_UNINIT_DATA_PLANE_ON_REMOVAL attribute (#975)
5d0b22d Enable SAI_SWITCH_ATTR_UNINIT_DATA_PLANE_ON_REMOVAL attribute (#975)
1b8ce97 (origin/202111) [pipeline] Download swss common artifact in a separated directory (#995)
7a2e096 Change sonic-buildimage.vs artifact source from CI build to official build. (#992)
AidanCopeland pushed a commit to Metaswitch/sonic-buildimage that referenced this issue Apr 14, 2022
…net#975)

Enable SAI_SWITCH_ATTR_UNINIT_DATA_PLANE_ON_REMOVAL attribute for all platforms based on capability

Signed-off-by: Thushar Gowda <24815472+tbgowda@users.noreply.github.com>