S6100- T0- Master - Critical processes doesn't run in the latest master - 306 #4755

Closed
mini-nair-dell opened this issue Jun 11, 2020 · 3 comments

Comments


mini-nair-dell commented Jun 11, 2020

I see that many critical processes have exited or are not running in the latest master image (306).

This issue is seen after running the script - https://github.com/Azure/sonic-mgmt/blob/master/ansible/roles/test/tasks/warm-reboot-fib.yml

A snippet from the syslog:

get_all#012 raise UnavailableDataError(message, _hash)#012swsssdk.exceptions.UnavailableDataError: Key 'b'COUNTERS:oid:0x100000000000a'' unavailable in database '2'
Jun 12 03:31:16.324156 sonic-s6100-07 ERR monit[508]: 'telemetry' process is not running
Jun 12 03:31:16.342013 sonic-s6100-07 ERR monit[508]: 'dsserve' process is not running
Jun 12 03:31:16.360510 sonic-s6100-07 ERR monit[508]: 'snmp_subagent' process is not running
Jun 12 03:32:03.227317 sonic-s6100-07 ERR swss#orchagent: :- wait: SELECT operation result: TIMEOUT on getresponse
Jun 12 03:32:03.227317 sonic-s6100-07 ERR swss#orchagent: :- wait: failed to get response for getresponse
Jun 12 03:32:03.227317 sonic-s6100-07 ERR swss#orchagent: :- getResAvailableCounters: Failed to get switch attribute 56 , rv:-1
Jun 12 03:32:16.401886 sonic-s6100-07 ERR monit[508]: 'telemetry' process is not running
Jun 12 03:32:16.419841 sonic-s6100-07 ERR monit[508]: 'dsserve' process is not running
Jun 12 03:32:16.438331 sonic-s6100-07 ERR monit[508]: 'snmp_subagent' process is not running
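
To summarize which processes monit is complaining about, a small script along these lines may help. It is only a sketch: the regular expression is an assumption based on the monit lines in the snippet above, and the function name `processes_not_running` is hypothetical, not part of any SONiC tool.

```python
import re

# Matches monit lines such as:
#   ... ERR monit[508]: 'telemetry' process is not running
# The pattern is inferred from the syslog snippet, not from monit documentation.
MONIT_NOT_RUNNING = re.compile(r"monit\[\d+\]: '([^']+)' process is not running")


def processes_not_running(syslog_lines):
    """Return process names reported as not running, in first-seen order."""
    seen = []
    for line in syslog_lines:
        match = MONIT_NOT_RUNNING.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = [
    "Jun 12 03:31:16.324156 sonic-s6100-07 ERR monit[508]: 'telemetry' process is not running",
    "Jun 12 03:31:16.342013 sonic-s6100-07 ERR monit[508]: 'dsserve' process is not running",
    "Jun 12 03:31:16.360510 sonic-s6100-07 ERR monit[508]: 'snmp_subagent' process is not running",
    "Jun 12 03:32:03.227317 sonic-s6100-07 ERR swss#orchagent: :- wait: failed to get response for getresponse",
    "Jun 12 03:32:16.401886 sonic-s6100-07 ERR monit[508]: 'telemetry' process is not running",
]

print(processes_not_running(sample))
```

On the device, the same function could be fed the lines of /var/log/syslog instead of the hard-coded sample.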

+++++++++++++++++

For example, in the pmon docker container, I see that xcvrd and syseeprom are not running.

docker exec -it pmon bash
root@sonic-s6100-07:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 05:39 pts/0 00:00:02 /usr/bin/python /usr/bin/supervi
root 19 1 0 05:39 pts/0 00:00:00 python /usr/bin/supervisor-proc-
root 23 1 0 05:39 pts/0 00:00:00 /usr/sbin/rsyslogd -n -iNONE
root 43 1 0 05:39 ? 00:00:00 /usr/sbin/sensord -f daemon
root 89 0 0 06:16 pts/1 00:00:00 bash
root 97 89 0 06:16 pts/1 00:00:00 ps -ef

root@sonic-s6100-07:/# xcvrd &
[1] 109
root@sonic-s6100-07:/# Traceback (most recent call last):
  File "/usr/bin/xcvrd", line 1169, in <module>
    main()
  File "/usr/bin/xcvrd", line 1166, in main
    xcvrd.run()
  File "/usr/bin/xcvrd", line 1130, in run
    self.init()
  File "/usr/bin/xcvrd", line 1092, in init
    state_db = daemon_base.db_connect("STATE_DB")
  File "/usr/local/lib/python2.7/dist-packages/sonic_daemon_base/daemon_base.py", line 37, in db_connect
    REDIS_TIMEOUT_MSECS,
NameError: global name 'REDIS_TIMEOUT_MSECS' is not defined

[1]+ Exit 1 xcvrd
root@sonic-s6100-07:/#
root@sonic-s6100-07:/# ps -aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.1 0.2 59068 21456 pts/0 Ss+ 05:39 0:02 /usr/bin/python
root 19 0.0 0.1 51636 15988 pts/0 S 05:39 0:00 python /usr/bin
root 23 0.0 0.0 250140 3056 pts/0 Sl 05:39 0:00 /usr/sbin/rsysl
root 43 0.0 0.0 133252 1088 ? Ss 05:39 0:00 /usr/sbin/senso
root 103 0.0 0.0 18204 3380 pts/1 Ss 06:21 0:00 bash
root 117 0.0 0.0 36636 2836 pts/1 R+ 06:21 0:00 ps -aux

Thanks
Mini

@mini-nair-dell
Author

It looks like the issue can be fixed by initializing REDIS_TIMEOUT_MSECS = 0 in daemon_base.py. After making this change, the critical processes are running.

Also, it looks like this was introduced by the change made in #4549.
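
For anyone hitting the same traceback, here is a minimal sketch of the failure mode and the workaround. It does not reproduce the real daemon_base.py: `FakeConnector`, `db_connect_broken`, `db_connect_fixed`, and `UNDEFINED_TIMEOUT_MSECS` are hypothetical stand-ins, and only the constant name REDIS_TIMEOUT_MSECS and the value 0 come from this thread.

```python
class FakeConnector:
    """Hypothetical stand-in for the real database connector (not the swsssdk API)."""

    def __init__(self, db_name, timeout_msecs):
        self.db_name = db_name
        self.timeout_msecs = timeout_msecs


def db_connect_broken(db_name):
    # Refers to a module-level name that is never defined. Python only
    # resolves the name when the function is called, so importing the
    # module works and the NameError appears at the first call.
    return FakeConnector(db_name, UNDEFINED_TIMEOUT_MSECS)


# Workaround described in this comment: define the constant at module level.
REDIS_TIMEOUT_MSECS = 0


def db_connect_fixed(db_name):
    return FakeConnector(db_name, REDIS_TIMEOUT_MSECS)


try:
    db_connect_broken("STATE_DB")
except NameError as exc:
    print("broken:", exc)

conn = db_connect_fixed("STATE_DB")
print("fixed:", conn.db_name, conn.timeout_msecs)
```

Because the missing name is only looked up at call time, the module imports cleanly and each daemon fails only once it tries to connect to the database.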

@mini-nair-dell
Author

The issue is fixed in build 309 and is no longer seen.

Thanks
Mini
