syncd crash and hang seen with warm-reboot and fast-reboot on T0 topology - HEAD.253-2872d802 #3934
Comments
The issue is seen with the latest Jenkins master image: #202
Both fast-reboot and warm-reboot get stuck with the latest master, and the cores below are seen as well. Image: HEAD.209-b8561545. Thanks
Couldn't verify this with the latest master because of an orchagent crash.
The issue is still seen with the latest master HEAD.253-2872d802. Logs:
root@sonic-s6100-07:~# cd /var/core
Thanks
There is a Broadcom SAI issue with 3.5.3.3 that causes syncd to crash during fast/warm reboot. Please move to the latest build from master (I tried 259). With this build, fast-reboot shutdown no longer generates syncd cores. Warm reboot still generates a core during pre-shutdown. This still looks like a SAI issue. Will follow up with Broadcom.
I tested this on image 259 and could see syncd crash on both warm-reboot and fast-reboot. The switch gets stuck on both reboots and can be recovered only by a power cycle.
root@sonic-s6100-07:/var/core# ls -l
root@sonic-s6100-07:/var/core# show ver
Logs:
root@sonic-s6100-07:/var/core# warm-reboot -vvv
root@sonic-s6100-07:/var/core# fast-reboot
Thanks
@rlhui please arrange to update Broadcom SAI in the master branch.
We tested build 300 with the SAI merge, but the orchagent process doesn't run.
From syslogs:
Jun 2 05:28:20.780311 sonic-s6100-07 ERR monit[499]: 'orchagent' process is not running
root@sonic-s6100-07:~# show logging|grep -B 10 libprotobuf|tail -2
Jun 2 05:26:18.048129 sonic-s6100-07 INFO syncd#supervisord: syncd /usr/bin/syncd:
Attaching the syslogs.
Thanks
The issue is fixed in 201911 build 88 and in master image 306. Both warm-reboot and fast-reboot work fine. Thanks
Description
+++++++++++++++
Please find the logs below. The issue is not seen in master image 154.
Syslog snippet:
Dec 20 06:15:56.828794 sonic-s6100-07 ERR swss#orchagent: :- sai_redis_internal_notify_syncd: notify syncd failed to get response result from select: 2
Dec 20 06:15:56.828794 sonic-s6100-07 ERR swss#orchagent: :- sai_redis_internal_notify_syncd: notify syncd failed to get response
Dec 20 06:15:56.828894 sonic-s6100-07 ERR swss#orchagent: :- sai_redis_notify_syncd: notify syncd failed: SAI_STATUS_FAILURE
Dec 20 06:15:56.828894 sonic-s6100-07 ERR swss#orchagent: :- initSaiRedis: Failed to notify syncd INIT_VIEW, rv:-1
Dec 20 06:15:56.829618 sonic-s6100-07 INFO swss#supervisord: orchagent terminate called without an active exception
Dec 20 06:15:58.010736 sonic-s6100-07 INFO swss#supervisor-proc-exit-listener: Process orchagent exited unxepectedly. Terminating supervisor...
Dec 20 06:15:58.571107 sonic-s6100-07 INFO swss.sh[1708]: No longer waiting on container 'syncd'
Dec 20 06:15:58.604890 sonic-s6100-07 NOTICE root: Stopping swss service...
Dec 20 06:15:58.612537 sonic-s6100-07 NOTICE root: Locking /tmp/swss-syncd-lock from swss service
root@sonic-s6100-07:/var/core# warm-reboot -vvv
Fri Dec 20 06:12:23 UTC 2019 Pausing orchagent ...
Fri Dec 20 06:12:23 UTC 2019 Stopping radv ...
Fri Dec 20 06:12:24 UTC 2019 Stopping bgp ...
Fri Dec 20 06:12:24 UTC 2019 Stopped bgp ...
Fri Dec 20 06:12:27 UTC 2019 Initialize pre-shutdown ...
Fri Dec 20 06:12:28 UTC 2019 Requesting pre-shutdown ...
Fri Dec 20 06:12:29 UTC 2019 Waiting for pre-shutdown ...
Fri Dec 20 06:16:20 UTC 2019 Syncd pre-shutdown failed: requesting ...
Fri Dec 20 06:16:20 UTC 2019 warm-reboot failure (11) cleanup ...
Fri Dec 20 06:16:21 UTC 2019 Cancel warm-reboot: code (1)
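How long the script waited between the `Waiting for pre-shutdown` and `Syncd pre-shutdown failed` lines above can be read straight from the timestamps. A minimal Python sketch, assuming the `date`-style prefix shown in this output; `parse_line` and `seconds_between` are illustrative helpers and not part of warm-reboot:

```python
from datetime import datetime

# Two lines copied from the warm-reboot -vvv output above.
LOG = """\
Fri Dec 20 06:12:29 UTC 2019 Waiting for pre-shutdown ...
Fri Dec 20 06:16:20 UTC 2019 Syncd pre-shutdown failed: requesting ...
"""

# Assumption: every line starts with a fixed-width timestamp in this layout.
STAMP_FORMAT = "%a %b %d %H:%M:%S UTC %Y"
STAMP_LENGTH = len("Fri Dec 20 06:12:29 UTC 2019")


def parse_line(line):
    """Split one log line into (timestamp, message)."""
    stamp = datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
    return stamp, line[STAMP_LENGTH:].strip()


def seconds_between(log, start_marker, end_marker):
    """Seconds from the first line containing start_marker to the
    next line containing end_marker."""
    start = end = None
    for line in log.splitlines():
        if not line.strip():
            continue
        stamp, message = parse_line(line)
        if start is None and start_marker in message:
            start = stamp
        elif start is not None and end_marker in message:
            end = stamp
            break
    if start is None or end is None:
        raise ValueError("marker not found in log")
    return (end - start).total_seconds()


wait = seconds_between(LOG, "Waiting for pre-shutdown", "pre-shutdown failed")
print(f"pre-shutdown wait: {wait:.0f} s")  # pre-shutdown wait: 231 s
```

For this run the wait is 231 seconds, so the script waited nearly four minutes before cancelling the warm-reboot.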
Core files :
root@sonic-s6100-07:/var/core# ls -ltr
total 10568
-rw-rw-rw- 1 root root 10261200 Dec 20 08:54 syncd.1576832093.28.core.gz
-rw-rw-rw- 1 root root 278329 Dec 20 08:56 orchagent.1576832194.45.core.gz
-rw-rw-rw- 1 root root 278347 Dec 20 08:58 orchagent.1576832301.47.core.gz
root@sonic-s6100-07:/var/core#
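The numeric fields in the core file names can be decoded to cross-check the listing. A small Python sketch, assuming the names follow `<executable>.<epoch seconds>.<pid>.core.gz`; that layout is inferred from the names above and is not confirmed in this report, and `decode_core_name` is a hypothetical helper:

```python
import re
from datetime import datetime, timezone

# Assumed layout: <executable>.<epoch seconds>.<pid>.core.gz
CORE_NAME = re.compile(r"^(?P<exe>.+)\.(?P<epoch>\d+)\.(?P<pid>\d+)\.core\.gz$")


def decode_core_name(name):
    """Return (executable, UTC timestamp, pid) parsed from a core file name."""
    match = CORE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected core file name: {name}")
    when = datetime.fromtimestamp(int(match["epoch"]), tz=timezone.utc)
    return match["exe"], when, int(match["pid"])


# File names copied from the ls -ltr output above.
for name in (
    "syncd.1576832093.28.core.gz",
    "orchagent.1576832194.45.core.gz",
    "orchagent.1576832301.47.core.gz",
):
    exe, when, pid = decode_core_name(name)
    print(f"{exe:<10} {when:%Y-%m-%d %H:%M:%S} UTC  pid={pid}")
```

Under that assumption the timestamps decode to 08:54:53, 08:56:34 and 08:58:21 UTC on 2019-12-20. This agrees with the `ls -ltr` times if the switch clock is in UTC, and shows the syncd core was written first, followed by the two orchagent cores.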
root@sonic-s6100-07:/var/core# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7b13c13d2fe1 docker-dhcp-relay-dbg:latest "/usr/bin/docker_ini…" 3 hours ago Up 3 hours dhcp_relay
6ef8beec5762 docker-syncd-brcm-dbg:latest "/usr/bin/supervisord" 3 hours ago Up 3 hours syncd
fcedb3fa4cf6 docker-teamd-dbg:latest "/usr/bin/supervisord" 3 hours ago Up 3 hours teamd
689537cc97d1 docker-platform-monitor-dbg:latest "/usr/bin/docker_ini…" 3 hours ago Up 3 hours pmon
8cb6929f9659 docker-fpm-frr-dbg:latest "/usr/bin/supervisord" 3 hours ago Up 3 hours bgp
8934c8414ccd docker-database-dbg:latest "/usr/local/bin/dock…" 3 hours ago Up 3 hours database
root@sonic-s6100-07:/var/core#
Attached:
Fast-reboot
+++++++++
root@sonic-s6100-07:~# fast-reboot -vvv
Fri Dec 20 12:08:14 UTC 2019 Stopping radv ...
Fri Dec 20 12:08:15 UTC 2019 Stopping bgp ...
Fri Dec 20 12:08:16 UTC 2019 Stopped bgp ...
Fri Dec 20 12:08:17 UTC 2019 Stopping teamd ...
Fri Dec 20 12:08:18 UTC 2019 Stopped teamd ...
Fri Dec 20 12:08:29 UTC 2019 Stopping syncd ...
Fri Dec 20 12:08:29 UTC 2019 Stopped syncd ...
Fri Dec 20 12:08:29 UTC 2019 Stopping all remaining containers ...
Fri Dec 20 12:08:30 UTC 2019 Stopped all remaining containers ...
Fri Dec 20 12:08:32 UTC 2019 Rebooting with /sbin/kexec -e to SONiC-OS-HEAD.157-dirty-20191219.005759 ...
Thanks
Mini