[warm-reboot] [TH3] test_warm_reboot fails and orchagent crash seen on 9332f platform #8397

Closed
tjchadaga opened this issue Aug 9, 2021 · 0 comments

Description

The warm-reboot test fails at APPLY_VIEW because the create call for a SAI_OBJECT_TYPE_TUNNEL_TERM_TABLE_ENTRY object returns SAI_STATUS_NOT_SUPPORTED. The following error is seen in syslog:

ERR syncd#syncd: :- executeOperationsOnAsic: Error while executing asic operations, ASIC is in inconsistent state: :- asic_process_event: failed to execute api: create, key: SAI_OBJECT_TYPE_TUNNEL_TERM_TABLE_ENTRY:oid:0x2b00000000080f, status: SAI_STATUS_NOT_SUPPORTED
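
When many such lines have to be triaged, the failing API, object key, and status can be extracted mechanically. The following is a minimal sketch, assuming the message layout matches the single line quoted above; the name `parse_asic_failure` and the regular expression are illustrative and not part of syncd:

```python
import re

# Pattern inferred from the one syslog line quoted in this issue (an assumption,
# not an official syncd log format).
ASIC_FAILURE_RE = re.compile(
    r"failed to execute api: (?P<api>\w+), "
    r"key: (?P<object_type>SAI_OBJECT_TYPE_\w+):(?P<oid>oid:0x[0-9a-fA-F]+), "
    r"status: (?P<status>SAI_STATUS_\w+)"
)


def parse_asic_failure(line):
    """Return api, object_type, oid and status from a failure line, or None."""
    match = ASIC_FAILURE_RE.search(line)
    return match.groupdict() if match else None


line = (
    "ERR syncd#syncd: :- executeOperationsOnAsic: Error while executing asic "
    "operations, ASIC is in inconsistent state: :- asic_process_event: failed "
    "to execute api: create, key: "
    "SAI_OBJECT_TYPE_TUNNEL_TERM_TABLE_ENTRY:oid:0x2b00000000080f, "
    "status: SAI_STATUS_NOT_SUPPORTED"
)
print(parse_asic_failure(line))
```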

SAI logs:

2021-08-04.22:02:03.384371|a|APPLY_VIEW
2021-08-04.22:02:04.534238|A|SAI_STATUS_FAILURE
2021-08-04.22:02:04.534402|n|switch_shutdown_request|{"switch_id":"oid:0x21000000000000"}|
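
To locate the APPLY_VIEW result in a longer recording, the pipe-separated lines can be scanned for the request and the response that follows it. This is a rough sketch, assuming (from the excerpt above only) that a lowercase `a` entry is the request and the next uppercase `A` entry carries its status; `apply_view_result` is a hypothetical helper name:

```python
from datetime import datetime

# Timestamp layout as it appears in the excerpt above.
TS_FORMAT = "%Y-%m-%d.%H:%M:%S.%f"


def apply_view_result(lines):
    """Return (status, seconds_elapsed) for the first APPLY_VIEW, or None."""
    started = None
    for line in lines:
        fields = line.strip().split("|")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        timestamp, op, payload = fields[0], fields[1], fields[2]
        if op == "a" and payload == "APPLY_VIEW":
            started = datetime.strptime(timestamp, TS_FORMAT)
        elif op == "A" and started is not None:
            finished = datetime.strptime(timestamp, TS_FORMAT)
            return payload, (finished - started).total_seconds()
    return None


recording = [
    "2021-08-04.22:02:03.384371|a|APPLY_VIEW",
    "2021-08-04.22:02:04.534238|A|SAI_STATUS_FAILURE",
    '2021-08-04.22:02:04.534402|n|switch_shutdown_request|{"switch_id":"oid:0x21000000000000"}|',
]
status, elapsed = apply_view_result(recording)
print(status, elapsed)
```

For the three lines quoted in this issue, the helper reports SAI_STATUS_FAILURE roughly 1.15 seconds after the APPLY_VIEW request.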

This causes the following orchagent crash:

Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fe22fa01d57 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fe22f5ffbc0 (LWP 43))]
(gdb) bt
#0 0x00007fe22fa01d57 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fe22fa01eba in exit () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x000056170b806b7b in syncd_apply_view () at main.cpp:108
#3 0x000056170b810441 in OrchDaemon::warmRestoreAndSyncUp (this=0x56170d644640) at orchdaemon.cpp:657
#4 0x000056170b813306 in OrchDaemon::init (this=this@entry=0x56170d644640) at orchdaemon.cpp:482
#5 0x000056170b7eebef in main (argc=, argv=) at main.cpp:411
(gdb)

Steps to reproduce the issue:

  1. Run test_reboot::test_warm_reboot on 9332f platform
    or
  2. Execute warm-reboot manually on 9332f

Describe the results you received:

  1. The test fails, and the swss container is not running on the setup after warm reboot.
  2. An orchagent core with the backtrace above is generated, and the tunnel termination error is seen in syslog.

Describe the results you expected:

No core is generated, and all critical containers are running after warm-reboot.
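
The container half of this expectation can be checked from the names printed by `docker ps --format '{{.Names}}'`. Below is a small sketch; the list of critical containers and the sample output are assumptions made for illustration, not data captured from the device, and `missing_containers` is a hypothetical helper:

```python
def missing_containers(running_names, expected):
    """Return the expected container names absent from the running list, sorted."""
    running = {name.strip() for name in running_names if name.strip()}
    return sorted(set(expected) - running)


# Hypothetical list of critical containers; adjust for the image under test.
EXPECTED = ["database", "swss", "syncd", "bgp", "teamd"]

# Illustrative text in the shape `docker ps --format '{{.Names}}'` prints;
# it is not taken from the failing setup.
docker_ps_output = "database\nsyncd\nbgp\nteamd\n"

print(missing_containers(docker_ps_output.splitlines(), EXPECTED))
```

An empty list would mean every expected container is up; with the illustrative input above, only `swss` is reported missing.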

Output of show version:

SONiC Software Version: SONiC.20201231.14
Distribution: Debian 10.10
Kernel: 4.19.0-12-2-amd64
Build commit: 59ac7da661
Build date: Mon Aug  9 00:04:06 UTC 2021
Built by: AzDevOps@sonic-int-build-workers-0002JO

Platform: x86_64-dellemc_z9332f_d1508-r0
HwSKU: DellEMC-Z9332f-O32
ASIC: broadcom
ASIC Count: 1
Serial Number: TH04CN21CET0004K0214
Uptime: 19:37:56 up  1:11,  1 user,  load average: 0.59, 0.59, 0.58

BRCM SAI ver: [4.3.5.1], OCP SAI ver: [1.7.1], SDK ver: [sdk-6.5.21]

Output of show techsupport:


Additional information you deem important (e.g. issue happens only occasionally):
