Zebra process crashes intermittently during 'config reload' on the DUT line cards #36

Open
sanjair-git opened this issue Jul 24, 2023 · 4 comments

Comments

@sanjair-git


Describe the bug

On a T2 chassis line card, running 'sudo config reload -y' intermittently crashes the 'zebra' process and generates a core file. The crash occurs roughly once in 30 attempts.

We started seeing the issue from the following commit:

sonic-buildimage-msft commit:
Azure/sonic-buildimage-msft@6f19e12

The following logs are seen in the bgp docker when the crash happens:

2023-07-09 13:59:40,064 INFO exited: zebra (terminated by SIGSEGV (core dumped); not expected)
2023-07-11 19:39:22,156 INFO exited: zebra (terminated by SIGSEGV (core dumped); not expected)
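
To measure how often the crash occurs across runs, the supervisor lines above can be counted with a small script. This is only a sketch: the regular expression is modeled on the two log lines quoted in this report, and `find_crashes` is a hypothetical helper name, not part of SONiC or FRR.

```python
import re
from datetime import datetime

# Pattern modeled on the two supervisor log lines quoted in this issue
# (assumption: other crash entries follow the same layout).
CRASH_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ INFO exited: "
    r"(?P<proc>\S+) \(terminated by (?P<sig>SIG[A-Z]+)"
)


def find_crashes(lines, process="zebra"):
    """Return (timestamp, signal) for every abnormal exit of `process`."""
    crashes = []
    for line in lines:
        m = CRASH_RE.match(line.strip())
        if m and m.group("proc") == process:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            crashes.append((ts, m.group("sig")))
    return crashes


sample = [
    "2023-07-09 13:59:40,064 INFO exited: zebra (terminated by SIGSEGV (core dumped); not expected)",
    "2023-07-11 19:39:22,156 INFO exited: zebra (terminated by SIGSEGV (core dumped); not expected)",
]

for ts, sig in find_crashes(sample):
    print(ts.isoformat(), sig)
```

Feeding the function the lines of the supervisor log from the bgp docker gives one entry per zebra crash, which can then be compared against the number of reload attempts.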

Crash logs:

(screenshot of the crash log; image not reproduced here)

The generated zebra core and the FRR logs are attached for reference:
zebra.1689104360.44.0.core.gz
frr.zip

Actual Behaviour:

  • The zebra process in the bgp docker crashes.
  • A core file is generated.

We have already raised an issue under sonic-buildimage regarding this crash:
sonic-net/sonic-buildimage#15803

To Reproduce
Steps to reproduce the behavior:
On any T2 chassis line card, run 'sudo config reload -y' multiple times.
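
Because the crash shows up only about once in 30 attempts, the reproduction step is easier to run as a loop. The sketch below repeats the reload command and stops when a new zebra core file appears. Several things here are assumptions rather than facts from this report: the core directory is whatever the caller passes in (on SONiC this is commonly `/var/core`, but that is not confirmed here), the file pattern `zebra.*.core.gz` is inferred from the attachment names, and `reload_until_crash` and the `settle` callback are hypothetical names.

```python
import subprocess
from pathlib import Path

CORE_PATTERN = "zebra.*.core.gz"  # inferred from the attached core file names


def list_cores(core_dir):
    """Names of zebra core files currently present in core_dir."""
    return {p.name for p in Path(core_dir).glob(CORE_PATTERN)}


def reload_until_crash(core_dir, max_attempts=30,
                       reload_cmd=("sudo", "config", "reload", "-y"),
                       settle=None):
    """Run reload_cmd repeatedly; return (attempt, new_cores) on first crash.

    `settle` is an optional callable invoked after each reload, for example
    to wait until the line card has come back up. Returns (None, []) if no
    new core file appears within max_attempts.
    """
    seen = list_cores(core_dir)
    for attempt in range(1, max_attempts + 1):
        subprocess.run(list(reload_cmd), check=False)
        if settle is not None:
            settle()
        fresh = sorted(list_cores(core_dir) - seen)
        if fresh:
            return attempt, fresh
    return None, []
```

A caller would typically pass something like `settle=lambda: time.sleep(300)`; the right wait time depends on how long the line card takes to recover and is not specified in this report.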

Expected behavior

  • Running 'sudo config reload' on the DUT line cards should not cause any issue. The line cards should come up with all BGP neighbors established, without any crashes or core files.


Versions

admin@ixre-egl-board1:~$ show version

SONiC Software Version: SONiC.HEAD.489499-msft-2205-ndk-d963ac161
SONiC OS Version: 11
Distribution: Debian 11.7
Kernel: 5.10.0-18-2-amd64
Build commit: d963ac161
Build date: Fri Jul  7 18:18:51 UTC 2023
Built by: gitlab-runner@sonic-bld2

Platform: x86_64-nokia_ixr7250e_36x400g-r0
HwSKU: Nokia-IXR7250E-36x100G
ASIC: broadcom
ASIC Count: 2
Serial Number: EAG2-04-210
Model Number: N/A
Hardware Revision: 56
Uptime: 15:45:52 up 1 day, 12:15,  3 users,  load average: 1.56, 1.54, 1.59
Date: Wed 12 Jul 2023 15:45:52


@mlok-nokia

After checking the previous test history, we found that this crash had already appeared in a test case run in April.

@saksarav-nokia

Attaching the symbol file and core file
zebra.gz
zebra.1689971652.43.1.core.gz
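
For anyone who wants a text backtrace from these attachments, the following sketch decompresses a `.gz` file and builds a gdb batch command line. It assumes that `zebra.gz` is a gzip of the zebra binary with symbols and that gdb is run in an environment with libraries matching this build; neither is confirmed in the thread. `gunzip` and `gdb_backtrace_cmd` are hypothetical helper names.

```python
import gzip
import shutil


def gunzip(src, dst):
    """Decompress a .gz attachment (binary or core file) to dst."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def gdb_backtrace_cmd(binary, core):
    """Build a gdb command that prints a full backtrace of all threads."""
    return ["gdb", "-batch", "-ex", "thread apply all bt full", binary, core]


# Example: the command line only, nothing is executed here.
print(" ".join(gdb_backtrace_cmd("zebra", "zebra.1689971652.43.1.core")))
```

The returned list can be passed to `subprocess.run` once both files have been decompressed.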

@saksarav-nokia

route.zip
