[TH3] dhcpmon crashed while running advanced reboot test case. #8932

Closed
gechiang opened this issue Oct 8, 2021 · 2 comments

gechiang (Collaborator) commented Oct 8, 2021

Description

Observed the following on a TH3 T0 device during the nightly test run, while it was executing the advanced reboot test case:

platform_tests/test_advanced_reboot.py::test_warm_reboot[str2-z9332f-05] PASSED [ 18%]
------------------------------ live log teardown -------------------------------
11:17:40 verify_dut_health._wrapper               L0018 ERROR  | Health check verify_no_coredumps failed with Core dumps found. Expected: 3 Found: 4

platform_tests/test_advanced_reboot.py::test_warm_reboot[str2-z9332f-05] ERROR [ 18%]
platform_tests/test_advanced_reboot.py::test_cancelled_fast_reboot[str2-z9332f-05] PASSED [ 27%]
platform_tests/test_advanced_reboot.py::test_cancelled_warm_reboot[str2-z9332f-05] PASSED [ 36%]
platform_tests/test_advanced_reboot.py::test_warm_reboot_sad[str2-z9332f-05] Loading callback plugin json of type stdout, v2.0 from /usr/local/lib/python2.7/dist-packages/ansible/plugins/callback/json.pyc
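
The teardown error above reads as a before/after comparison: three core files were expected and a fourth was found by teardown. A minimal sketch of that kind of check follows. It is not sonic-mgmt's actual `verify_no_coredumps` code; the function name `check_no_new_coredumps` and the file names in the usage note are hypothetical.

```python
def check_no_new_coredumps(before, after):
    """Compare core-file listings taken before and after a test.

    `before` and `after` are iterables of core-file names; how they are
    collected on the DUT is left to the caller (assumption).
    Returns (ok, message, new_files).
    """
    before_set = set(before)
    after_set = set(after)
    # Files present after the test that were not there before it.
    new_files = sorted(after_set - before_set)
    ok = not new_files
    message = ""
    if not ok:
        message = "Core dumps found. Expected: {} Found: {}".format(
            len(before_set), len(after_set))
    return ok, message, new_files
```

Called with three hypothetical pre-existing names and the same list plus one new name, this reports a failure and returns the new file. Comparing names rather than counts also identifies which file is new, which helps attribute the crash to a process.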

Here is the syslog where the crash occurred:

admin@str2-z9332f-05:/var/log$ sudo zcat syslog.327.gz
Oct  8 04:17:15.855125 str2-z9332f-05 INFO ansible-stat: Invoked with checksum_algorithm=sha1 get_checksum=True follow=False path=/tmp/loganalyzer.py get_md5=None get_mime=True get_attributes=True
Oct  8 04:17:16.214509 str2-z9332f-05 INFO ansible-file: Invoked with directory_mode=None force=False remote_src=None _original_basename=system_msg_handler.py path=/tmp/loganalyzer.py owner=None follow=True group=None unsafe_writes=None setype=None content=NOT_LOGGING_PARAMETER serole=None selevel=None state=file dest=/tmp/loganalyzer.py access_time=None access_time_format=%Y%m%d%H%M.%S modification_time=None regexp=None src=None seuser=None recurse=False _diff_peek=None delimiter=None mode=None modification_time_format=%Y%m%d%H%M.%S attributes=None backup=None
Oct  8 04:17:16.919223 str2-z9332f-05 INFO ansible-command: Invoked with creates=None executable=None _uses_shell=False strip_empty_ends=True _raw_params=python /tmp/loganalyzer.py --action init --run_id test_dhcp_relay_default.2021-10-08-04:17:16 removes=None argv=None warn=True chdir=None stdin_add_newline=True stdin=None
Oct  8 04:17:16.947809 str2-z9332f-05 INFO start-LogAnalyzer-test_dhcp_relay_default.2021-10-08-04:17:16
Oct  8 04:17:16.948034 str2-z9332f-05 INFO
Oct  8 04:17:17.868586 str2-z9332f-05 WARNING dhcp_relay#dhcpmon[27]: handle_dhcpv6_option(PortChannel0004): Unknown DHCPv6 option type 20
Oct  8 04:17:17.868586 str2-z9332f-05 ERR dhcp_relay#dhcp6relay: sendto: Failed to send to target address
Oct  8 04:17:17.870059 str2-z9332f-05 INFO kernel: [ 9514.097370] dhcpmon[263496]: segfault at 55a4ca832000 ip 000055a4c9dc88e6 sp 00007ffdb29ad410 error 4 in dhcpmon[55a4c9dc8000+3000]
Oct  8 04:17:17.870083 str2-z9332f-05 INFO kernel: [ 9514.097393] Code: 00 0f b6 4e 3e 8d 41 f4 3c 01 77 4a 31 c0 66 0f 1f 44 00 00 83 c0 22 48 63 c8 66 81 3c 0a 00 09 74 20 66 90 83 c0 02 48 63 c8 <0f> b7 0c 0a 66 c1 c1 08 0f b7 c9 01 c8 48 63 c8 66 81 3c 0a 00 09
Oct  8 04:17:18.371443 str2-z9332f-05 INFO dhcp_relay#supervisord 2021-10-08 04:17:18,370 INFO exited: dhcpmon-Vlan1000 (terminated by SIGSEGV (core dumped); not expected)
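
The kernel segfault line in the syslog above carries enough information for a first triage. The sketch below parses it, computes how far the instruction pointer sits into the mapping the kernel reported, and decodes the low bits of the x86 page-fault error code. The helper name `parse_segfault` and the returned keys are illustrative, and the regular expression only targets the line format shown in this report.

```python
import re

# Matches the x86-64 kernel segfault report format seen in the syslog above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def parse_segfault(line):
    """Return a dict of triage fields, or None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    err = int(m.group("err"), 16)  # the kernel prints the error code in hex
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_addr": addr,
        # Offset of the faulting instruction inside the reported mapping.
        "mapping_offset": ip - base,
        # x86 page-fault error code bits: 0 = present, 1 = write, 2 = user.
        "page_present": bool(err & 0x1),
        "access": "write" if err & 0x2 else "read",
        "user_mode": bool(err & 0x4),
        "fault_addr_page_aligned": addr % 4096 == 0,
    }
```

For the line in this report, the offset comes out as 0x8e6 into the dhcpmon mapping that contains the instruction pointer, and error 4 decodes to a user-mode read of a page that was not mapped. The fault address is page aligned, which is consistent with a read running off the end of a mapped region, although the log alone does not prove that. Resolving the offset to a symbol also needs the mapping's file offset, which is not in the log.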

Steps to reproduce the issue:

Not sure whether this is reproducible.

  1. Run the nightly test on a TH3 T0 DUT. The relevant test case should be platform_tests/test_advanced_reboot.py::test_warm_reboot.

Output of show version:

admin@str2-z9332f-05:~$ show vers

SONiC Software Version: SONiC.20201231.34
Distribution: Debian 10.10
Kernel: 4.19.0-12-2-amd64
Build commit: 99b7fa5b5e
Build date: Wed Oct  6 23:10:34 UTC 2021
Built by: cloudtest@fe1d9d47c000001

Platform: x86_64-dellemc_z9332f_d1508-r0
HwSKU: DellEMC-Z9332f-M-O16C64
ASIC: broadcom
ASIC Count: 1
Serial Number: TH04CN21CET0004K0123
Uptime: 19:43:28 up 53 min,  2 users,  load average: 0.85, 0.82, 0.77

zhangyanzhao added the "Triaged" label Oct 13, 2021
gechiang (Collaborator, Author) commented:

I am also seeing this issue in nightly test runs on 7050cx3 and 7260cx3 while executing the following test cases:
test_dhcp_relay_default
test_dhcp_relay_after_link_flap
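
Both test cases exercise the DHCP relay path, and the syslog in the description shows dhcpmon handling a DHCPv6 option shortly before the segfault. This thread does not state the root cause, so the following is only a sketch of one defensive pattern for that code path: walking DHCPv6 options in a relay message with explicit bounds checks. The option layout (2-byte code, 2-byte length, then data), the 34-byte relay header, and option code 9 for Relay Message come from RFC 8415. The function name `find_option` is hypothetical and is not dhcpmon's code.

```python
import struct

# Relay message header per RFC 8415: msg-type (1) + hop-count (1)
# + link-address (16) + peer-address (16).
RELAY_HEADER_LEN = 34
OPTION_RELAY_MSG = 9  # RFC 8415 option code for the Relay Message option

def find_option(payload, wanted_code, start=RELAY_HEADER_LEN):
    """Return the data of the first option with `wanted_code`, or None.

    `payload` is the DHCPv6 relay message as bytes. The walk stops as soon
    as an option header or its declared length would run past the buffer.
    """
    offset = start
    end = len(payload)
    while offset + 4 <= end:  # a full option header must fit
        code, length = struct.unpack_from("!HH", payload, offset)
        data_start = offset + 4
        data_end = data_start + length
        if data_end > end:
            return None  # declared length overruns the buffer: malformed
        if code == wanted_code:
            return payload[data_start:data_end]
        offset = data_end
    return None
```

The two checks (`offset + 4 <= end` and `data_end > end`) are the point of the sketch: a truncated packet, or one that lacks the wanted option, ends the loop with `None` instead of reading beyond the captured data.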

shlomibitton (Contributor) commented:

Fixed in PR #8975
