Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kernel crash observed during kexec #6866

Closed
stepanblyschak opened this issue Feb 23, 2021 · 5 comments · Fixed by sonic-net/sonic-linux-kernel#199
Closed

Kernel crash observed during kexec #6866

stepanblyschak opened this issue Feb 23, 2021 · 5 comments · Fixed by sonic-net/sonic-linux-kernel#199
Labels
Issue for 202012 Triaged this issue has been triaged

Comments

@stepanblyschak
Copy link
Collaborator

Description

Kernel crashes because of device list corruption during kexec (warm-reboot, fast-reboot) with the following stack trace:

[  269.474092] watchdog: watchdog1: watchdog did not stop!
[  269.533625] list_del corruption. prev->next should be ffff9e136bd57418, but was 677ac660ffffffff
[  269.543482] kernel BUG at lib/list_debug.c:53!
[  269.548458] invalid opcode: 0000 [#1] SMP PTI
[  269.553326] CPU: 1 PID: 8890 Comm: kexec Tainted: G           OE     4.19.0-9-2-amd64 #1 Debian 4.19.118-2+deb10u1
[  269.564891] Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 11/03/2020
[  269.574323] RIP: 0010:__list_del_entry_valid.cold.1+0x34/0x4c
[  269.580740] Code: 9f 29 a5 e8 68 7a d0 ff 0f 0b 48 c7 c7 20 a0 29 a5 e8 5a 7a d0 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 e0 9f 29 a5 e8 46 7a d0 ff <0f> 0b 48 89 fe 48 c7 c7 a8 9f 29 a5 e8 35 7a d0 ff 0f 0b 90 90 90
[  269.601726] RSP: 0018:ffffaddb83b5fdc0 EFLAGS: 00010246
[  269.607561] RAX: 0000000000000054 RBX: ffff9e136bd57418 RCX: 0000000000000000
[  269.615531] RDX: 0000000000000000 RSI: ffff9e136fa566b8 RDI: ffff9e136fa566b8
[  269.623500] RBP: ffff9e1364bd5070 R08: 00000000000005ce R09: 0000000000000004
[  269.631470] R10: 0000000000000766 R11: ffffffffa59f66ad R12: ffff9e136bd57400
[  269.639440] R13: ffffffffa52c6a12 R14: ffff9e1364bd30d0 R15: 0000000000000000
[  269.647410] FS:  00007f97227af740(0000) GS:ffff9e136fa40000(0000) knlGS:0000000000000000
[  269.656441] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  269.662857] CR2: 000055cfdb69e158 CR3: 00000004677f6001 CR4: 00000000003606e0
[  269.670820] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  269.678790] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  269.686760] Call Trace:
[  269.689489]  device_shutdown+0xc1/0x210
[  269.693773]  kernel_kexec+0x51/0x96
[  269.697666]  __do_sys_reboot+0x1be/0x210
[  269.702045]  ? kmem_cache_free+0x1aa/0x1d0
[  269.706618]  ? __dentry_kill+0x121/0x170
[  269.710998]  ? _cond_resched+0x15/0x30
[  269.715181]  ? dentry_kill+0x4d/0x190
[  269.719260]  ? _cond_resched+0x15/0x30
[  269.723444]  do_syscall_64+0x53/0x110
[  269.727531]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  269.733172] RIP: 0033:0x7f97228a3373
[  269.737161] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 9a 0c 00 f7 d8
[  269.758147] RSP: 002b:00007ffe11d30fa8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9
[  269.766602] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f97228a3373
[  269.774572] RDX: 0000000045584543 RSI: 0000000028121969 RDI: 00000000fee1dead
[  269.782541] RBP: 0000000000000002 R08: 0000000000000004 R09: 000055cfdb69e160
[  269.790511] R10: fffffffffffffb8e R11: 0000000000000202 R12: 00007ffe11d31238
[  269.798482] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ffffffff
[  269.806443] Modules linked in: nft_chain_route_ipv4(E) xt_TCPMSS(E) sx_bfd(OE) sx_netdev(OE) psample(E) dummy(E) sx_core(OE) 8021q(E) garp(E) mrp(E) mst_pciconf(OE) mst_pci(OE) xt_hl(E) xt_tcpudp(E) ip6_tables(E) nft_compat(E) nft_counter(E) xt_conntrack(E) nf_nat(E) nf_conntrack_netlink(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) xfrm_user(E) xfrm_algo(E) intel_rapl(E) mlxsw_minimal(E) sb_edac(E) mlxsw_i2c(E) x86_pkg_temp_thermal(E) mlxsw_core(E) intel_powerclamp(E) devlink(E) kvm_intel(E) bonding(E) kvm(E) i2c_mux_reg(E) i2c_mux(E) mlxreg_hotplug(E) mlxreg_io(E) leds_mlxreg(E) i2c_mlxcpld(E) mlxreg_fan(E) mxm_wmi(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) evdev(E) mlx_platform(E) ghash_clmulni_intel(E) intel_cstate(E) sg(E) intel_uncore(E) iTCO_wdt(E) pcspkr(E)
[  269.885239]  intel_rapl_perf(E) ioatdma(E) iTCO_vendor_support(E) pcc_cpufreq(E) wmi(E) ebt_vlan(E) ebtable_broute(E) bridge(E) stp(E) llc(E) ebtable_nat(E) nf_tables(E) button(E) nfnetlink(E) ebtable_filter(E) ebtables(E) xdpe12284(E) at24(E) ledtrig_timer(E) tmp102(E) lm75(E) coretemp(E) max1363(E) industrialio_triggered_buffer(E) kfifo_buf(E) industrialio(E) tps53679(E) pmbus(E) pmbus_core(E) i2c_dev(E) ip_tables(E) x_tables(E) autofs4(E) loop(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) fscrypto(E) ecb(E) sd_mod(E) nvme(E) nvme_core(E) nls_utf8(E) nls_cp437(E) nls_ascii(E) vfat(E) fat(E) overlay(E) squashfs(E) zstd_decompress(E) xxhash(E) crc32c_intel(E) gpio_ich(E) ahci(E) aesni_intel(E) libahci(E) aes_x86_64(E) crypto_simd(E) xhci_pci(E) ehci_pci(E) libata(E) igb(E) ehci_hcd(E)
[  269.964036]  xhci_hcd(E) cryptd(E) glue_helper(E) scsi_mod(E) i2c_algo_bit(E) i2c_i801(E) lpc_ich(E) dca(E) mfd_core(E) usbcore(E) usb_common(E)
[  269.978536] ---[ end trace 8f56c678b52f9aee ]---
[  269.983698] RIP: 0010:__list_del_entry_valid.cold.1+0x34/0x4c
[  269.990123] Code: 9f 29 a5 e8 68 7a d0 ff 0f 0b 48 c7 c7 20 a0 29 a5 e8 5a 7a d0 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 e0 9f 29 a5 e8 46 7a d0 ff <0f> 0b 48 89 fe 48 c7 c7 a8 9f 29 a5 e8 35 7a d0 ff 0f 0b 90 90 90
[  270.011117] RSP: 0018:ffffaddb83b5fdc0 EFLAGS: 00010246
[  270.016958] RAX: 0000000000000054 RBX: ffff9e136bd57418 RCX: 0000000000000000
[  270.024935] RDX: 0000000000000000 RSI: ffff9e136fa566b8 RDI: ffff9e136fa566b8
[  270.032912] RBP: ffff9e1364bd5070 R08: 00000000000005ce R09: 0000000000000004
[  270.040890] R10: 0000000000000766 R11: ffffffffa59f66ad R12: ffff9e136bd57400
[  270.048866] R13: ffffffffa52c6a12 R14: ffff9e1364bd30d0 R15: 0000000000000000
[  270.056844] FS:  00007f97227af740(0000) GS:ffff9e136fa40000(0000) knlGS:0000000000000000
[  270.065889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  270.072312] CR2: 000055cfdb69e158 CR3: 00000004677f6001 CR4: 00000000003606e0
[  270.080289] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  270.088268] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Steps to reproduce the issue:

  1. sudo warm-reboot -v
    or
  2. sudo fast-reboot -v

Describe the results you received:

Kernel crashes with above stack trace, following a cold reboot.

Describe the results you expected:

Successfully warm/fast-reboot.

Output of show version:

admin@r-qa-sw-eth-112:~$ show platform summary
Platform: x86_64-mlnx_msn3700c-r0
HwSKU: ACS-MSN3700C
ASIC: mellanox
ASIC Count: 1
admin@r-qa-sw-eth-112:~$ show version

SONiC Software Version: SONiC.SONIC.202012.23-fa7d636_Internal
Distribution: Debian 10.8
Kernel: 4.19.0-12-2-amd64
Build commit: fa7d6362
Build date: Sun Feb 21 10:08:52 UTC 2021
Built by: sw-r2d2-bot@r-build-sonic-ci02

Platform: x86_64-mlnx_msn3700c-r0
HwSKU: ACS-MSN3700C
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1910X07118
Uptime: 16:45:59 up 0 min,  1 user,  load average: 2.97, 0.78, 0.27

Docker images:
REPOSITORY                    TAG                                IMAGE ID            SIZE
docker-teamd                  SONIC.202012.23-fa7d636_Internal   7760ab5f4867        410MB
docker-teamd                  latest                             7760ab5f4867        410MB
docker-nat                    SONIC.202012.23-fa7d636_Internal   0734519b4db2        413MB
docker-nat                    latest                             0734519b4db2        413MB
docker-orchagent              SONIC.202012.23-fa7d636_Internal   0e32e6c78fce        428MB
docker-orchagent              latest                             0e32e6c78fce        428MB
docker-fpm-frr                SONIC.202012.23-fa7d636_Internal   b3933c2dd313        428MB
docker-fpm-frr                latest                             b3933c2dd313        428MB
docker-sflow                  SONIC.202012.23-fa7d636_Internal   db0f7ce0b317        411MB
docker-sflow                  latest                             db0f7ce0b317        411MB
docker-syncd-mlnx             SONIC.202012.23-fa7d636_Internal   fb89f8ef4ba6        542MB
docker-syncd-mlnx             latest                             fb89f8ef4ba6        542MB
docker-snmp                   SONIC.202012.23-fa7d636_Internal   8d78574b5fb7        438MB
docker-snmp                   latest                             8d78574b5fb7        438MB
docker-sonic-mgmt-framework   SONIC.202012.23-fa7d636_Internal   a7a3146a4aff        615MB
docker-sonic-mgmt-framework   latest                             a7a3146a4aff        615MB
docker-router-advertiser      SONIC.202012.23-fa7d636_Internal   f811608edb05        397MB
docker-router-advertiser      latest                             f811608edb05        397MB
docker-platform-monitor       SONIC.202012.23-fa7d636_Internal   1f8965b709d9        689MB
docker-platform-monitor       latest                             1f8965b709d9        689MB
docker-lldp                   SONIC.202012.23-fa7d636_Internal   6528498f9d34        437MB
docker-lldp                   latest                             6528498f9d34        437MB
docker-database               SONIC.202012.23-fa7d636_Internal   8d5e7b90e3b6        397MB
docker-database               latest                             8d5e7b90e3b6        397MB
docker-dhcp-relay             SONIC.202012.23-fa7d636_Internal   7469a84d51da        404MB
docker-dhcp-relay             latest                             7469a84d51da        404MB
docker-sonic-telemetry        SONIC.202012.23-fa7d636_Internal   1c52c9fdba6e        472MB
docker-sonic-telemetry        latest                             1c52c9fdba6e        472MB

Additional information you deem important (e.g. issue happens only occasionally):

The issue happens randomly. Observed that different devices have different probability of this crash.
On some devices it has 50% probability of happening.

sonic_dump_r-qa-sw-eth-112_20210212_135822.tar.gz
log_from_console.txt

@lguohan
Copy link
Collaborator

lguohan commented Feb 23, 2021

is this related to this platform only?

@stepanblyschak
Copy link
Collaborator Author

I do not have way to test other platforms, but I do not see anything related to platform from the kernel trace

@lguohan
Copy link
Collaborator

lguohan commented Feb 26, 2021

[ 3856.225963]  device_shutdown+0xc1/0x210
[ 3856.230247]  kernel_kexec+0x51/0x96
[ 3856.234141]  __do_sys_reboot+0x1be/0x210
[ 3856.238521]  ? kmem_cache_free+0x1aa/0x1d0
[ 3856.243093]  ? __dentry_kill+0x121/0x170
[ 3856.247473]  ? _cond_resched+0x15/0x30
[ 3856.251658]  ? _cond_resched+0x15/0x30
[ 3856.255844]  do_syscall_64+0x53/0x110
[ 3856.259930]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

it looks like some ethernet driver shutdown and cause corruption? are you sending/receiving traffic during the warm reboot/fast reboot? if yes, can you stop sending/receiving traffic and then check?

@stepanblyschak
Copy link
Collaborator Author

@lguohan I'll shutdown ports and retest. Also noticed the ability to generate kdump in SONiC and will investigate it.

@anshuv-mfst anshuv-mfst added the Triaged this issue has been triaged label Mar 3, 2021
@stepanblyschak
Copy link
Collaborator Author

PR to fix this issue: sonic-net/sonic-linux-kernel#199

lguohan pushed a commit to sonic-net/sonic-linux-kernel that referenced this issue Mar 10, 2021
Fix sonic-net/sonic-buildimage#6866

Unset CONFIG_THERMAL_STATISTICS.
Reason:
Kernel thermal zones binding to the cooling device together with CONFIG_THERMAL_STATISTICS=y causes kernel crash as out of boundary:
trans_table is two-dimensional table allocated per max cooling state (10).
If statistics is configured, thermal_cooling_device_stats_update() will be called and will try to update out of boundary:
stats->trans_table[stats->state * stats->max_states + new_state]++

Kernel crash with the following stack trace:

```
[  269.474092] watchdog: watchdog1: watchdog did not stop!
[  269.533625] list_del corruption. prev->next should be ffff9e136bd57418, but was 677ac660ffffffff

[  269.543482] kernel BUG at lib/list_debug.c:53!
[  269.548458] invalid opcode: 0000 [#1] SMP PTI
[  269.553326] CPU: 1 PID: 8890 Comm: kexec Tainted: G           OE     4.19.0-9-2-amd64 #1 Debian 4.19.118-2+deb10u1
[  269.564891] Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 11/03/2020
[  269.574323] RIP: 0010:__list_del_entry_valid.cold.1+0x34/0x4c
[  269.580740] Code: 9f 29 a5 e8 68 7a d0 ff 0f 0b 48 c7 c7 20 a0 29 a5 e8 5a 7a d0 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 e0 9f 29 a5 e8 46 7a d0 ff <0f> 0b 48 89 fe 48 c7 c7 a8 9f 29 a5 e8 35 7a d0 ff 0f 0b 90 90 90
[  269.601726] RSP: 0018:ffffaddb83b5fdc0 EFLAGS: 00010246
[  269.607561] RAX: 0000000000000054 RBX: ffff9e136bd57418 RCX: 0000000000000000
[  269.615531] RDX: 0000000000000000 RSI: ffff9e136fa566b8 RDI: ffff9e136fa566b8
[  269.623500] RBP: ffff9e1364bd5070 R08: 00000000000005ce R09: 0000000000000004
[  269.631470] R10: 0000000000000766 R11: ffffffffa59f66ad R12: ffff9e136bd57400
[  269.639440] R13: ffffffffa52c6a12 R14: ffff9e1364bd30d0 R15: 0000000000000000
[  269.647410] FS:  00007f97227af740(0000) GS:ffff9e136fa40000(0000) knlGS:0000000000000000
[  269.656441] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  269.662857] CR2: 000055cfdb69e158 CR3: 00000004677f6001 CR4: 00000000003606e0
[  269.670820] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  269.678790] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  269.686760] Call Trace:
[  269.689489]  device_shutdown+0xc1/0x210
[  269.693773]  kernel_kexec+0x51/0x96
[  269.697666]  __do_sys_reboot+0x1be/0x210
[  269.702045]  ? kmem_cache_free+0x1aa/0x1d0
[  269.706618]  ? __dentry_kill+0x121/0x170
[  269.710998]  ? _cond_resched+0x15/0x30
[  269.715181]  ? dentry_kill+0x4d/0x190
[  269.719260]  ? _cond_resched+0x15/0x30
[  269.723444]  do_syscall_64+0x53/0x110
[  269.727531]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  269.733172] RIP: 0033:0x7f97228a3373
[  269.737161] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 9a 0c 00 f7 d8
[  269.758147] RSP: 002b:00007ffe11d30fa8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9
[  269.766602] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f97228a3373
[  269.774572] RDX: 0000000045584543 RSI: 0000000028121969 RDI: 00000000fee1dead
[  269.782541] RBP: 0000000000000002 R08: 0000000000000004 R09: 000055cfdb69e160
[  269.790511] R10: fffffffffffffb8e R11: 0000000000000202 R12: 00007ffe11d31238
[  269.798482] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ffffffff
[  269.806443] Modules linked in: nft_chain_route_ipv4(E) xt_TCPMSS(E) sx_bfd(OE) sx_netdev(OE) psample(E) dummy(E) sx_core(OE) 8021q(E) garp(E) mrp(E) mst_pciconf(OE) mst_pci(OE) xt_hl(E) xt_tcpudp(E) ip6_tables(E) nft_compat(E) nft_counter(E) xt_conntrack(E) nf_nat(E) nf_conntrack_netlink(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) xfrm_user(E) xfrm_algo(E) intel_rapl(E) mlxsw_minimal(E) sb_edac(E) mlxsw_i2c(E) x86_pkg_temp_thermal(E) mlxsw_core(E) intel_powerclamp(E) devlink(E) kvm_intel(E) bonding(E) kvm(E) i2c_mux_reg(E) i2c_mux(E) mlxreg_hotplug(E) mlxreg_io(E) leds_mlxreg(E) i2c_mlxcpld(E) mlxreg_fan(E) mxm_wmi(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) evdev(E) mlx_platform(E) ghash_clmulni_intel(E) intel_cstate(E) sg(E) intel_uncore(E) iTCO_wdt(E) pcspkr(E)
[  269.885239]  intel_rapl_perf(E) ioatdma(E) iTCO_vendor_support(E) pcc_cpufreq(E) wmi(E) ebt_vlan(E) ebtable_broute(E) bridge(E) stp(E) llc(E) ebtable_nat(E) nf_tables(E) button(E) nfnetlink(E) ebtable_filter(E) ebtables(E) xdpe12284(E) at24(E) ledtrig_timer(E) tmp102(E) lm75(E) coretemp(E) max1363(E) industrialio_triggered_buffer(E) kfifo_buf(E) industrialio(E) tps53679(E) pmbus(E) pmbus_core(E) i2c_dev(E) ip_tables(E) x_tables(E) autofs4(E) loop(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) fscrypto(E) ecb(E) sd_mod(E) nvme(E) nvme_core(E) nls_utf8(E) nls_cp437(E) nls_ascii(E) vfat(E) fat(E) overlay(E) squashfs(E) zstd_decompress(E) xxhash(E) crc32c_intel(E) gpio_ich(E) ahci(E) aesni_intel(E) libahci(E) aes_x86_64(E) crypto_simd(E) xhci_pci(E) ehci_pci(E) libata(E) igb(E) ehci_hcd(E)
[  269.964036]  xhci_hcd(E) cryptd(E) glue_helper(E) scsi_mod(E) i2c_algo_bit(E) i2c_i801(E) lpc_ich(E) dca(E) mfd_core(E) usbcore(E) usb_common(E)
[  269.978536] ---[ end trace 8f56c678b52f9aee ]---
[  269.983698] RIP: 0010:__list_del_entry_valid.cold.1+0x34/0x4c
[  269.990123] Code: 9f 29 a5 e8 68 7a d0 ff 0f 0b 48 c7 c7 20 a0 29 a5 e8 5a 7a d0 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 e0 9f 29 a5 e8 46 7a d0 ff <0f> 0b 48 89 fe 48 c7 c7 a8 9f 29 a5 e8 35 7a d0 ff 0f 0b 90 90 90
[  270.011117] RSP: 0018:ffffaddb83b5fdc0 EFLAGS: 00010246
[  270.016958] RAX: 0000000000000054 RBX: ffff9e136bd57418 RCX: 0000000000000000
[  270.024935] RDX: 0000000000000000 RSI: ffff9e136fa566b8 RDI: ffff9e136fa566b8
[  270.032912] RBP: ffff9e1364bd5070 R08: 00000000000005ce R09: 0000000000000004
[  270.040890] R10: 0000000000000766 R11: ffffffffa59f66ad R12: ffff9e136bd57400
[  270.048866] R13: ffffffffa52c6a12 R14: ffff9e1364bd30d0 R15: 0000000000000000
[  270.056844] FS:  00007f97227af740(0000) GS:ffff9e136fa40000(0000) knlGS:0000000000000000
[  270.065889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  270.072312] CR2: 000055cfdb69e158 CR3: 00000004677f6001 CR4: 00000000003606e0
[  270.080289] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  270.088268] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
```

A temporary solution is to disable this config and to work with the linux community on fixing it. 
The solution requires fan driver update which is not trivial and will take some time to have it available on next-net before can be backported to SONiC linux-kernel.

It was tested on:
HwSKU: ACS-MSN2410
HwSKU: Mellanox-SN2700
daall pushed a commit to sonic-net/sonic-linux-kernel that referenced this issue Mar 10, 2021
Fix sonic-net/sonic-buildimage#6866

Unset CONFIG_THERMAL_STATISTICS.
Reason:
Kernel thermal zones binding to the cooling device together with CONFIG_THERMAL_STATISTICS=y causes kernel crash as out of boundary:
trans_table is two-dimensional table allocated per max cooling state (10).
If statistics is configured, thermal_cooling_device_stats_update() will be called and will try to update out of boundary:
stats->trans_table[stats->state * stats->max_states + new_state]++

Kernel crash with the following stack trace:

```
[  269.474092] watchdog: watchdog1: watchdog did not stop!
[  269.533625] list_del corruption. prev->next should be ffff9e136bd57418, but was 677ac660ffffffff

[  269.543482] kernel BUG at lib/list_debug.c:53!
[  269.548458] invalid opcode: 0000 [#1] SMP PTI
[  269.553326] CPU: 1 PID: 8890 Comm: kexec Tainted: G           OE     4.19.0-9-2-amd64 #1 Debian 4.19.118-2+deb10u1
[  269.564891] Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 11/03/2020
[  269.574323] RIP: 0010:__list_del_entry_valid.cold.1+0x34/0x4c
[  269.580740] Code: 9f 29 a5 e8 68 7a d0 ff 0f 0b 48 c7 c7 20 a0 29 a5 e8 5a 7a d0 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 e0 9f 29 a5 e8 46 7a d0 ff <0f> 0b 48 89 fe 48 c7 c7 a8 9f 29 a5 e8 35 7a d0 ff 0f 0b 90 90 90
[  269.601726] RSP: 0018:ffffaddb83b5fdc0 EFLAGS: 00010246
[  269.607561] RAX: 0000000000000054 RBX: ffff9e136bd57418 RCX: 0000000000000000
[  269.615531] RDX: 0000000000000000 RSI: ffff9e136fa566b8 RDI: ffff9e136fa566b8
[  269.623500] RBP: ffff9e1364bd5070 R08: 00000000000005ce R09: 0000000000000004
[  269.631470] R10: 0000000000000766 R11: ffffffffa59f66ad R12: ffff9e136bd57400
[  269.639440] R13: ffffffffa52c6a12 R14: ffff9e1364bd30d0 R15: 0000000000000000
[  269.647410] FS:  00007f97227af740(0000) GS:ffff9e136fa40000(0000) knlGS:0000000000000000
[  269.656441] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  269.662857] CR2: 000055cfdb69e158 CR3: 00000004677f6001 CR4: 00000000003606e0
[  269.670820] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  269.678790] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  269.686760] Call Trace:
[  269.689489]  device_shutdown+0xc1/0x210
[  269.693773]  kernel_kexec+0x51/0x96
[  269.697666]  __do_sys_reboot+0x1be/0x210
[  269.702045]  ? kmem_cache_free+0x1aa/0x1d0
[  269.706618]  ? __dentry_kill+0x121/0x170
[  269.710998]  ? _cond_resched+0x15/0x30
[  269.715181]  ? dentry_kill+0x4d/0x190
[  269.719260]  ? _cond_resched+0x15/0x30
[  269.723444]  do_syscall_64+0x53/0x110
[  269.727531]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  269.733172] RIP: 0033:0x7f97228a3373
[  269.737161] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 9a 0c 00 f7 d8
[  269.758147] RSP: 002b:00007ffe11d30fa8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9
[  269.766602] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f97228a3373
[  269.774572] RDX: 0000000045584543 RSI: 0000000028121969 RDI: 00000000fee1dead
[  269.782541] RBP: 0000000000000002 R08: 0000000000000004 R09: 000055cfdb69e160
[  269.790511] R10: fffffffffffffb8e R11: 0000000000000202 R12: 00007ffe11d31238
[  269.798482] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ffffffff
[  269.806443] Modules linked in: nft_chain_route_ipv4(E) xt_TCPMSS(E) sx_bfd(OE) sx_netdev(OE) psample(E) dummy(E) sx_core(OE) 8021q(E) garp(E) mrp(E) mst_pciconf(OE) mst_pci(OE) xt_hl(E) xt_tcpudp(E) ip6_tables(E) nft_compat(E) nft_counter(E) xt_conntrack(E) nf_nat(E) nf_conntrack_netlink(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) xfrm_user(E) xfrm_algo(E) intel_rapl(E) mlxsw_minimal(E) sb_edac(E) mlxsw_i2c(E) x86_pkg_temp_thermal(E) mlxsw_core(E) intel_powerclamp(E) devlink(E) kvm_intel(E) bonding(E) kvm(E) i2c_mux_reg(E) i2c_mux(E) mlxreg_hotplug(E) mlxreg_io(E) leds_mlxreg(E) i2c_mlxcpld(E) mlxreg_fan(E) mxm_wmi(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) evdev(E) mlx_platform(E) ghash_clmulni_intel(E) intel_cstate(E) sg(E) intel_uncore(E) iTCO_wdt(E) pcspkr(E)
[  269.885239]  intel_rapl_perf(E) ioatdma(E) iTCO_vendor_support(E) pcc_cpufreq(E) wmi(E) ebt_vlan(E) ebtable_broute(E) bridge(E) stp(E) llc(E) ebtable_nat(E) nf_tables(E) button(E) nfnetlink(E) ebtable_filter(E) ebtables(E) xdpe12284(E) at24(E) ledtrig_timer(E) tmp102(E) lm75(E) coretemp(E) max1363(E) industrialio_triggered_buffer(E) kfifo_buf(E) industrialio(E) tps53679(E) pmbus(E) pmbus_core(E) i2c_dev(E) ip_tables(E) x_tables(E) autofs4(E) loop(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) fscrypto(E) ecb(E) sd_mod(E) nvme(E) nvme_core(E) nls_utf8(E) nls_cp437(E) nls_ascii(E) vfat(E) fat(E) overlay(E) squashfs(E) zstd_decompress(E) xxhash(E) crc32c_intel(E) gpio_ich(E) ahci(E) aesni_intel(E) libahci(E) aes_x86_64(E) crypto_simd(E) xhci_pci(E) ehci_pci(E) libata(E) igb(E) ehci_hcd(E)
[  269.964036]  xhci_hcd(E) cryptd(E) glue_helper(E) scsi_mod(E) i2c_algo_bit(E) i2c_i801(E) lpc_ich(E) dca(E) mfd_core(E) usbcore(E) usb_common(E)
[  269.978536] ---[ end trace 8f56c678b52f9aee ]---
[  269.983698] RIP: 0010:__list_del_entry_valid.cold.1+0x34/0x4c
[  269.990123] Code: 9f 29 a5 e8 68 7a d0 ff 0f 0b 48 c7 c7 20 a0 29 a5 e8 5a 7a d0 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 e0 9f 29 a5 e8 46 7a d0 ff <0f> 0b 48 89 fe 48 c7 c7 a8 9f 29 a5 e8 35 7a d0 ff 0f 0b 90 90 90
[  270.011117] RSP: 0018:ffffaddb83b5fdc0 EFLAGS: 00010246
[  270.016958] RAX: 0000000000000054 RBX: ffff9e136bd57418 RCX: 0000000000000000
[  270.024935] RDX: 0000000000000000 RSI: ffff9e136fa566b8 RDI: ffff9e136fa566b8
[  270.032912] RBP: ffff9e1364bd5070 R08: 00000000000005ce R09: 0000000000000004
[  270.040890] R10: 0000000000000766 R11: ffffffffa59f66ad R12: ffff9e136bd57400
[  270.048866] R13: ffffffffa52c6a12 R14: ffff9e1364bd30d0 R15: 0000000000000000
[  270.056844] FS:  00007f97227af740(0000) GS:ffff9e136fa40000(0000) knlGS:0000000000000000
[  270.065889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  270.072312] CR2: 000055cfdb69e158 CR3: 00000004677f6001 CR4: 00000000003606e0
[  270.080289] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  270.088268] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
```

A temporary solution is to disable this config and to work with the linux community on fixing it. 
The solution requires fan driver update which is not trivial and will take some time to have it available on next-net before can be backported to SONiC linux-kernel.

It was tested on:
HwSKU: ACS-MSN2410
HwSKU: Mellanox-SN2700
Junchao-Mellanox pushed a commit to Junchao-Mellanox/sonic-linux-kernel that referenced this issue Mar 16, 2021
Fix sonic-net/sonic-buildimage#6866

Unset CONFIG_THERMAL_STATISTICS.
Reason:
Kernel thermal zones binding to the cooling device together with CONFIG_THERMAL_STATISTICS=y causes kernel crash as out of boundary:
trans_table is two-dimensional table allocated per max cooling state (10).
If statistics is configured, thermal_cooling_device_stats_update() will be called and will try to update out of boundary:
stats->trans_table[stats->state * stats->max_states + new_state]++

Kernel crash with the following stack trace:

```
[  269.474092] watchdog: watchdog1: watchdog did not stop!
[  269.533625] list_del corruption. prev->next should be ffff9e136bd57418, but was 677ac660ffffffff

[  269.543482] kernel BUG at lib/list_debug.c:53!
[  269.548458] invalid opcode: 0000 [#1] SMP PTI
[  269.553326] CPU: 1 PID: 8890 Comm: kexec Tainted: G           OE     4.19.0-9-2-amd64 #1 Debian 4.19.118-2+deb10u1
[  269.564891] Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 11/03/2020
[  269.574323] RIP: 0010:__list_del_entry_valid.cold.1+0x34/0x4c
[  269.580740] Code: 9f 29 a5 e8 68 7a d0 ff 0f 0b 48 c7 c7 20 a0 29 a5 e8 5a 7a d0 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 e0 9f 29 a5 e8 46 7a d0 ff <0f> 0b 48 89 fe 48 c7 c7 a8 9f 29 a5 e8 35 7a d0 ff 0f 0b 90 90 90
[  269.601726] RSP: 0018:ffffaddb83b5fdc0 EFLAGS: 00010246
[  269.607561] RAX: 0000000000000054 RBX: ffff9e136bd57418 RCX: 0000000000000000
[  269.615531] RDX: 0000000000000000 RSI: ffff9e136fa566b8 RDI: ffff9e136fa566b8
[  269.623500] RBP: ffff9e1364bd5070 R08: 00000000000005ce R09: 0000000000000004
[  269.631470] R10: 0000000000000766 R11: ffffffffa59f66ad R12: ffff9e136bd57400
[  269.639440] R13: ffffffffa52c6a12 R14: ffff9e1364bd30d0 R15: 0000000000000000
[  269.647410] FS:  00007f97227af740(0000) GS:ffff9e136fa40000(0000) knlGS:0000000000000000
[  269.656441] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  269.662857] CR2: 000055cfdb69e158 CR3: 00000004677f6001 CR4: 00000000003606e0
[  269.670820] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  269.678790] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  269.686760] Call Trace:
[  269.689489]  device_shutdown+0xc1/0x210
[  269.693773]  kernel_kexec+0x51/0x96
[  269.697666]  __do_sys_reboot+0x1be/0x210
[  269.702045]  ? kmem_cache_free+0x1aa/0x1d0
[  269.706618]  ? __dentry_kill+0x121/0x170
[  269.710998]  ? _cond_resched+0x15/0x30
[  269.715181]  ? dentry_kill+0x4d/0x190
[  269.719260]  ? _cond_resched+0x15/0x30
[  269.723444]  do_syscall_64+0x53/0x110
[  269.727531]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  269.733172] RIP: 0033:0x7f97228a3373
[  269.737161] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 9a 0c 00 f7 d8
[  269.758147] RSP: 002b:00007ffe11d30fa8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9
[  269.766602] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f97228a3373
[  269.774572] RDX: 0000000045584543 RSI: 0000000028121969 RDI: 00000000fee1dead
[  269.782541] RBP: 0000000000000002 R08: 0000000000000004 R09: 000055cfdb69e160
[  269.790511] R10: fffffffffffffb8e R11: 0000000000000202 R12: 00007ffe11d31238
[  269.798482] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ffffffff
[  269.806443] Modules linked in: nft_chain_route_ipv4(E) xt_TCPMSS(E) sx_bfd(OE) sx_netdev(OE) psample(E) dummy(E) sx_core(OE) 8021q(E) garp(E) mrp(E) mst_pciconf(OE) mst_pci(OE) xt_hl(E) xt_tcpudp(E) ip6_tables(E) nft_compat(E) nft_counter(E) xt_conntrack(E) nf_nat(E) nf_conntrack_netlink(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) xfrm_user(E) xfrm_algo(E) intel_rapl(E) mlxsw_minimal(E) sb_edac(E) mlxsw_i2c(E) x86_pkg_temp_thermal(E) mlxsw_core(E) intel_powerclamp(E) devlink(E) kvm_intel(E) bonding(E) kvm(E) i2c_mux_reg(E) i2c_mux(E) mlxreg_hotplug(E) mlxreg_io(E) leds_mlxreg(E) i2c_mlxcpld(E) mlxreg_fan(E) mxm_wmi(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) evdev(E) mlx_platform(E) ghash_clmulni_intel(E) intel_cstate(E) sg(E) intel_uncore(E) iTCO_wdt(E) pcspkr(E)
[  269.885239]  intel_rapl_perf(E) ioatdma(E) iTCO_vendor_support(E) pcc_cpufreq(E) wmi(E) ebt_vlan(E) ebtable_broute(E) bridge(E) stp(E) llc(E) ebtable_nat(E) nf_tables(E) button(E) nfnetlink(E) ebtable_filter(E) ebtables(E) xdpe12284(E) at24(E) ledtrig_timer(E) tmp102(E) lm75(E) coretemp(E) max1363(E) industrialio_triggered_buffer(E) kfifo_buf(E) industrialio(E) tps53679(E) pmbus(E) pmbus_core(E) i2c_dev(E) ip_tables(E) x_tables(E) autofs4(E) loop(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) fscrypto(E) ecb(E) sd_mod(E) nvme(E) nvme_core(E) nls_utf8(E) nls_cp437(E) nls_ascii(E) vfat(E) fat(E) overlay(E) squashfs(E) zstd_decompress(E) xxhash(E) crc32c_intel(E) gpio_ich(E) ahci(E) aesni_intel(E) libahci(E) aes_x86_64(E) crypto_simd(E) xhci_pci(E) ehci_pci(E) libata(E) igb(E) ehci_hcd(E)
[  269.964036]  xhci_hcd(E) cryptd(E) glue_helper(E) scsi_mod(E) i2c_algo_bit(E) i2c_i801(E) lpc_ich(E) dca(E) mfd_core(E) usbcore(E) usb_common(E)
[  269.978536] ---[ end trace 8f56c678b52f9aee ]---
[  269.983698] RIP: 0010:__list_del_entry_valid.cold.1+0x34/0x4c
[  269.990123] Code: 9f 29 a5 e8 68 7a d0 ff 0f 0b 48 c7 c7 20 a0 29 a5 e8 5a 7a d0 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 e0 9f 29 a5 e8 46 7a d0 ff <0f> 0b 48 89 fe 48 c7 c7 a8 9f 29 a5 e8 35 7a d0 ff 0f 0b 90 90 90
[  270.011117] RSP: 0018:ffffaddb83b5fdc0 EFLAGS: 00010246
[  270.016958] RAX: 0000000000000054 RBX: ffff9e136bd57418 RCX: 0000000000000000
[  270.024935] RDX: 0000000000000000 RSI: ffff9e136fa566b8 RDI: ffff9e136fa566b8
[  270.032912] RBP: ffff9e1364bd5070 R08: 00000000000005ce R09: 0000000000000004
[  270.040890] R10: 0000000000000766 R11: ffffffffa59f66ad R12: ffff9e136bd57400
[  270.048866] R13: ffffffffa52c6a12 R14: ffff9e1364bd30d0 R15: 0000000000000000
[  270.056844] FS:  00007f97227af740(0000) GS:ffff9e136fa40000(0000) knlGS:0000000000000000
[  270.065889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  270.072312] CR2: 000055cfdb69e158 CR3: 00000004677f6001 CR4: 00000000003606e0
[  270.080289] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  270.088268] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
```

A temporary solution is to disable this config and to work with the linux community on fixing it.
The solution requires fan driver update which is not trivial and will take some time to have it available on next-net before can be backported to SONiC linux-kernel.

It was tested on:
HwSKU: ACS-MSN2410
HwSKU: Mellanox-SN2700
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue for 202012 Triaged this issue has been triaged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants