Divide error on device removal #508

Open
jeid64 opened this issue Jul 14, 2024 · 2 comments
Labels
bug feedback-needed Waiting for a reply

Comments


jeid64 commented Jul 14, 2024

Howdy, I'm using the latest commit off of export to run the bpf_red scheduler. I'm using NetworkManager to configure an MPTCP endpoint with flag 160 (subflow, fullmesh). When I unplug devices to simulate network dropouts, I get a divide error in dmesg, after which netlink sockets no longer work for mptcp endpoint and NetworkManager hangs. This reproduces 100% of the time. Is anyone else using NetworkManager to manage their devices, or do you recommend using mptcpd?
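
For readers wondering how 160 maps to flag names, here is a minimal sketch of the decoding. The bit values are an assumption based on NetworkManager's `connection.mptcp-flags` setting and should be checked against the installed NetworkManager version; the function name `decode_mptcp_flags` is made up for illustration.

```python
# Bit values assumed from NetworkManager's connection.mptcp-flags setting;
# verify them against the documentation of your NetworkManager version.
NM_MPTCP_FLAGS = {
    0x01: "disabled",
    0x02: "enabled",
    0x04: "also-without-sysctl",
    0x08: "also-without-default-route",
    0x10: "signal",
    0x20: "subflow",
    0x40: "backup",
    0x80: "fullmesh",
}


def decode_mptcp_flags(value: int) -> list[str]:
    """Return the names of all flag bits set in value (hypothetical helper)."""
    return [name for bit, name in NM_MPTCP_FLAGS.items() if value & bit]


print(decode_mptcp_flags(160))  # ['subflow', 'fullmesh']
```

Under that assumption, 160 is 0x20 + 0x80, which matches the two flags named in the report.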

[  434.820308] usb 3-1.3: USB disconnect, device number 20
[  434.820482] rndis_host 3-1.3:1.0 enp0s20f0u1u3: unregister 'rndis_host' usb-0000:00:14.0-1.3, ZTE RNDIS device
[  434.846554] Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI
[  434.846567] CPU: 1 PID: 4968 Comm: NetworkManager Not tainted 6.10.0-200.fc40.x86_64 #1
[  434.846570] Hardware name: GPD G1618-03/G1618-03, BIOS 2.22 04/29/2021
[  434.846573] RIP: 0010:tcp_tso_segs+0x84/0xd0
[  434.846581] Code: 05 00 00 41 89 c6 41 d3 ee 80 f9 1f 0f 87 cb 2a 27 00 41 83 fe 1f 76 37 8b 83 2c 02 00 00 4c 39 e0 44 89 e9 49 0f 47 c4 31 d2 <48> f7 f1 39 c5 0f 42 e8 0f b7 83 2a 02 00 00 39 c5 0f 46 c5 48 83
[  434.846584] RSP: 0000:ffffaf0dc19a7520 EFLAGS: 00010246
[  434.846588] RAX: 0000000000000179 RBX: ffff99d587c8a500 RCX: 0000000000000000
[  434.846591] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff99d587c8a500
[  434.846593] RBP: 0000000000000002 R08: 0000000000000820 R09: 0000000000000000
[  434.846595] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000179
[  434.846596] R13: 0000000000000000 R14: 0000000000000042 R15: ffff99d587c8a500
[  434.846599] FS:  00007faf91208580(0000) GS:ffff99d5df880000(0000) knlGS:0000000000000000
[  434.846602] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  434.846604] CR2: 0000000043942000 CR3: 0000000147e58001 CR4: 0000000000f70ef0
[  434.846607] PKRU: 55555554
[  434.846609] Call Trace:
[  434.846613]  <TASK>
[  434.846619]  ? __die_body.cold+0x19/0x27
[  434.846629]  ? die+0x2e/0x50
[  434.846635]  ? do_trap+0xca/0x110
[  434.846648]  ? do_error_trap+0x6a/0x90
[  434.846651]  ? tcp_tso_segs+0x84/0xd0
[  434.846655]  ? exc_divide_error+0x38/0x50
[  434.846661]  ? tcp_tso_segs+0x84/0xd0
[  434.846664]  ? asm_exc_divide_error+0x1a/0x20
[  434.846672]  ? tcp_tso_segs+0x84/0xd0
[  434.846676]  tcp_write_xmit+0x78/0x16c0
[  434.846680]  __tcp_push_pending_frames+0x36/0xf0
[  434.846684]  __mptcp_push_pending+0xef/0x2a0
[  434.846693]  __mptcp_close_ssk+0x20f/0x560
[  434.846697]  mptcp_pm_nl_rm_addr_or_subflow+0x150/0x330
[  434.846706]  mptcp_pm_remove_subflow+0x2f/0x60
[  434.846710]  mptcp_pm_nl_del_addr_doit+0x1b7/0x370
[  434.846716]  genl_family_rcv_msg_doit+0xef/0x150
[  434.846725]  genl_rcv_msg+0x1b7/0x2c0
[  434.846730]  ? __pfx_mptcp_pm_nl_del_addr_doit+0x10/0x10
[  434.846733]  ? __pfx_genl_rcv_msg+0x10/0x10
[  434.846737]  netlink_rcv_skb+0x50/0x100
[  434.846743]  genl_rcv+0x28/0x40
[  434.846747]  netlink_unicast+0x242/0x370
[  434.846751]  netlink_sendmsg+0x21b/0x470
[  434.846755]  ____sys_sendmsg+0x396/0x3d0
[  434.846763]  ___sys_sendmsg+0x9a/0xe0
[  434.846769]  __sys_sendmsg+0xcc/0x100
[  434.846775]  do_syscall_64+0x82/0x160
[  434.846779]  ? syscall_exit_to_user_mode+0x72/0x220
[  434.846784]  ? do_syscall_64+0x8e/0x160
[  434.846789]  ? __sys_sendmsg+0xdc/0x100
[  434.846794]  ? syscall_exit_to_user_mode+0x72/0x220
[  434.846796]  ? do_syscall_64+0x8e/0x160
[  434.846798]  ? __rseq_handle_notify_resume+0xa6/0x4d0
[  434.846804]  ? clockevents_program_event+0x9f/0x110
[  434.846811]  ? switch_fpu_return+0x4e/0xd0
[  434.846818]  ? clear_bhb_loop+0x45/0xa0
[  434.846823]  ? clear_bhb_loop+0x45/0xa0
[  434.846826]  ? clear_bhb_loop+0x45/0xa0
[  434.846830]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  434.846834] RIP: 0033:0x7faf9214d84b
[  434.846889] Code: 48 89 e5 48 83 ec 20 89 55 ec 48 89 75 f0 89 7d f8 e8 69 5c f7 ff 8b 55 ec 48 8b 75 f0 41 89 c0 8b 7d f8 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2d 44 89 c7 48 89 45 f8 e8 c1 5c f7 ff 48 8b
[  434.846892] RSP: 002b:00007fff077ae3d0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[  434.846895] RAX: ffffffffffffffda RBX: 00005557d7820520 RCX: 00007faf9214d84b
[  434.846898] RDX: 0000000000000000 RSI: 00007fff077ae410 RDI: 000000000000000b
[  434.846900] RBP: 00007fff077ae3f0 R08: 0000000000000000 R09: 000000000000000d
[  434.846902] R10: 00005557d76fb010 R11: 0000000000000293 R12: 00005557d780fd74
[  434.846904] R13: 00005557d77c65a8 R14: 00005557d774d060 R15: 0000000000000002
[  434.846908]  </TASK>
[  434.846909] Modules linked in: cdc_acm xt_REDIRECT nft_compat overlay uinput hid_apple rfcomm snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun ip_set nf_tables qrtr uhid bnep sunrpc snd_soc_skl_hda_dsp snd_soc_hdac_hdmi snd_sof_probes snd_soc_intel_hda_dsp_common snd_soc_rt700 regmap_sdw snd_hda_codec_hdmi snd_soc_dmic snd_hda_codec_realtek binfmt_misc snd_hda_codec_generic snd_hda_scodec_component snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence vfat fat snd_sof_intel_hda_common snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_soc_acpi_intel_match soundwire_generic_allocation snd_soc_acpi soundwire_bus snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_soc_core snd_compress ac97_bus
[  434.847002]  snd_pcm_dmaengine snd_hda_intel intel_uncore_frequency snd_intel_dspcfg snd_intel_sdw_acpi intel_uncore_frequency_common snd_hda_codec x86_pkg_temp_thermal intel_powerclamp snd_hda_core coretemp snd_hwdep spi_nor mei_pxp mei_hdcp kvm_intel joydev mtd gpio_keys snd_seq intel_rapl_msr iwlmvm snd_seq_device snd_pcm btusb kvm snd_timer btrtl mac80211 btintel snd libarc4 rapl btbcm intel_cstate mei_me processor_thermal_device_pci_legacy rndis_host btmtk cdc_ether intel_uncore mei processor_thermal_device bluetooth xpad usbnet iwlwifi processor_thermal_wt_hint soundcore mii i2c_i801 spi_intel_pci spi_intel processor_thermal_rfim wmi_bmof pcspkr i2c_smbus processor_thermal_rapl intel_rapl_common thunderbolt idma64 processor_thermal_wt_req processor_thermal_power_floor processor_thermal_mbox intel_soc_dts_iosf igen6_edac goodix_ts intel_pmc_core soc_button_array int3403_thermal int340x_thermal_zone intel_vsec int3400_thermal intel_hid pmt_telemetry acpi_thermal_rel sparse_keymap pmt_class acpi_tad acpi_pad
[  434.847106]  brcmfmac brcmutil cfg80211 mmc_core rfkill amdgpu tcp_bbr sch_fq ledtrig_timer hid_playstation amdxcp led_class_multicolor ff_memless loop nfnetlink zstd zram xe drm_gpuvm drm_exec gpu_sched drm_suballoc_helper drm_ttm_helper uas usb_storage i915 cec drm_buddy crct10dif_pclmul i2c_algo_bit crc32_pclmul crc32c_intel polyval_clmulni polyval_generic nvme ghash_clmulni_intel drm_display_helper nvme_core sha512_ssse3 sha256_ssse3 spi_pxa2xx_platform ttm sha1_ssse3 dw_dmac nvme_auth video wmi pinctrl_tigerlake scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse i2c_dev dm_multipath
[  434.847195] ---[ end trace 0000000000000000 ]---
[  434.847198] RIP: 0010:tcp_tso_segs+0x84/0xd0
[  434.847204] Code: 05 00 00 41 89 c6 41 d3 ee 80 f9 1f 0f 87 cb 2a 27 00 41 83 fe 1f 76 37 8b 83 2c 02 00 00 4c 39 e0 44 89 e9 49 0f 47 c4 31 d2 <48> f7 f1 39 c5 0f 42 e8 0f b7 83 2a 02 00 00 39 c5 0f 46 c5 48 83
[  434.847206] RSP: 0000:ffffaf0dc19a7520 EFLAGS: 00010246
[  434.847209] RAX: 0000000000000179 RBX: ffff99d587c8a500 RCX: 0000000000000000
[  434.847212] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff99d587c8a500
[  434.847214] RBP: 0000000000000002 R08: 0000000000000820 R09: 0000000000000000
[  434.847215] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000179
[  434.847217] R13: 0000000000000000 R14: 0000000000000042 R15: ffff99d587c8a500
[  434.847219] FS:  00007faf91208580(0000) GS:ffff99d5df880000(0000) knlGS:0000000000000000
[  434.847222] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  434.847224] CR2: 0000000043942000 CR3: 0000000147e58001 CR4: 0000000000f70ef0
[  434.847227] PKRU: 55555554
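
Until a decoded stack trace is available, the `Code:` line above can be read by hand. The sketch below is only an illustration: it extracts the bytes at the `<..>` marker, which the oops uses to flag the faulting instruction, and decodes them for the one x86-64 encoding that appears here (REX.W prefix, opcode `F7`, register operand). The helper names are hypothetical, and this is not a general disassembler.

```python
# Opcode F7 is the x86 "group 3" opcode; the ModRM reg field selects the operation.
GROUP3 = ["test", "test", "not", "neg", "mul", "imul", "div", "idiv"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"] + [
    f"r{n}" for n in range(8, 16)
]

# Copied from the "Code:" line of the oops above.
CODE_LINE = (
    "05 00 00 41 89 c6 41 d3 ee 80 f9 1f 0f 87 cb 2a 27 00 41 83 fe 1f 76 37 "
    "8b 83 2c 02 00 00 4c 39 e0 44 89 e9 49 0f 47 c4 31 d2 <48> f7 f1 39 c5 "
    "0f 42 e8 0f b7 83 2a 02 00 00 39 c5 0f 46 c5 48 83"
)


def faulting_bytes(code_line: str, count: int = 3) -> list[int]:
    """Return `count` bytes starting at the <..> marker of an oops Code: line."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:start + count]]


def decode_group3(insn: list[int]) -> str:
    """Decode 'REX.W F7 /r' with a register operand; reject anything else."""
    rex, opcode, modrm = insn
    if (rex & 0xF8) != 0x48 or opcode != 0xF7 or (modrm >> 6) != 3:
        raise ValueError("only REX.W F7 /r with a register operand is handled")
    operation = GROUP3[(modrm >> 3) & 7]
    # REX.B (bit 0) extends the ModRM r/m field to reach r8..r15.
    register = REG64[(modrm & 7) | ((rex & 1) << 3)]
    return f"{operation} {register}"


print(decode_group3(faulting_bytes(CODE_LINE)))  # div rcx
```

The register dump shows `RCX: 0000000000000000`, which is consistent with a division by zero at `tcp_tso_segs+0x84`. This does not say which kernel variable held the zero; that still needs the decoded trace.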

pabeni commented Jul 15, 2024

> mptcp endpoint with flag 160 (subflow, fullmesh).

Could you please report the output of:

ip mptcp endpoint

and

ss -MaeimnO

just before unplugging the cable?

Also, could you please provide a decoded stack trace? You will have to install the kernel debuginfo packages.

@matttbe matttbe added the bug label Jul 15, 2024
matttbe (Member) commented Jul 15, 2024

@jeid64 Thank you for the bug report!

@pabeni Thank you for having looked!

> Also, could you please provide a decoded stack trace? You will have to install the kernel debuginfo packages

Just in case this is needed, you can find more info about that in our wiki.

@matttbe matttbe added the feedback-needed Waiting for a reply label Aug 5, 2024