Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"eth0: kevent 2 may have been dropped" is still here #309

Closed
wrobelda opened this issue Jun 4, 2013 · 32 comments
Closed

"eth0: kevent 2 may have been dropped" is still here #309

wrobelda opened this issue Jun 4, 2013 · 32 comments

Comments

@wrobelda
Copy link

wrobelda commented Jun 4, 2013

As per above, I am unfortunately suffering from this error for a long time now.
No fix can make it vanish:

  • disabling Turbo Mode does not help at all
  • increasing the vm.min_free_kilobytes only postpones the error in time but it does show up eventually after triggering some CPU and network intensive processes - like torrent for that matter - and giving it some minutes to hours tops. Please note that I have a 100 Mbps Internet over Ethernet available for the PI.
  • as soon as the kevent error shows up, the wifi card that is connected to Pi and set in AP mode also starts to slow down and fail with "ieee80211 phy0: failed to reallocate TX buffer" every ~10 times a minute.
  • if the IO and CPU load continues, at some point the device will hang. And when it hangs it hangs - I cannot even use the serial console to kdb and see what happened.
  • both errors stop appearing if USB-load goes down, e.g. I stop using wifi (I am connected but not browsing) or close torrent application. But lets say I start copying some files to USB pendrive connected to my hub and both errors are back.

I have postponed with reporting this until I had possibility to test another Raspberry board so to make sure this is not a hardware that was malfunctioning. I also tested with different power source to no avil.

Also tried two days ago with recent fiq_split branch - same behaviour + extra mmc0 timeouts, but that's a different story.

Tested on:
me@rpi ~ $ uname -a
Linux rpi 3.6.11+ #456 PREEMPT Mon May 20 17:42:15 BST 2013 armv6l GNU/Linux
me@rpi ~ $ /opt/vc/bin/vcgencmd version
May 26 2013 21:47:02
Copyright (c) 2012 Broadcom
version 53261d4ede3ba2b660e4201aca9bd4544565a3ce (clean) (release)

...and also on:
root@rpi:/home/# uname -a
Linux rpi 3.6.11+ #462 PREEMPT Mon Jun 3 22:15:00 BST 2013 armv6l GNU/Linux
root@rpi:/home/# /opt/vc/bin/vcgencmd version
Jun 3 2013 22:37:15
Copyright (c) 2012 Broadcom
version 2bac1bb890aa545e180e3f0766fb67989b590d26 (clean) (release)

I am going to test it with some other USB Hub. But I would be surprised if the error were to be hub's fault, as the one I own is the one that's reported to be probably most compatible - Plugable 7 Port with external power amp.

@wrobelda
Copy link
Author

wrobelda commented Jun 5, 2013

Something interesting has shown up when I woke up this morning and checked dmesg:

[Wed Jun 5 03:08:08 2013] ieee80211 phy0: failed to reallocate TX buffer [Wed Jun 5 03:08:08 2013] ieee80211 phy0: failed to reallocate TX buffer [Wed Jun 5 03:08:09 2013] ieee80211 phy0: failed to reallocate TX buffer [Wed Jun 5 03:09:10 2013] ieee80211 phy0: failed to reallocate TX buffer [Wed Jun 5 03:10:35 2013] INFO: task scsi_eh_0:351 blocked for more than 120 seconds. [Wed Jun 5 03:10:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:10:35 2013] scsi_eh_0 D c0398690 0 351 2 0x00000000 [Wed Jun 5 03:10:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:10:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:10:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c02a1090>] (command_abort+0xa0/0xe8) [Wed Jun 5 03:10:35 2013] [<c02a1090>] (command_abort+0xa0/0xe8) from [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) [Wed Jun 5 03:10:35 2013] [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:10:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:10:35 2013] INFO: task usb-storage:352 blocked for more than 120 seconds. [Wed Jun 5 03:10:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:10:35 2013] usb-storage D c0398690 0 352 2 0x00000000 [Wed Jun 5 03:10:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:10:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:10:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c0275538>] (usb_sg_wait+0x144/0x198) [Wed Jun 5 03:10:35 2013] [<c0275538>] (usb_sg_wait+0x144/0x198) from [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) [Wed Jun 5 03:10:35 2013] [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) from [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) [Wed Jun 5 03:10:35 2013] [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) from [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) [Wed Jun 5 03:10:35 2013] [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) from [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) [Wed Jun 5 03:10:35 2013] [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) from [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) [Wed Jun 5 03:10:35 2013] [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:10:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:11:27 2013] ieee80211 phy0: failed to reallocate TX buffer [Wed Jun 5 03:11:28 2013] ieee80211 phy0: failed to reallocate TX buffer [Wed Jun 5 03:12:35 2013] INFO: task scsi_eh_0:351 blocked for more than 120 seconds. [Wed Jun 5 03:12:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:12:35 2013] scsi_eh_0 D c0398690 0 351 2 0x00000000 [Wed Jun 5 03:12:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:12:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:12:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c02a1090>] (command_abort+0xa0/0xe8) [Wed Jun 5 03:12:35 2013] [<c02a1090>] (command_abort+0xa0/0xe8) from [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) [Wed Jun 5 03:12:35 2013] [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:12:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:12:35 2013] INFO: task usb-storage:352 blocked for more than 120 seconds. [Wed Jun 5 03:12:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:12:35 2013] usb-storage D c0398690 0 352 2 0x00000000 [Wed Jun 5 03:12:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:12:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:12:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c0275538>] (usb_sg_wait+0x144/0x198) [Wed Jun 5 03:12:35 2013] [<c0275538>] (usb_sg_wait+0x144/0x198) from [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) [Wed Jun 5 03:12:35 2013] [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) from [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) [Wed Jun 5 03:12:35 2013] [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) from [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) [Wed Jun 5 03:12:35 2013] [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) from [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) [Wed Jun 5 03:12:35 2013] [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) from [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) [Wed Jun 5 03:12:35 2013] [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:12:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:14:35 2013] INFO: task scsi_eh_0:351 blocked for more than 120 seconds. [Wed Jun 5 03:14:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:14:35 2013] scsi_eh_0 D c0398690 0 351 2 0x00000000 [Wed Jun 5 03:14:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:14:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:14:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c02a1090>] (command_abort+0xa0/0xe8) [Wed Jun 5 03:14:35 2013] [<c02a1090>] (command_abort+0xa0/0xe8) from [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) [Wed Jun 5 03:14:35 2013] [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:14:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:14:35 2013] INFO: task usb-storage:352 blocked for more than 120 seconds. [Wed Jun 5 03:14:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:14:35 2013] usb-storage D c0398690 0 352 2 0x00000000 [Wed Jun 5 03:14:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:14:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:14:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c0275538>] (usb_sg_wait+0x144/0x198) [Wed Jun 5 03:14:35 2013] [<c0275538>] (usb_sg_wait+0x144/0x198) from [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) [Wed Jun 5 03:14:35 2013] [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) from [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) [Wed Jun 5 03:14:35 2013] [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) from [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) [Wed Jun 5 03:14:35 2013] [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) from [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) [Wed Jun 5 03:14:35 2013] [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) from [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) [Wed Jun 5 03:14:35 2013] [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:14:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:16:35 2013] INFO: task scsi_eh_0:351 blocked for more than 120 seconds. [Wed Jun 5 03:16:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:16:35 2013] scsi_eh_0 D c0398690 0 351 2 0x00000000 [Wed Jun 5 03:16:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:16:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:16:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c02a1090>] (command_abort+0xa0/0xe8) [Wed Jun 5 03:16:35 2013] [<c02a1090>] (command_abort+0xa0/0xe8) from [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) [Wed Jun 5 03:16:35 2013] [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:16:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:16:35 2013] INFO: task usb-storage:352 blocked for more than 120 seconds. [Wed Jun 5 03:16:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:16:35 2013] usb-storage D c0398690 0 352 2 0x00000000 [Wed Jun 5 03:16:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:16:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:16:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c0275538>] (usb_sg_wait+0x144/0x198) [Wed Jun 5 03:16:35 2013] [<c0275538>] (usb_sg_wait+0x144/0x198) from [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) [Wed Jun 5 03:16:35 2013] [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) from [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) [Wed Jun 5 03:16:35 2013] [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) from [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) [Wed Jun 5 03:16:35 2013] [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) from [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) [Wed Jun 5 03:16:35 2013] [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) from [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) [Wed Jun 5 03:16:35 2013] [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:16:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:18:35 2013] INFO: task scsi_eh_0:351 blocked for more than 120 seconds. [Wed Jun 5 03:18:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:18:35 2013] scsi_eh_0 D c0398690 0 351 2 0x00000000 [Wed Jun 5 03:18:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:18:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:18:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c02a1090>] (command_abort+0xa0/0xe8) [Wed Jun 5 03:18:35 2013] [<c02a1090>] (command_abort+0xa0/0xe8) from [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) [Wed Jun 5 03:18:35 2013] [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:18:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:18:35 2013] INFO: task usb-storage:352 blocked for more than 120 seconds. [Wed Jun 5 03:18:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:18:35 2013] usb-storage D c0398690 0 352 2 0x00000000 [Wed Jun 5 03:18:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:18:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:18:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c0275538>] (usb_sg_wait+0x144/0x198) [Wed Jun 5 03:18:35 2013] [<c0275538>] (usb_sg_wait+0x144/0x198) from [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) [Wed Jun 5 03:18:35 2013] [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) from [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) [Wed Jun 5 03:18:35 2013] [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) from [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) [Wed Jun 5 03:18:35 2013] [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) from [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) [Wed Jun 5 03:18:35 2013] [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) from [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) [Wed Jun 5 03:18:35 2013] [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:18:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 06:25:52 2013] ieee80211 phy0: failed to reallocate TX buffer [Wed Jun 5 06:25:55 2013] ieee80211 phy0: failed to reallocate TX buffer [Wed Jun 5 06:26:06 2013] ieee80211 phy0: failed to reallocate TX buffer

@popcornmix
Copy link
Collaborator

Can you run the same test with wired ethernet? or a different wifi dongle?
Looks a bit like a memory leak (or an inappropraitely sized buffer) in the network stack that eventually brings the system down.

If you limit the download speed of the torrent application, does it run for longer? Does it run forever?

@wrobelda
Copy link
Author

wrobelda commented Jun 5, 2013

The wired ethernet is a WAN gateway and is connected at all times and so is wifi, which works in AP mode driven by hostapd. I am going to try with another wifi dongle, but will have to find one that supports AP well which will take some days.

The torrent application actually almost never reaches more than 10Mbps, but for the sake of tests I will limit it to e.g. 5Mbps and see.

I can't back it up with any evidence, but being myself a programmer it seems to me by intuition that this is indeed some memory leak.

@popcornmix
Copy link
Collaborator

So the Pi is acting as an wireless access point for other machines.
And the torrent app is run on another machine - not the pi?

@wrobelda
Copy link
Author

wrobelda commented Jun 5, 2013

Correct, Pi is wireless access point BUT the torrent app runs ON the Pi at the same time.

@wrobelda
Copy link
Author

I did lower the torrent throughput to roughly 8 mbps and it still hangs after at most one day. The only time I can enjoy long uptimes is when I disable torrent at all.

This error has been here pretty much ever since the beginning of The Life of PI. Is there anything you can do about it or is this non-fixable and so I should probably go and return my PIs claiming they are not as advertised?

I am sorry to say that but this is what it looks like to me. The USB 2.0 subsystem is not usable.

@popcornmix
Copy link
Collaborator

It seems to be a memory allocation error in network stack.
Have you got 256M or 512M Pi? Running the cutdown firmware (gpu_mem=16)?

I would have thought increasing vm.min_free_kilobytes is the most likely solution.

What torrent client? It may be worth trying to find a lighter one.
Is it better if torrent client is run on another machine on network (but still using Pi as AP).

@licaon-kter
Copy link

wait, this is a bug now?
I get this all the time, and this Pi only has minimal network activity ( ssh connection for just one user, and sometimes some git updates )
I already have:

smc95xx.turbo_mode=N in /boot/cmdline.txt
vm.min_free_kbytes = 8192 in /etc/sysctl.d/local.conf

@wrobelda
Copy link
Author

@licaon-kter had you abused the eth it a bit more, it would most likely hang after a period of time

@popcornmix
Have 512.
Not sure if that's what you mean by the cutdown firmware, but my gpu_mem is set to 128MB, although nothing is really making use of it.
Have tried with min_free_kilobytes up to 128MB (sic!), yet still would hang after pretty much same period of time (hard to tell, always around a day)
Tried with both transmission and rtorrent.
I will see if it changes if I download on a machine inside the network.

@Ferroin
Copy link
Contributor

Ferroin commented Jun 19, 2013

I think that adjusting vm.min_free_kbytes is just putting an (ugly) bandage on the problem. If the cause is indeed a memory leak, (which sounds like the most probable cause to me), the system will evenutually reach an OOM condition (this may take a long time if you have a lot of swap), raising vm.min_free_kbytes will just make the system thrash more as it approaches vm exhaustion.

@nickriordan
Copy link

I may be seeing this too. Have a pi running afpd and am copying around several hundred GB of files over Ethernet from my Mac to a locally attached USB WD hard drive. Everything runs smoothly for the first 50-70GB and then the Pi crashes. Only messages that I can see in any of the logs is "eth0: kevent 2 may have been dropped". Also I have disabled swap (silly idea - very slow on SD card and wears the card out quickly). 512MB Pi. Video split set to 32MB leaving around 480MB free for the Pi. Running as a server.

@wrobelda
Copy link
Author

I did some more experiments. I powered the Pi with external 1.5 amps USB power source instead of from the HUB and also connected the WiFi card to Pi directly. The second USB port is still used by USB hub that in turns runs HDD only.

It seems more stable now with the uptime of 12 hours while constantly downloading some linux ISO images off the torrent. After having set the min_free_kbytes to 32M, there are also no errors in dmesg.
I will do some more testing with this configuration and then see if it breaks after I switch the WIFi to HUB. Its either crappy WiFi or the HUB's unreliable power source that cause these hangups. However given how good opinion the HUB I own has, I would expect that it's the WiFi that does not like the HUB.

@wrobelda
Copy link
Author

I am afraid this bug is still here. Raspberry still hangs after as little as an hour.

@ZeBadger
Copy link

I have this problem with 2013-07-26-wheezy-raspbian on a 256mb pi model b

If I run the command "git clone http://github.com/raspberrypi/linux.git kernel" it works until about 14% then it just exits with no error messages (exit code 128)

I get "kevent 2 may have been dropped" in dmesg

Changing the vm.min_free_kbytes to 32768 and disabling turbo mode helps a little, but I've still not managed to complete that command.. although last time I was running X/Windows and when I tried to to "ls -la" I got an out of memory error.

@licaon-kter
Copy link

try to update first, maybe to the next branch via sudo BRANCH=next rpi-update as I haven't seen this for a while
also, github has failed for me in the past as well Hexxeh/rpi-firmware#7

@P33M
Copy link
Contributor

P33M commented May 5, 2014

CONFIG_COMPACTION and CONFIG_SLUB have fixed this in all but pathological cases.

@P33M P33M closed this as completed May 5, 2014
@fedeaf
Copy link

fedeaf commented May 24, 2014

IMO, this is not fixed at all. I keep getting this error consistently on the latest firmware with smc95xx.turbo_mode=N and vm.min_free_kbytes = 32768. This is on a headless 512MB raspi with gpu_mem = 16.
@P33M Please, reopen the ticket until it's definitely fixed.

@P33M
Copy link
Contributor

P33M commented May 24, 2014

Please post complete logs.

@fedeaf
Copy link

fedeaf commented May 24, 2014

I've just taken the dmesg output here:
https://gist.github.com/fedeaf/4f1ffef2c1f09aa422c3

ATM, deluge is downloading one torrent at ~400KBps and I was streaming through samba a video with an average of 350KB/s to my laptop.

@P33M
Copy link
Contributor

P33M commented May 24, 2014

Is there any indication that the functionality when these messages are being generated is impaired? I.e. do you see skipped video or corrupt fragment downloads?

@fedeaf
Copy link

fedeaf commented May 24, 2014

Yes, both are true.
Video stutter is common and downloads are usually damaged.
$ free -h
total used free shared buffers cached
Mem: 485M 468M 16M 528K 140M 234M
-/+ buffers/cache: 93M 391M
Swap: 989M 0B 989M

@P33M
Copy link
Contributor

P33M commented May 24, 2014

Can you post the output of cat /proc/buddyinfo (over several times when the network is active)?

@fedeaf
Copy link

fedeaf commented May 24, 2014

Here is buddyinfo every 3 secs, while streaming a video and deluge also running:
https://gist.github.com/fedeaf/84313d2a0a29d01df2b5

Just before that I drop_caches, that's why there was some free memory at the start.

@P33M
Copy link
Contributor

P33M commented May 27, 2014

Normal downloads may end up corrupt if there is a smsc95xx driver bug. If it doesn't correctly report broken transfers in the case of a memory allocation failure, then that's an upstream bug. As we are pretty much the only hardcore users of this hardware then it'll be up to us to diagnose and fix anyway.

Torrents use chunk hash checks to ensure the validity of data transferred. Are you seeing corrupt torrent downloads?

Is the problem worse if you use smsc95xx.turbo_mode=Y?

@fedeaf
Copy link

fedeaf commented May 27, 2014

I've seen corrupted torrents but that hasn't happened for sometime now, might have been related to something else.
I should test more but smsc95xx.turbo_mode seems to have no effect on this (at least for me).

Without any torrents downloading, I can happily stream a video through samba to my laptop and have no stutter even if I see some kevent2 on the logs. While I'm downloading something (even at low speeds ~ 300KB/s) streaming becomes totally impossible after some time. It seems to be related to writing to the USB hdd (ntfs).

Could this cause corruption at the time dumping data from ram to usb-hdd?

@P33M
Copy link
Contributor

P33M commented May 29, 2014

You are running out of CPU time.

  • NTFS file access is CPU-heavy (userspace filesystem driver)
  • SAMBA access is CPU-heavy
  • Torrenting is CPU-heavy

At the least, reformat the external drive as ext4. This will reduce the CPU time on file access substantially.

If you get corrupt torrent downloads without streaming and it appears dependent on "kevent 2" messages, then it warrants further investigation.

@fedeaf
Copy link

fedeaf commented May 29, 2014

No, my raspi is OCed and has plenty of CPU available while doing that.
I also have kevent 2s even if there are no torrents or writes to the disk and I'm just streaming through samba. I'm pretty sure that the issue is not related to CPU load.
Will try to prove it somehow and hopefully getting samba out of the way too.

@JamesH65
Copy link
Contributor

Whilst doing the above, run 'top' in a terminal to see the CPU usage. I think that even when overclocked those operations will use a LOT of CPU.

@Tuinslak
Copy link

Tuinslak commented Feb 7, 2016

Same is happening on an RPI2. I have 2 shares mounted as NFS (ext4) and two other servers running rdiff-backup to the Pi (stores the files via NFS).

I have plenty of CPU time left, according to htop. RPI slightly overclocked.

screen shot 2016-02-07 at 20 51 32

screen shot 2016-02-07 at 20 51 55

@CRTX
Copy link

CRTX commented Mar 28, 2016

Why is this closed? This is still happening in my Raspberry Pi 3 with latest firmware 4.1.20-v7+ except with "kevent 0 may have been dropped"

@ZeBadger
Copy link

I think the problem was resolved by switching to a different SD card.

@MrColdbird
Copy link

Problem is still there, but I fear this issue rests elsewhere and not with the Raspberry pi.

The same issue occurs on regular desktop linux distributions (x86 / x86-64) whenever a network interface is hammered with a huge amount of concurrent connections.

Not only is my Raspberry pi affected, but so is my x86-64 based homemade linux router.
This is most likely a kernel issue of some kind, but given the huge time that has passed since its first discovery, and the fact that there still isn't a fix out, makes me assume that this issue doesn't rank high in the todo list of the linux kernel developers (which makes me sad).

popcornmix pushed a commit that referenced this issue Feb 26, 2018
syzbot reported a division by 0 bug in the netfilter nat code:

divide error: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 4168 Comm: syzkaller034710 Not tainted 4.16.0-rc1+ #309
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:nf_nat_l4proto_unique_tuple+0x291/0x530
net/netfilter/nf_nat_proto_common.c:88
RSP: 0018:ffff8801b2466778 EFLAGS: 00010246
RAX: 000000000000f153 RBX: ffff8801b2466dd8 RCX: ffff8801b2466c7c
RDX: 0000000000000000 RSI: ffff8801b2466c58 RDI: ffff8801db5293ac
RBP: ffff8801b24667d8 R08: ffff8801b8ba6dc0 R09: ffffffff88af5900
R10: ffff8801b24666f0 R11: 0000000000000000 R12: 000000002990f153
R13: 0000000000000001 R14: 0000000000000000 R15: ffff8801b2466c7c
FS:  00000000017e3880(0000) GS:ffff8801db500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000208fdfe4 CR3: 00000001b5340002 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  dccp_unique_tuple+0x40/0x50 net/netfilter/nf_nat_proto_dccp.c:30
  get_unique_tuple+0xc28/0x1c10 net/netfilter/nf_nat_core.c:362
  nf_nat_setup_info+0x1c2/0xe00 net/netfilter/nf_nat_core.c:406
  nf_nat_redirect_ipv6+0x306/0x730 net/netfilter/nf_nat_redirect.c:124
  redirect_tg6+0x7f/0xb0 net/netfilter/xt_REDIRECT.c:34
  ip6t_do_table+0xc2a/0x1a30 net/ipv6/netfilter/ip6_tables.c:365
  ip6table_nat_do_chain+0x65/0x80 net/ipv6/netfilter/ip6table_nat.c:41
  nf_nat_ipv6_fn+0x594/0xa80 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:302
  nf_nat_ipv6_local_fn+0x33/0x5d0
net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:407
  ip6table_nat_local_fn+0x2c/0x40 net/ipv6/netfilter/ip6table_nat.c:69
  nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline]
  nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483
  nf_hook include/linux/netfilter.h:243 [inline]
  NF_HOOK include/linux/netfilter.h:286 [inline]
  ip6_xmit+0x10ec/0x2260 net/ipv6/ip6_output.c:277
  inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139
  dccp_transmit_skb+0x9ac/0x10f0 net/dccp/output.c:142
  dccp_connect+0x369/0x670 net/dccp/output.c:564
  dccp_v6_connect+0xe17/0x1bf0 net/dccp/ipv6.c:946
  __inet_stream_connect+0x2d4/0xf00 net/ipv4/af_inet.c:620
  inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:684
  SYSC_connect+0x213/0x4a0 net/socket.c:1639
  SyS_connect+0x24/0x30 net/socket.c:1620
  do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x26/0x9b
RIP: 0033:0x441c69
RSP: 002b:00007ffe50cc0be8 EFLAGS: 00000217 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000441c69
RDX: 000000000000001c RSI: 00000000208fdfe4 RDI: 0000000000000003
RBP: 00000000006cc018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000538 R11: 0000000000000217 R12: 0000000000403590
R13: 0000000000403620 R14: 0000000000000000 R15: 0000000000000000
Code: 48 89 f0 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 46 02 00 00 48 8b
45 c8 44 0f b7 20 e8 88 97 04 fd 31 d2 41 0f b7 c4 4c 89 f9 <41> f7 f6 48
c1 e9 03 48 b8 00 00 00 00 00 fc ff df 0f b6 0c 01
RIP: nf_nat_l4proto_unique_tuple+0x291/0x530
net/netfilter/nf_nat_proto_common.c:88 RSP: ffff8801b2466778

The problem is that currently we don't have any check on the
configured port range. A port range == -1 triggers the bug, while
other negative values may require a very long time to complete the
following loop.

This commit addresses the issue swapping the two ends on negative
ranges. The check is performed in nf_nat_l4proto_unique_tuple() since
the nft nat loads the port values from nft registers at runtime.

v1 -> v2: use the correct 'Fixes' tag
v2 -> v3: update commit message, drop unneeded READ_ONCE()

Fixes: 5b1158e ("[NETFILTER]: Add NAT support for nf_conntrack")
Reported-by: syzbot+8012e198bd037f4871e5@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nathanchance pushed a commit to nathanchance/pi-kernel that referenced this issue Mar 15, 2018
commit db57ccf upstream.

syzbot reported a division by 0 bug in the netfilter nat code:

divide error: 0000 [raspberrypi#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 4168 Comm: syzkaller034710 Not tainted 4.16.0-rc1+ raspberrypi#309
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:nf_nat_l4proto_unique_tuple+0x291/0x530
net/netfilter/nf_nat_proto_common.c:88
RSP: 0018:ffff8801b2466778 EFLAGS: 00010246
RAX: 000000000000f153 RBX: ffff8801b2466dd8 RCX: ffff8801b2466c7c
RDX: 0000000000000000 RSI: ffff8801b2466c58 RDI: ffff8801db5293ac
RBP: ffff8801b24667d8 R08: ffff8801b8ba6dc0 R09: ffffffff88af5900
R10: ffff8801b24666f0 R11: 0000000000000000 R12: 000000002990f153
R13: 0000000000000001 R14: 0000000000000000 R15: ffff8801b2466c7c
FS:  00000000017e3880(0000) GS:ffff8801db500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000208fdfe4 CR3: 00000001b5340002 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  dccp_unique_tuple+0x40/0x50 net/netfilter/nf_nat_proto_dccp.c:30
  get_unique_tuple+0xc28/0x1c10 net/netfilter/nf_nat_core.c:362
  nf_nat_setup_info+0x1c2/0xe00 net/netfilter/nf_nat_core.c:406
  nf_nat_redirect_ipv6+0x306/0x730 net/netfilter/nf_nat_redirect.c:124
  redirect_tg6+0x7f/0xb0 net/netfilter/xt_REDIRECT.c:34
  ip6t_do_table+0xc2a/0x1a30 net/ipv6/netfilter/ip6_tables.c:365
  ip6table_nat_do_chain+0x65/0x80 net/ipv6/netfilter/ip6table_nat.c:41
  nf_nat_ipv6_fn+0x594/0xa80 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:302
  nf_nat_ipv6_local_fn+0x33/0x5d0
net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:407
  ip6table_nat_local_fn+0x2c/0x40 net/ipv6/netfilter/ip6table_nat.c:69
  nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline]
  nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483
  nf_hook include/linux/netfilter.h:243 [inline]
  NF_HOOK include/linux/netfilter.h:286 [inline]
  ip6_xmit+0x10ec/0x2260 net/ipv6/ip6_output.c:277
  inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139
  dccp_transmit_skb+0x9ac/0x10f0 net/dccp/output.c:142
  dccp_connect+0x369/0x670 net/dccp/output.c:564
  dccp_v6_connect+0xe17/0x1bf0 net/dccp/ipv6.c:946
  __inet_stream_connect+0x2d4/0xf00 net/ipv4/af_inet.c:620
  inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:684
  SYSC_connect+0x213/0x4a0 net/socket.c:1639
  SyS_connect+0x24/0x30 net/socket.c:1620
  do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x26/0x9b
RIP: 0033:0x441c69
RSP: 002b:00007ffe50cc0be8 EFLAGS: 00000217 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000441c69
RDX: 000000000000001c RSI: 00000000208fdfe4 RDI: 0000000000000003
RBP: 00000000006cc018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000538 R11: 0000000000000217 R12: 0000000000403590
R13: 0000000000403620 R14: 0000000000000000 R15: 0000000000000000
Code: 48 89 f0 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 46 02 00 00 48 8b
45 c8 44 0f b7 20 e8 88 97 04 fd 31 d2 41 0f b7 c4 4c 89 f9 <41> f7 f6 48
c1 e9 03 48 b8 00 00 00 00 00 fc ff df 0f b6 0c 01
RIP: nf_nat_l4proto_unique_tuple+0x291/0x530
net/netfilter/nf_nat_proto_common.c:88 RSP: ffff8801b2466778

The problem is that currently we don't have any check on the
configured port range. A port range == -1 triggers the bug, while
other negative values may require a very long time to complete the
following loop.

This commit addresses the issue swapping the two ends on negative
ranges. The check is performed in nf_nat_l4proto_unique_tuple() since
the nft nat loads the port values from nft registers at runtime.

v1 -> v2: use the correct 'Fixes' tag
v2 -> v3: update commit message, drop unneeded READ_ONCE()

Fixes: 5b1158e ("[NETFILTER]: Add NAT support for nf_conntrack")
Reported-by: syzbot+8012e198bd037f4871e5@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Mar 16, 2018
commit db57ccf upstream.

syzbot reported a division by 0 bug in the netfilter nat code:

divide error: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 4168 Comm: syzkaller034710 Not tainted 4.16.0-rc1+ #309
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:nf_nat_l4proto_unique_tuple+0x291/0x530
net/netfilter/nf_nat_proto_common.c:88
RSP: 0018:ffff8801b2466778 EFLAGS: 00010246
RAX: 000000000000f153 RBX: ffff8801b2466dd8 RCX: ffff8801b2466c7c
RDX: 0000000000000000 RSI: ffff8801b2466c58 RDI: ffff8801db5293ac
RBP: ffff8801b24667d8 R08: ffff8801b8ba6dc0 R09: ffffffff88af5900
R10: ffff8801b24666f0 R11: 0000000000000000 R12: 000000002990f153
R13: 0000000000000001 R14: 0000000000000000 R15: ffff8801b2466c7c
FS:  00000000017e3880(0000) GS:ffff8801db500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000208fdfe4 CR3: 00000001b5340002 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  dccp_unique_tuple+0x40/0x50 net/netfilter/nf_nat_proto_dccp.c:30
  get_unique_tuple+0xc28/0x1c10 net/netfilter/nf_nat_core.c:362
  nf_nat_setup_info+0x1c2/0xe00 net/netfilter/nf_nat_core.c:406
  nf_nat_redirect_ipv6+0x306/0x730 net/netfilter/nf_nat_redirect.c:124
  redirect_tg6+0x7f/0xb0 net/netfilter/xt_REDIRECT.c:34
  ip6t_do_table+0xc2a/0x1a30 net/ipv6/netfilter/ip6_tables.c:365
  ip6table_nat_do_chain+0x65/0x80 net/ipv6/netfilter/ip6table_nat.c:41
  nf_nat_ipv6_fn+0x594/0xa80 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:302
  nf_nat_ipv6_local_fn+0x33/0x5d0
net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:407
  ip6table_nat_local_fn+0x2c/0x40 net/ipv6/netfilter/ip6table_nat.c:69
  nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline]
  nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483
  nf_hook include/linux/netfilter.h:243 [inline]
  NF_HOOK include/linux/netfilter.h:286 [inline]
  ip6_xmit+0x10ec/0x2260 net/ipv6/ip6_output.c:277
  inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139
  dccp_transmit_skb+0x9ac/0x10f0 net/dccp/output.c:142
  dccp_connect+0x369/0x670 net/dccp/output.c:564
  dccp_v6_connect+0xe17/0x1bf0 net/dccp/ipv6.c:946
  __inet_stream_connect+0x2d4/0xf00 net/ipv4/af_inet.c:620
  inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:684
  SYSC_connect+0x213/0x4a0 net/socket.c:1639
  SyS_connect+0x24/0x30 net/socket.c:1620
  do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x26/0x9b
RIP: 0033:0x441c69
RSP: 002b:00007ffe50cc0be8 EFLAGS: 00000217 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000441c69
RDX: 000000000000001c RSI: 00000000208fdfe4 RDI: 0000000000000003
RBP: 00000000006cc018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000538 R11: 0000000000000217 R12: 0000000000403590
R13: 0000000000403620 R14: 0000000000000000 R15: 0000000000000000
Code: 48 89 f0 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 46 02 00 00 48 8b
45 c8 44 0f b7 20 e8 88 97 04 fd 31 d2 41 0f b7 c4 4c 89 f9 <41> f7 f6 48
c1 e9 03 48 b8 00 00 00 00 00 fc ff df 0f b6 0c 01
RIP: nf_nat_l4proto_unique_tuple+0x291/0x530
net/netfilter/nf_nat_proto_common.c:88 RSP: ffff8801b2466778

The problem is that currently we don't have any check on the
configured port range. A port range == -1 triggers the bug, while
other negative values may require a very long time to complete the
following loop.

This commit addresses the issue swapping the two ends on negative
ranges. The check is performed in nf_nat_l4proto_unique_tuple() since
the nft nat loads the port values from nft registers at runtime.

v1 -> v2: use the correct 'Fixes' tag
v2 -> v3: update commit message, drop unneeded READ_ONCE()

Fixes: 5b1158e ("[NETFILTER]: Add NAT support for nf_conntrack")
Reported-by: syzbot+8012e198bd037f4871e5@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ED6E0F17 pushed a commit to ED6E0F17/linux that referenced this issue Mar 22, 2018
commit db57ccf upstream.

syzbot reported a division by 0 bug in the netfilter nat code:

divide error: 0000 [raspberrypi#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 4168 Comm: syzkaller034710 Not tainted 4.16.0-rc1+ raspberrypi#309
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:nf_nat_l4proto_unique_tuple+0x291/0x530
net/netfilter/nf_nat_proto_common.c:88
RSP: 0018:ffff8801b2466778 EFLAGS: 00010246
RAX: 000000000000f153 RBX: ffff8801b2466dd8 RCX: ffff8801b2466c7c
RDX: 0000000000000000 RSI: ffff8801b2466c58 RDI: ffff8801db5293ac
RBP: ffff8801b24667d8 R08: ffff8801b8ba6dc0 R09: ffffffff88af5900
R10: ffff8801b24666f0 R11: 0000000000000000 R12: 000000002990f153
R13: 0000000000000001 R14: 0000000000000000 R15: ffff8801b2466c7c
FS:  00000000017e3880(0000) GS:ffff8801db500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000208fdfe4 CR3: 00000001b5340002 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  dccp_unique_tuple+0x40/0x50 net/netfilter/nf_nat_proto_dccp.c:30
  get_unique_tuple+0xc28/0x1c10 net/netfilter/nf_nat_core.c:362
  nf_nat_setup_info+0x1c2/0xe00 net/netfilter/nf_nat_core.c:406
  nf_nat_redirect_ipv6+0x306/0x730 net/netfilter/nf_nat_redirect.c:124
  redirect_tg6+0x7f/0xb0 net/netfilter/xt_REDIRECT.c:34
  ip6t_do_table+0xc2a/0x1a30 net/ipv6/netfilter/ip6_tables.c:365
  ip6table_nat_do_chain+0x65/0x80 net/ipv6/netfilter/ip6table_nat.c:41
  nf_nat_ipv6_fn+0x594/0xa80 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:302
  nf_nat_ipv6_local_fn+0x33/0x5d0
net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:407
  ip6table_nat_local_fn+0x2c/0x40 net/ipv6/netfilter/ip6table_nat.c:69
  nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline]
  nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483
  nf_hook include/linux/netfilter.h:243 [inline]
  NF_HOOK include/linux/netfilter.h:286 [inline]
  ip6_xmit+0x10ec/0x2260 net/ipv6/ip6_output.c:277
  inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139
  dccp_transmit_skb+0x9ac/0x10f0 net/dccp/output.c:142
  dccp_connect+0x369/0x670 net/dccp/output.c:564
  dccp_v6_connect+0xe17/0x1bf0 net/dccp/ipv6.c:946
  __inet_stream_connect+0x2d4/0xf00 net/ipv4/af_inet.c:620
  inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:684
  SYSC_connect+0x213/0x4a0 net/socket.c:1639
  SyS_connect+0x24/0x30 net/socket.c:1620
  do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x26/0x9b
RIP: 0033:0x441c69
RSP: 002b:00007ffe50cc0be8 EFLAGS: 00000217 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000441c69
RDX: 000000000000001c RSI: 00000000208fdfe4 RDI: 0000000000000003
RBP: 00000000006cc018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000538 R11: 0000000000000217 R12: 0000000000403590
R13: 0000000000403620 R14: 0000000000000000 R15: 0000000000000000
Code: 48 89 f0 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 46 02 00 00 48 8b
45 c8 44 0f b7 20 e8 88 97 04 fd 31 d2 41 0f b7 c4 4c 89 f9 <41> f7 f6 48
c1 e9 03 48 b8 00 00 00 00 00 fc ff df 0f b6 0c 01
RIP: nf_nat_l4proto_unique_tuple+0x291/0x530
net/netfilter/nf_nat_proto_common.c:88 RSP: ffff8801b2466778

The problem is that currently we don't have any check on the
configured port range. A port range == -1 triggers the bug, while
other negative values may require a very long time to complete the
following loop.

This commit addresses the issue swapping the two ends on negative
ranges. The check is performed in nf_nat_l4proto_unique_tuple() since
the nft nat loads the port values from nft registers at runtime.

v1 -> v2: use the correct 'Fixes' tag
v2 -> v3: update commit message, drop unneeded READ_ONCE()

Fixes: 5b1158e ("[NETFILTER]: Add NAT support for nf_conntrack")
Reported-by: syzbot+8012e198bd037f4871e5@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Jul 21, 2021
…format

[ Upstream commit 06d213d ]

For incoming SCO connection with transparent coding format, alt setting
of CVSD is getting applied instead of Transparent.

Before fix:
< HCI Command: Accept Synchron.. (0x01|0x0029) plen 21  #2196 [hci0] 321.342548
        Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
        Transmit bandwidth: 8000
        Receive bandwidth: 8000
        Max latency: 13
        Setting: 0x0003
          Input Coding: Linear
          Input Data Format: 1's complement
          Input Sample Size: 8-bit
          # of bits padding at MSB: 0
          Air Coding Format: Transparent Data
        Retransmission effort: Optimize for link quality (0x02)
        Packet type: 0x003f
          HV1 may be used
          HV2 may be used
          HV3 may be used
          EV3 may be used
          EV4 may be used
          EV5 may be used
> HCI Event: Command Status (0x0f) plen 4               #2197 [hci0] 321.343585
      Accept Synchronous Connection Request (0x01|0x0029) ncmd 1
        Status: Success (0x00)
> HCI Event: Synchronous Connect Comp.. (0x2c) plen 17  #2198 [hci0] 321.351666
        Status: Success (0x00)
        Handle: 257
        Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
        Link type: eSCO (0x02)
        Transmission interval: 0x0c
        Retransmission window: 0x04
        RX packet length: 60
        TX packet length: 60
        Air mode: Transparent (0x03)
........
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2336 [hci0] 321.383655
< SCO Data TX: Handle 257 flags 0x00 dlen 60            #2337 [hci0] 321.389558
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2338 [hci0] 321.393615
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2339 [hci0] 321.393618
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2340 [hci0] 321.393618
< SCO Data TX: Handle 257 flags 0x00 dlen 60            #2341 [hci0] 321.397070
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2342 [hci0] 321.403622
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2343 [hci0] 321.403625
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2344 [hci0] 321.403625
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2345 [hci0] 321.403625
< SCO Data TX: Handle 257 flags 0x00 dlen 60            #2346 [hci0] 321.404569
< SCO Data TX: Handle 257 flags 0x00 dlen 60            #2347 [hci0] 321.412091
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2348 [hci0] 321.413626
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2349 [hci0] 321.413630
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2350 [hci0] 321.413630
< SCO Data TX: Handle 257 flags 0x00 dlen 60            #2351 [hci0] 321.419674

After fix:

< HCI Command: Accept Synchronou.. (0x01|0x0029) plen 21  #309 [hci0] 49.439693
        Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
        Transmit bandwidth: 8000
        Receive bandwidth: 8000
        Max latency: 13
        Setting: 0x0003
          Input Coding: Linear
          Input Data Format: 1's complement
          Input Sample Size: 8-bit
          # of bits padding at MSB: 0
          Air Coding Format: Transparent Data
        Retransmission effort: Optimize for link quality (0x02)
        Packet type: 0x003f
          HV1 may be used
          HV2 may be used
          HV3 may be used
          EV3 may be used
          EV4 may be used
          EV5 may be used
> HCI Event: Command Status (0x0f) plen 4                 #310 [hci0] 49.440308
      Accept Synchronous Connection Request (0x01|0x0029) ncmd 1
        Status: Success (0x00)
> HCI Event: Synchronous Connect Complete (0x2c) plen 17  #311 [hci0] 49.449308
        Status: Success (0x00)
        Handle: 257
        Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
        Link type: eSCO (0x02)
        Transmission interval: 0x0c
        Retransmission window: 0x04
        RX packet length: 60
        TX packet length: 60
        Air mode: Transparent (0x03)
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #312 [hci0] 49.450421
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #313 [hci0] 49.457927
> HCI Event: Max Slots Change (0x1b) plen 3               #314 [hci0] 49.460345
        Handle: 256
        Max slots: 5
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #315 [hci0] 49.465453
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #316 [hci0] 49.470502
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #317 [hci0] 49.470519
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #318 [hci0] 49.472996
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #319 [hci0] 49.480412
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #320 [hci0] 49.480492
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #321 [hci0] 49.487989
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #322 [hci0] 49.490303
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #323 [hci0] 49.495496
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #324 [hci0] 49.500304
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #325 [hci0] 49.500311

Signed-off-by: Kiran K <kiran.k@intel.com>
Signed-off-by: Lokendra Singh <lokendra.singh@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this issue Jul 22, 2021
…format

[ Upstream commit 06d213d ]

For incoming SCO connection with transparent coding format, alt setting
of CVSD is getting applied instead of Transparent.

Before fix:
< HCI Command: Accept Synchron.. (0x01|0x0029) plen 21  #2196 [hci0] 321.342548
        Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
        Transmit bandwidth: 8000
        Receive bandwidth: 8000
        Max latency: 13
        Setting: 0x0003
          Input Coding: Linear
          Input Data Format: 1's complement
          Input Sample Size: 8-bit
          # of bits padding at MSB: 0
          Air Coding Format: Transparent Data
        Retransmission effort: Optimize for link quality (0x02)
        Packet type: 0x003f
          HV1 may be used
          HV2 may be used
          HV3 may be used
          EV3 may be used
          EV4 may be used
          EV5 may be used
> HCI Event: Command Status (0x0f) plen 4               #2197 [hci0] 321.343585
      Accept Synchronous Connection Request (0x01|0x0029) ncmd 1
        Status: Success (0x00)
> HCI Event: Synchronous Connect Comp.. (0x2c) plen 17  #2198 [hci0] 321.351666
        Status: Success (0x00)
        Handle: 257
        Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
        Link type: eSCO (0x02)
        Transmission interval: 0x0c
        Retransmission window: 0x04
        RX packet length: 60
        TX packet length: 60
        Air mode: Transparent (0x03)
........
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2336 [hci0] 321.383655
< SCO Data TX: Handle 257 flags 0x00 dlen 60            #2337 [hci0] 321.389558
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2338 [hci0] 321.393615
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2339 [hci0] 321.393618
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2340 [hci0] 321.393618
< SCO Data TX: Handle 257 flags 0x00 dlen 60            #2341 [hci0] 321.397070
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2342 [hci0] 321.403622
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2343 [hci0] 321.403625
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2344 [hci0] 321.403625
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2345 [hci0] 321.403625
< SCO Data TX: Handle 257 flags 0x00 dlen 60            #2346 [hci0] 321.404569
< SCO Data TX: Handle 257 flags 0x00 dlen 60            #2347 [hci0] 321.412091
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2348 [hci0] 321.413626
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2349 [hci0] 321.413630
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2350 [hci0] 321.413630
< SCO Data TX: Handle 257 flags 0x00 dlen 60            #2351 [hci0] 321.419674

After fix:

< HCI Command: Accept Synchronou.. (0x01|0x0029) plen 21  #309 [hci0] 49.439693
        Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
        Transmit bandwidth: 8000
        Receive bandwidth: 8000
        Max latency: 13
        Setting: 0x0003
          Input Coding: Linear
          Input Data Format: 1's complement
          Input Sample Size: 8-bit
          # of bits padding at MSB: 0
          Air Coding Format: Transparent Data
        Retransmission effort: Optimize for link quality (0x02)
        Packet type: 0x003f
          HV1 may be used
          HV2 may be used
          HV3 may be used
          EV3 may be used
          EV4 may be used
          EV5 may be used
> HCI Event: Command Status (0x0f) plen 4                 #310 [hci0] 49.440308
      Accept Synchronous Connection Request (0x01|0x0029) ncmd 1
        Status: Success (0x00)
> HCI Event: Synchronous Connect Complete (0x2c) plen 17  #311 [hci0] 49.449308
        Status: Success (0x00)
        Handle: 257
        Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
        Link type: eSCO (0x02)
        Transmission interval: 0x0c
        Retransmission window: 0x04
        RX packet length: 60
        TX packet length: 60
        Air mode: Transparent (0x03)
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #312 [hci0] 49.450421
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #313 [hci0] 49.457927
> HCI Event: Max Slots Change (0x1b) plen 3               #314 [hci0] 49.460345
        Handle: 256
        Max slots: 5
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #315 [hci0] 49.465453
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #316 [hci0] 49.470502
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #317 [hci0] 49.470519
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #318 [hci0] 49.472996
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #319 [hci0] 49.480412
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #320 [hci0] 49.480492
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #321 [hci0] 49.487989
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #322 [hci0] 49.490303
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #323 [hci0] 49.495496
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #324 [hci0] 49.500304
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #325 [hci0] 49.500311

Signed-off-by: Kiran K <kiran.k@intel.com>
Signed-off-by: Lokendra Singh <lokendra.singh@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this issue Jul 27, 2021
…format

[ Upstream commit 06d213d ]

For incoming SCO connection with transparent coding format, alt setting
of CVSD is getting applied instead of Transparent.

Before fix:
< HCI Command: Accept Synchron.. (0x01|0x0029) plen 21  #2196 [hci0] 321.342548
        Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
        Transmit bandwidth: 8000
        Receive bandwidth: 8000
        Max latency: 13
        Setting: 0x0003
          Input Coding: Linear
          Input Data Format: 1's complement
          Input Sample Size: 8-bit
          # of bits padding at MSB: 0
          Air Coding Format: Transparent Data
        Retransmission effort: Optimize for link quality (0x02)
        Packet type: 0x003f
          HV1 may be used
          HV2 may be used
          HV3 may be used
          EV3 may be used
          EV4 may be used
          EV5 may be used
> HCI Event: Command Status (0x0f) plen 4               #2197 [hci0] 321.343585
      Accept Synchronous Connection Request (0x01|0x0029) ncmd 1
        Status: Success (0x00)
> HCI Event: Synchronous Connect Comp.. (0x2c) plen 17  #2198 [hci0] 321.351666
        Status: Success (0x00)
        Handle: 257
        Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
        Link type: eSCO (0x02)
        Transmission interval: 0x0c
        Retransmission window: 0x04
        RX packet length: 60
        TX packet length: 60
        Air mode: Transparent (0x03)
........
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2336 [hci0] 321.383655
< SCO Data TX: Handle 257 flags 0x00 dlen 60            #2337 [hci0] 321.389558
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2338 [hci0] 321.393615
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2339 [hci0] 321.393618
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2340 [hci0] 321.393618
< SCO Data TX: Handle 257 flags 0x00 dlen 60            #2341 [hci0] 321.397070
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2342 [hci0] 321.403622
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2343 [hci0] 321.403625
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2344 [hci0] 321.403625
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2345 [hci0] 321.403625
< SCO Data TX: Handle 257 flags 0x00 dlen 60            #2346 [hci0] 321.404569
< SCO Data TX: Handle 257 flags 0x00 dlen 60            #2347 [hci0] 321.412091
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2348 [hci0] 321.413626
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2349 [hci0] 321.413630
> SCO Data RX: Handle 257 flags 0x00 dlen 48            #2350 [hci0] 321.413630
< SCO Data TX: Handle 257 flags 0x00 dlen 60            #2351 [hci0] 321.419674

After fix:

< HCI Command: Accept Synchronou.. (0x01|0x0029) plen 21  #309 [hci0] 49.439693
        Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
        Transmit bandwidth: 8000
        Receive bandwidth: 8000
        Max latency: 13
        Setting: 0x0003
          Input Coding: Linear
          Input Data Format: 1's complement
          Input Sample Size: 8-bit
          # of bits padding at MSB: 0
          Air Coding Format: Transparent Data
        Retransmission effort: Optimize for link quality (0x02)
        Packet type: 0x003f
          HV1 may be used
          HV2 may be used
          HV3 may be used
          EV3 may be used
          EV4 may be used
          EV5 may be used
> HCI Event: Command Status (0x0f) plen 4                 #310 [hci0] 49.440308
      Accept Synchronous Connection Request (0x01|0x0029) ncmd 1
        Status: Success (0x00)
> HCI Event: Synchronous Connect Complete (0x2c) plen 17  #311 [hci0] 49.449308
        Status: Success (0x00)
        Handle: 257
        Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
        Link type: eSCO (0x02)
        Transmission interval: 0x0c
        Retransmission window: 0x04
        RX packet length: 60
        TX packet length: 60
        Air mode: Transparent (0x03)
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #312 [hci0] 49.450421
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #313 [hci0] 49.457927
> HCI Event: Max Slots Change (0x1b) plen 3               #314 [hci0] 49.460345
        Handle: 256
        Max slots: 5
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #315 [hci0] 49.465453
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #316 [hci0] 49.470502
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #317 [hci0] 49.470519
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #318 [hci0] 49.472996
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #319 [hci0] 49.480412
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #320 [hci0] 49.480492
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #321 [hci0] 49.487989
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #322 [hci0] 49.490303
< SCO Data TX: Handle 257 flags 0x00 dlen 60              #323 [hci0] 49.495496
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #324 [hci0] 49.500304
> SCO Data RX: Handle 257 flags 0x00 dlen 60              #325 [hci0] 49.500311

Signed-off-by: Kiran K <kiran.k@intel.com>
Signed-off-by: Lokendra Singh <lokendra.singh@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this issue Aug 23, 2022
While playing with event probes (eprobes), I tried to see what would
happen if I attempted to retrieve the instruction pointer (%rip) knowing
that event probes do not use pt_regs. The result was:

 BUG: kernel NULL pointer dereference, address: 0000000000000024
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 1 PID: 1847 Comm: trace-cmd Not tainted 5.19.0-rc5-test+ #309
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
v03.03 07/14/2016
 RIP: 0010:get_event_field.isra.0+0x0/0x50
 Code: ff 48 c7 c7 c0 8f 74 a1 e8 3d 8b f5 ff e8 88 09 f6 ff 4c 89 e7 e8
50 6a 13 00 48 89 ef 5b 5d 41 5c 41 5d e9 42 6a 13 00 66 90 <48> 63 47 24
8b 57 2c 48 01 c6 8b 47 28 83 f8 02 74 0e 83 f8 04 74
 RSP: 0018:ffff916c394bbaf0 EFLAGS: 00010086
 RAX: ffff916c854041d8 RBX: ffff916c8d9fbf50 RCX: ffff916c255d2000
 RDX: 0000000000000000 RSI: ffff916c255d2008 RDI: 0000000000000000
 RBP: 0000000000000000 R08: ffff916c3a2a0c08 R09: ffff916c394bbda8
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff916c854041d8
 R13: ffff916c854041b0 R14: 0000000000000000 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff916c9ea40000(0000)
knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000024 CR3: 000000011b60a002 CR4: 00000000001706e0
 Call Trace:
  <TASK>
  get_eprobe_size+0xb4/0x640
  ? __mod_node_page_state+0x72/0xc0
  __eprobe_trace_func+0x59/0x1a0
  ? __mod_lruvec_page_state+0xaa/0x1b0
  ? page_remove_file_rmap+0x14/0x230
  ? page_remove_rmap+0xda/0x170
  event_triggers_call+0x52/0xe0
  trace_event_buffer_commit+0x18f/0x240
  trace_event_raw_event_sched_wakeup_template+0x7a/0xb0
  try_to_wake_up+0x260/0x4c0
  __wake_up_common+0x80/0x180
  __wake_up_common_lock+0x7c/0xc0
  do_notify_parent+0x1c9/0x2a0
  exit_notify+0x1a9/0x220
  do_exit+0x2ba/0x450
  do_group_exit+0x2d/0x90
  __x64_sys_exit_group+0x14/0x20
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x46/0xb0

Obviously this is not the desired result.

Move the testing for TPARG_FL_TPOINT which is only used for event probes
to the top of the "$" variable check, as all the other variables are not
used for event probes. Also add a check in the register parsing "%" to
fail if an event probe is used.

Link: https://lkml.kernel.org/r/20220820134400.564426983@goodmis.org

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Fixes: 7491e2c ("tracing: Add a probe that attaches to trace events")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
herrnst pushed a commit to herrnst/linux-raspberrypi that referenced this issue Aug 25, 2022
commit 2673c60 upstream.

While playing with event probes (eprobes), I tried to see what would
happen if I attempted to retrieve the instruction pointer (%rip) knowing
that event probes do not use pt_regs. The result was:

 BUG: kernel NULL pointer dereference, address: 0000000000000024
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [raspberrypi#1] PREEMPT SMP PTI
 CPU: 1 PID: 1847 Comm: trace-cmd Not tainted 5.19.0-rc5-test+ raspberrypi#309
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
v03.03 07/14/2016
 RIP: 0010:get_event_field.isra.0+0x0/0x50
 Code: ff 48 c7 c7 c0 8f 74 a1 e8 3d 8b f5 ff e8 88 09 f6 ff 4c 89 e7 e8
50 6a 13 00 48 89 ef 5b 5d 41 5c 41 5d e9 42 6a 13 00 66 90 <48> 63 47 24
8b 57 2c 48 01 c6 8b 47 28 83 f8 02 74 0e 83 f8 04 74
 RSP: 0018:ffff916c394bbaf0 EFLAGS: 00010086
 RAX: ffff916c854041d8 RBX: ffff916c8d9fbf50 RCX: ffff916c255d2000
 RDX: 0000000000000000 RSI: ffff916c255d2008 RDI: 0000000000000000
 RBP: 0000000000000000 R08: ffff916c3a2a0c08 R09: ffff916c394bbda8
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff916c854041d8
 R13: ffff916c854041b0 R14: 0000000000000000 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff916c9ea40000(0000)
knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000024 CR3: 000000011b60a002 CR4: 00000000001706e0
 Call Trace:
  <TASK>
  get_eprobe_size+0xb4/0x640
  ? __mod_node_page_state+0x72/0xc0
  __eprobe_trace_func+0x59/0x1a0
  ? __mod_lruvec_page_state+0xaa/0x1b0
  ? page_remove_file_rmap+0x14/0x230
  ? page_remove_rmap+0xda/0x170
  event_triggers_call+0x52/0xe0
  trace_event_buffer_commit+0x18f/0x240
  trace_event_raw_event_sched_wakeup_template+0x7a/0xb0
  try_to_wake_up+0x260/0x4c0
  __wake_up_common+0x80/0x180
  __wake_up_common_lock+0x7c/0xc0
  do_notify_parent+0x1c9/0x2a0
  exit_notify+0x1a9/0x220
  do_exit+0x2ba/0x450
  do_group_exit+0x2d/0x90
  __x64_sys_exit_group+0x14/0x20
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x46/0xb0

Obviously this is not the desired result.

Move the testing for TPARG_FL_TPOINT which is only used for event probes
to the top of the "$" variable check, as all the other variables are not
used for event probes. Also add a check in the register parsing "%" to
fail if an event probe is used.

Link: https://lkml.kernel.org/r/20220820134400.564426983@goodmis.org

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Fixes: 7491e2c ("tracing: Add a probe that attaches to trace events")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pzqqt pushed a commit to Pzqqt/kernel_raspberrypi_4b that referenced this issue Sep 2, 2022
commit 2673c60 upstream.

While playing with event probes (eprobes), I tried to see what would
happen if I attempted to retrieve the instruction pointer (%rip) knowing
that event probes do not use pt_regs. The result was:

 BUG: kernel NULL pointer dereference, address: 0000000000000024
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [raspberrypi#1] PREEMPT SMP PTI
 CPU: 1 PID: 1847 Comm: trace-cmd Not tainted 5.19.0-rc5-test+ raspberrypi#309
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
v03.03 07/14/2016
 RIP: 0010:get_event_field.isra.0+0x0/0x50
 Code: ff 48 c7 c7 c0 8f 74 a1 e8 3d 8b f5 ff e8 88 09 f6 ff 4c 89 e7 e8
50 6a 13 00 48 89 ef 5b 5d 41 5c 41 5d e9 42 6a 13 00 66 90 <48> 63 47 24
8b 57 2c 48 01 c6 8b 47 28 83 f8 02 74 0e 83 f8 04 74
 RSP: 0018:ffff916c394bbaf0 EFLAGS: 00010086
 RAX: ffff916c854041d8 RBX: ffff916c8d9fbf50 RCX: ffff916c255d2000
 RDX: 0000000000000000 RSI: ffff916c255d2008 RDI: 0000000000000000
 RBP: 0000000000000000 R08: ffff916c3a2a0c08 R09: ffff916c394bbda8
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff916c854041d8
 R13: ffff916c854041b0 R14: 0000000000000000 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff916c9ea40000(0000)
knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000024 CR3: 000000011b60a002 CR4: 00000000001706e0
 Call Trace:
  <TASK>
  get_eprobe_size+0xb4/0x640
  ? __mod_node_page_state+0x72/0xc0
  __eprobe_trace_func+0x59/0x1a0
  ? __mod_lruvec_page_state+0xaa/0x1b0
  ? page_remove_file_rmap+0x14/0x230
  ? page_remove_rmap+0xda/0x170
  event_triggers_call+0x52/0xe0
  trace_event_buffer_commit+0x18f/0x240
  trace_event_raw_event_sched_wakeup_template+0x7a/0xb0
  try_to_wake_up+0x260/0x4c0
  __wake_up_common+0x80/0x180
  __wake_up_common_lock+0x7c/0xc0
  do_notify_parent+0x1c9/0x2a0
  exit_notify+0x1a9/0x220
  do_exit+0x2ba/0x450
  do_group_exit+0x2d/0x90
  __x64_sys_exit_group+0x14/0x20
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x46/0xb0

Obviously this is not the desired result.

Move the testing for TPARG_FL_TPOINT which is only used for event probes
to the top of the "$" variable check, as all the other variables are not
used for event probes. Also add a check in the register parsing "%" to
fail if an event probe is used.

Link: https://lkml.kernel.org/r/20220820134400.564426983@goodmis.org

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Fixes: 7491e2c ("tracing: Add a probe that attaches to trace events")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Aug 14, 2023
If we try to emit an icmp error in response to a nonliner skb, we get

BUG: KASAN: slab-out-of-bounds in ip_compute_csum+0x134/0x220
Read of size 4 at addr ffff88811c50db00 by task iperf3/1691
CPU: 2 PID: 1691 Comm: iperf3 Not tainted 6.5.0-rc3+ #309
[..]
 kasan_report+0x105/0x140
 ip_compute_csum+0x134/0x220
 iptunnel_pmtud_build_icmp+0x554/0x1020
 skb_tunnel_check_pmtu+0x513/0xb80
 vxlan_xmit_one+0x139e/0x2ef0
 vxlan_xmit+0x1867/0x2760
 dev_hard_start_xmit+0x1ee/0x4f0
 br_dev_queue_push_xmit+0x4d1/0x660
 [..]

ip_compute_csum() cannot deal with nonlinear skbs, so avoid it.
After this change, splat is gone and iperf3 is no longer stuck.

Fixes: 4cb47a8 ("tunnels: PMTU discovery support for directly bridged IP packets")
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230803152653.29535-2-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
popcornmix pushed a commit that referenced this issue Aug 17, 2023
commit 6a7ac3d upstream.

If we try to emit an icmp error in response to a nonliner skb, we get

BUG: KASAN: slab-out-of-bounds in ip_compute_csum+0x134/0x220
Read of size 4 at addr ffff88811c50db00 by task iperf3/1691
CPU: 2 PID: 1691 Comm: iperf3 Not tainted 6.5.0-rc3+ #309
[..]
 kasan_report+0x105/0x140
 ip_compute_csum+0x134/0x220
 iptunnel_pmtud_build_icmp+0x554/0x1020
 skb_tunnel_check_pmtu+0x513/0xb80
 vxlan_xmit_one+0x139e/0x2ef0
 vxlan_xmit+0x1867/0x2760
 dev_hard_start_xmit+0x1ee/0x4f0
 br_dev_queue_push_xmit+0x4d1/0x660
 [..]

ip_compute_csum() cannot deal with nonlinear skbs, so avoid it.
After this change, splat is gone and iperf3 is no longer stuck.

Fixes: 4cb47a8 ("tunnels: PMTU discovery support for directly bridged IP packets")
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230803152653.29535-2-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Aug 21, 2023
commit 6a7ac3d upstream.

If we try to emit an icmp error in response to a nonliner skb, we get

BUG: KASAN: slab-out-of-bounds in ip_compute_csum+0x134/0x220
Read of size 4 at addr ffff88811c50db00 by task iperf3/1691
CPU: 2 PID: 1691 Comm: iperf3 Not tainted 6.5.0-rc3+ #309
[..]
 kasan_report+0x105/0x140
 ip_compute_csum+0x134/0x220
 iptunnel_pmtud_build_icmp+0x554/0x1020
 skb_tunnel_check_pmtu+0x513/0xb80
 vxlan_xmit_one+0x139e/0x2ef0
 vxlan_xmit+0x1867/0x2760
 dev_hard_start_xmit+0x1ee/0x4f0
 br_dev_queue_push_xmit+0x4d1/0x660
 [..]

ip_compute_csum() cannot deal with nonlinear skbs, so avoid it.
After this change, splat is gone and iperf3 is no longer stuck.

Fixes: 4cb47a8 ("tunnels: PMTU discovery support for directly bridged IP packets")
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230803152653.29535-2-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests