2 usb ethernet adapters: cable plug-in causes kernel-4.19.x to crash #2924
Comments
Please post a full dmesg listing. Running out of memory in this context is likely to be fatal to other consumers, not just USB. |
Thanks for your reply. Here the full dmesg. You'll see:
[ 0.000000] Booting Linux on physical CPU 0x0 This is rpi (Linux armv7l 4.19.32-v7+) 13:41:11 rpi login: root root@rpi ~ # systemctl poweroff |
I experience a similar problem since kernel 4.19.23 and newer. Everything works up to kernel 4.14.98. Tested setup:
What happens:
Test case:
|
Updates:

(1) The issue (summary)

(2) Test with newer kernel

Same issues with newest kernel/firmware 4.19.36 (was 4.19.32).

(3) Low memory?

So, I did the following:

(3.1) Checking memory with free command

The free command before/after the first kernel error message does not reveal

free
[ 79.535682] asix 1-1.5:1.0 eth_inet: link up, 100Mbps, full-duplex, lpa 0xCDE1
free

(3.2) Booting with nearly no user space daemon (no sshd, no apache, etc)

Still the same issue.

(4) New hints on what can trigger the issue

To recall: In my OP I saw a work-around by not connecting one of the two

One of the adapters is connected to a WLAN router. Now I see that I can

I repeated that several times by rebooting the RPI, waiting an arbitrary time

(5) Conclusion for now

I cannot confirm a memory issue yet (always note that kernel 4.14.x works). I can trigger the error by the first network packets sent over one of the |
I can replicate with 2x Pi zero plugged into a 3B+ with the zero devices acting as CDC-ether gadgets. |
Oh this is fun. In 4.14.122, there are 3 allocations for non-aligned buffers across three Ethernet devices (lan95xx plus two cdc_ether devices). Each of these is a buffer of 64k minus 1 byte. In 4.19, the number of allocations grows to 5, with 2 of the requests failing. Increasing the CMA region size doesn't help; even with a 32MB region the same number of failures occurs. There are two worrying implications here:
We can't do much about the first point, as there's a hardware requirement that can't be communicated up the stack. We can fix the second point by a) moving this alloc out of an atomic context and b) figuring out which page pool is getting exhausted after only 3 allocations. There's yet more fun in the failure mechanism: the allocation failure happens after dwc_otg has "accepted" the URB and returned success to the caller - silently failing to perform the actual transfer and never giving the URB back with a failure status. |
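The arithmetic behind the pool discussion can be sketched roughly as follows. This is a simplified model and not dwc_otg code: it assumes "64k-1b" means 65535 bytes, a 4 KiB page size, and page-granular allocation from the pool. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size; 4 KiB is typical for 32-bit ARM Linux. */
#define PAGE_SIZE_BYTES 4096u

/* Round a request up to whole pages, as a page-granular pool would. */
size_t round_up_to_pages(size_t bytes)
{
    return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES * PAGE_SIZE_BYTES;
}

/*
 * Upper bound on how many bounce buffers of buf_bytes fit in a pool of
 * pool_bytes. Fragmentation and every other consumer of the pool are
 * ignored, so the real number can only be lower.
 */
size_t max_bounce_buffers(size_t pool_bytes, size_t buf_bytes)
{
    assert(buf_bytes > 0);
    return pool_bytes / round_up_to_pages(buf_bytes);
}
```

Under these assumptions a hypothetical 256 KiB pool holds at most four such buffers and a 512 KiB pool at most eight. Because each buffer persists until the device is disconnected, a handful of USB Ethernet devices can plausibly use up a small pool that other drivers also draw from.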
4.20 gains 1ee5c23 which signals up the stack that some part of the buffers can be aligned. Does that help in this context? |
Man, this is awesome. It seems we can finally upgrade the kernel from 4.14 to 4.19! No error message and no crash for 20 minutes, even across some reboots. In case you want to drop some lines about what this actually changes, when |
Increasing the coherent pool size reduces total available RAM for other users, so this is a diagnostic test rather than a fix. This is a boot-time allocated pool of memory for DMA-coherent buffers allocated in atomic context. The way dwc_otg is using this is suboptimal - the allocations it uses are large and persist until the device is disconnected. The aligned buffer also incurs an extra memcpy to/from the provided buffer, so this also adds a performance penalty. Usbnet should make a better effort at allocating aligned RX buffers, since that's gone backwards since 4.14. |
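For readers who want to check what pool size a given boot line requests, here is a minimal user-space sketch. It is not the kernel's own parser: `parse_size` and `coherent_pool_bytes` are hypothetical helpers that only recognise the `K`, `M` and `G` suffixes seen in values such as `coherent_pool=4M`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse a size such as "4M" or "512K" into bytes; returns 0 on bad input. */
unsigned long long parse_size(const char *s)
{
    char *end;
    unsigned long long v = strtoull(s, &end, 0);

    if (end == s)
        return 0;
    switch (*end) {
    case 'G': case 'g': v <<= 10; /* fall through */
    case 'M': case 'm': v <<= 10; /* fall through */
    case 'K': case 'k': v <<= 10; break;
    default: break;
    }
    return v;
}

/* Find "coherent_pool=<size>" in a command line; 0 means not present. */
unsigned long long coherent_pool_bytes(const char *cmdline)
{
    static const char key[] = "coherent_pool=";
    const char *p = cmdline;

    assert(cmdline != NULL);
    while ((p = strstr(p, key)) != NULL) {
        /* Only accept the key at the start of a token. */
        if (p == cmdline || p[-1] == ' ')
            return parse_size(p + sizeof(key) - 1);
        p += sizeof(key) - 1;
    }
    return 0;
}
```

Feeding it the contents of `/proc/cmdline` shows whether the parameter actually reached the kernel, which is the first thing to verify when a larger pool appears to have no effect.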
Naively forcing NET_IP_ALIGN to be 0 stops RX buffers being misaligned, but ipv6 multicast tx is giving us misaligned buffers - so we still get one 64k allocation per net device. Somehow I think we're never going to completely get rid of the requirement to allocate a bounce buffer, but at the very least we need to stop memcpying every single RX packet. I'll do some profiling to see which is the least worst option. |
#2599 Is this related? |
There's a difference in nomenclature here - from the point of view of the USB hardware, it needs the start of a read or write buffer to be 32-bit aligned. I'm not tracking CPU alignment faults, but there are potentially knobs in the USB-ethernet drivers for changing the offset of packet data start relative to the buffer data start if changing the URB alignment impacts the frequency of CPU faults. |
I've done various trials monitoring how often an unaligned URB is passed to dwc_otg. Between 4.14 and 4.19, usb-ethernet RX buffers became "unaligned" for ASIX and cdc_ether drivers, but unchanged for smsc95xx. In both 4.14 and 4.19 tx buffers are unaligned in almost all cases, with ASIX being the exception for ipv4 packets. Setting NET_IP_ALIGN=0 reverts 4.19 to the old behaviour, as in no unaligned RX buffers. However, with the misalignment of IP headers we now get a lot more faults being handled - I believe one for every ipv6 packet received. It also looks like for ASIX and cdc_ether, the tx header must immediately precede the data so there's no scope for adding a padding fudge factor. The smsc95xx and lan78xx drivers need further investigation. |
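The tension between the two kinds of alignment can be illustrated with a small sketch. It assumes a bare 14-byte Ethernet header at the very start of the RX buffer (drivers that prepend their own RX header will differ) and an allocation that itself starts on a 4-byte boundary. All names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_HEADER_LEN 14u  /* dst MAC + src MAC + EtherType */

/* True when addr sits on a 32-bit (4-byte) boundary. */
bool is_dword_aligned(uintptr_t addr)
{
    return (addr & 0x3u) == 0;
}

/* Where the URB buffer starts if `reserve` bytes are skipped at the
 * front of an allocation beginning at `base`. */
uintptr_t urb_start(uintptr_t base, unsigned reserve)
{
    return base + reserve;
}

/* Where the IP header lands, assuming a bare Ethernet frame at the
 * start of the URB buffer. */
uintptr_t ip_header_start(uintptr_t base, unsigned reserve)
{
    return urb_start(base, reserve) + ETH_HEADER_LEN;
}
```

With a reserve of 2 bytes (the usual NET_IP_ALIGN value) the IP header is word aligned for the CPU but the buffer start is not aligned for the USB controller. With a reserve of 0 the situation flips, which matches the trade-off described in the comments above.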
smsc95xx can have a variable start offset of up to 3 bytes between the 2 tx words and the start of data, specified in tx_cmd_a[17:16] - so we can mangle TX buffers presented to dwc_otg to be aligned to a word boundary. In lan78xx, there are multiple undefined bitfields in the TX command words that could be start-of-data offset fields. Assuming we can fix up both of these drivers, the default Ethernet interfaces should no longer require bounce buffers. For an immediate "make 4.19 be less broken than 4.14" patch, I propose expanding the atomic allocation pool to 512K and moving the allocation code in dwc_otg up several levels so that USB device drivers get told about failure. |
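The offset trick for smsc95xx could look roughly like the sketch below. This is an illustration under stated assumptions, not the driver patch: it assumes the two TX command words are 32 bits each (8 bytes in total) and that the offset field occupies tx_cmd_a[17:16] as described above. The macro and function names are invented for this example.

```c
#include <stdint.h>

#define TX_HEADER_LEN        8u   /* assumed: two 32-bit TX command words */
#define TX_DATA_OFFSET_SHIFT 16   /* tx_cmd_a[17:16], per the comment above */
#define TX_DATA_OFFSET_MASK  (0x3u << TX_DATA_OFFSET_SHIFT)

/* Padding (0..3 bytes) to leave between the TX header and the frame data
 * so that the header, and therefore the URB buffer, starts on a 4-byte
 * boundary. */
unsigned tx_align_padding(uintptr_t data_addr)
{
    return (unsigned)(data_addr & 0x3u);
}

/* Address at which the TX header would start for frame data at data_addr. */
uintptr_t tx_buffer_start(uintptr_t data_addr)
{
    return data_addr - TX_HEADER_LEN - tx_align_padding(data_addr);
}

/* Encode the padding into the offset field of the first command word. */
uint32_t tx_cmd_a_with_offset(uint32_t tx_cmd_a, unsigned padding)
{
    return (tx_cmd_a & ~TX_DATA_OFFSET_MASK) |
           ((uint32_t)(padding & 0x3u) << TX_DATA_OFFSET_SHIFT);
}
```

The idea is that the frame data stays where the network stack put it, while the header is written a few bytes earlier so the buffer handed to dwc_otg begins on a word boundary and no bounce buffer is needed.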
dwc_otg allocates DMA-coherent buffers in atomic context for misaligned transfer buffers. The pool that these allocations come from is set up at boot-time but can be overridden by a commandline parameter - increase this for now to prevent failures seen on 4.19 with multiple USB Ethernet devices. see: raspberrypi#2924
dwc_otg requires a 32-bit aligned buffer start address, otherwise expensive bounce buffers are used. The LAN951x hardware can skip up to 3 bytes between the TX header and the start of frame data, which can be used to force alignment of the URB passed to dwc_otg. As found in raspberrypi#2924
See: raspberrypi/linux#2952 kernel: dts: Increase default coherent pool size See: raspberrypi/linux#2924
Please retest with latest rpi-update firmware. |
It doesn't work with latest kernel 4.19.37-v7+ (hash 18e0a0f9a31e7a3a47d9c4301c7705b980ab0516) and without specifying the Output from "dmesg":
|
I'm not surprised it fails:
The fix was to modify the dtb to include the necessary command line parameter, but clearly for some reason that hasn't worked. |
Similar here. With coherent_pool it works, without I get this:
[ 36.289695] ERROR::assign_and_init_hc:1390: assign_and_init_hc: Failed to allocate memory to handle non-dword aligned buffer case
[ 203.127427] ------------[ cut here ]------------
My cmdline is:
console=ttyAMA0,115200n8 console=tty1 root=/dev/sda3 rootfstype=btrfs rootwait init=/usr/lib/systemd/systemd usbcore.autosuspend=-1 coherent_pool=4M |
I tested my previous setup with a Raspberry Pi 4 B. The two slave Raspberry Pi Zero W are connected to the USB 3.0 ports as USB Ethernet gadgets. Currently I've got two issues:

First issue: At least after the first boot it appears that sometimes (?) the USB Ethernet devices are not recognized. In this case they don't show up in the output of the

Second issue: When the USB Ethernet devices are recognized then I can only ping one of them. When pinging the other one I get a "Destination Host Unreachable" error and

Output from "dmesg" in case the USB Ethernet devices are not recognized:
Output from "dmesg" in case the USB Ethernet devices are recognized and pinging the second one fails:
|
dwc_otg allocates DMA-coherent buffers in atomic context for misaligned transfer buffers. The pool that these allocations come from is set up at boot-time but can be overridden by a commandline parameter - increase this for now to prevent failures seen on 4.19 with multiple USB Ethernet devices. see: #2924
See: #2924 Signed-off-by: Phil Elwell <phil@raspberrypi.org>
dwc_otg allocates DMA-coherent buffers in atomic context for misaligned transfer buffers. The pool that these allocations come from is set up at boot-time but can be overridden by a commandline parameter - increase this for now to prevent failures seen on 4.19 with multiple USB Ethernet devices. see: raspberrypi/linux#2924 Signed-off-by: Michael Scott <mike@foundries.io>
See: raspberrypi/linux#2924 Signed-off-by: Phil Elwell <phil@raspberrypi.org> Signed-off-by: Michael Scott <mike@foundries.io>
BugLink: https://bugs.launchpad.net/bugs/1831219 BugLink: https://bugs.launchpad.net/bugs/1825235 dwc_otg allocates DMA-coherent buffers in atomic context for misaligned transfer buffers. The pool that these allocations come from is set up at boot-time but can be overridden by a commandline parameter - increase this for now to prevent failures seen on 4.19 with multiple USB Ethernet devices. see: raspberrypi/linux#2924 (cherry picked from commit 28ef4e7747ce21fedcb59033dc60d560ee04dbc8 https://github.com/raspberrypi/linux rpi-5.0.y) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1831219 BugLink: https://bugs.launchpad.net/bugs/1825235 See: raspberrypi/linux#2924 Signed-off-by: Phil Elwell <phil@raspberrypi.org> (cherry picked from commit 3dd83758048c9e812452636ca5a8144bcf2c9c7f https://github.com/raspberrypi/linux rpi-5.0.y) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
This issue will be closed within 30 days unless further interactions are posted. If you wish this issue to remain open, please add a comment. A closed issue may be reopened if requested. |
commit 61e713b upstream.

Christof Meerwald <cmeerw@cmeerw.org> writes:
> Hi,
>
> this is probably related to commit
> 7a0cf09 (signal: Correct namespace
> fixups of si_pid and si_uid).
>
> With a 5.6.5 kernel I am seeing SIGCHLD signals that don't include a
> properly set si_pid field - this seems to happen for multi-threaded
> child processes.
>
> A simple test program (based on the sample from the signalfd man page):
>
> #include <sys/signalfd.h>
> #include <signal.h>
> #include <unistd.h>
> #include <spawn.h>
> #include <stdlib.h>
> #include <stdio.h>
>
> #define handle_error(msg) \
>     do { perror(msg); exit(EXIT_FAILURE); } while (0)
>
> int main(int argc, char *argv[])
> {
>   sigset_t mask;
>   int sfd;
>   struct signalfd_siginfo fdsi;
>   ssize_t s;
>
>   sigemptyset(&mask);
>   sigaddset(&mask, SIGCHLD);
>
>   if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
>     handle_error("sigprocmask");
>
>   pid_t chldpid;
>   char *chldargv[] = { "./sfdclient", NULL };
>   posix_spawn(&chldpid, "./sfdclient", NULL, NULL, chldargv, NULL);
>
>   sfd = signalfd(-1, &mask, 0);
>   if (sfd == -1)
>     handle_error("signalfd");
>
>   for (;;) {
>     s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo));
>     if (s != sizeof(struct signalfd_siginfo))
>       handle_error("read");
>
>     if (fdsi.ssi_signo == SIGCHLD) {
>       printf("Got SIGCHLD %d %d %d %d\n",
>              fdsi.ssi_status, fdsi.ssi_code,
>              fdsi.ssi_uid, fdsi.ssi_pid);
>       return 0;
>     } else {
>       printf("Read unexpected signal\n");
>     }
>   }
> }
>
>
> and a multi-threaded client to test with:
>
> #include <unistd.h>
> #include <pthread.h>
>
> void *f(void *arg)
> {
>   sleep(100);
> }
>
> int main()
> {
>   pthread_t t[8];
>
>   for (int i = 0; i != 8; ++i)
>   {
>     pthread_create(&t[i], NULL, f, NULL);
>   }
> }
>
> I tried to do a bit of debugging and what seems to be happening is
> that
>
>   /* From an ancestor pid namespace? */
>   if (!task_pid_nr_ns(current, task_active_pid_ns(t))) {
>
> fails inside task_pid_nr_ns because the check for "pid_alive" fails.
>
> This code seems to be called from do_notify_parent and there we
> actually have "tsk != current" (I am assuming both are threads of the
> current process?)

I instrumented the code with a warning and received the following backtrace:
> WARNING: CPU: 0 PID: 777 at kernel/pid.c:501 __task_pid_nr_ns.cold.6+0xc/0x15
> Modules linked in:
> CPU: 0 PID: 777 Comm: sfdclient Not tainted 5.7.0-rc1userns+ #2924
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> RIP: 0010:__task_pid_nr_ns.cold.6+0xc/0x15
> Code: ff 66 90 48 83 ec 08 89 7c 24 04 48 8d 7e 08 48 8d 74 24 04 e8 9a b6 44 00 48 83 c4 08 c3 48 c7 c7 59 9f ac 82 e8 c2 c4 04 00 <0f> 0b e9 3fd
> RSP: 0018:ffffc9000042fbf8 EFLAGS: 00010046
> RAX: 000000000000000c RBX: 0000000000000000 RCX: ffffc9000042faf4
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81193d29
> RBP: ffffc9000042fc18 R08: 0000000000000000 R09: 0000000000000001
> R10: 000000100f938416 R11: 0000000000000309 R12: ffff8880b941c140
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff8880b941c140
> FS: 0000000000000000(0000) GS:ffff8880bca00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2e8c0a32e0 CR3: 0000000002e10000 CR4: 00000000000006f0
> Call Trace:
>  send_signal+0x1c8/0x310
>  do_notify_parent+0x50f/0x550
>  release_task.part.21+0x4fd/0x620
>  do_exit+0x6f6/0xaf0
>  do_group_exit+0x42/0xb0
>  get_signal+0x13b/0xbb0
>  do_signal+0x2b/0x670
>  ? __audit_syscall_exit+0x24d/0x2b0
>  ? rcu_read_lock_sched_held+0x4d/0x60
>  ? kfree+0x24c/0x2b0
>  do_syscall_64+0x176/0x640
>  ? trace_hardirqs_off_thunk+0x1a/0x1c
>  entry_SYSCALL_64_after_hwframe+0x49/0xb3

The immediate problem is as Christof noticed that "pid_alive(current) == false". This happens because do_notify_parent is called from the last thread to exit in a process after that thread has been reaped.

The bigger issue is that do_notify_parent can be called from any process that manages to wait on a thread of a multi-threaded process from wait_task_zombie. So any logic based upon current for do_notify_parent is just nonsense, as current can be pretty much anything.

So change do_notify_parent to call __send_signal directly.

Inspecting the code it appears this problem has existed since the pid namespace support started handling this case in 2.6.30. This fix only backports to 7a0cf09 ("signal: Correct namespace fixups of si_pid and si_uid") where the problem logic was moved out of __send_signal and into send_signal.

Cc: stable@vger.kernel.org
Fixes: 6588c1e ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
Ref: 921cf9f ("signals: protect cinit from unblocked SIG_DFL signals")
Link: https://lore.kernel.org/lkml/20200419201336.GI22017@edge.cmeerw.net/
Reported-by: Christof Meerwald <cmeerw@cmeerw.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I'm seeing this issue present itself again as well in kernel 5.4.58. Using an AX88179 and RTL8153. Only see this issue on Pi 3B+, Pi 4 is fine.
I can get a DHCP IP address on one interface (in this case, the RTL8153 device), but the other does not send or receive any packets. Configuring a manual/static address does not work either. Eventually, the Pi will freeze and require a reboot. |
Have you tried with |
Hi, I managed to get it working by adding Also, I am aware that I won't see gigabit speeds... I'll be using significantly less bandwidth ;) |
No issues with kernel-4.14.98.
Tested setup:
AX88179 Gigabit Ethernet
To trigger the kernel crash:
not yet connected to USB adapter
broad up
Immediate dmesg message after plugging the ethernet cable:
[ 32.537340] ERROR::assign_and_init_hc:1394: assign_and_init_hc: Failed to allocate memory to handle non-dword aligned buffer case
Kernel crash some seconds later:
[ 78.078870] ------------[ cut here ]------------
[ 78.083672] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x298/0x29c
[ 78.092196] NETDEV WATCHDOG: eth_inet (ax88179_178a): transmit queue 0 timed out
[ 78.099835] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.32-v7+ #37
[ 78.106456] Hardware name: BCM2835
[ 78.109977] [<80110070>] (unwind_backtrace) from [<8010c82c>] (show_stack+0x20/0x24)
[ 78.117941] [<8010c82c>] (show_stack) from [<80ac87a8>] (dump_stack+0xcc/0x110)
[ 78.125399] [<80ac87a8>] (dump_stack) from [<80127898>] (__warn.part.3+0xc8/0xe4)
[ 78.133033] [<80127898>] (__warn.part.3) from [<8012791c>] (warn_slowpath_fmt+0x68/0x70)
[ 78.141286] [<8012791c>] (warn_slowpath_fmt) from [<809260bc>] (dev_watchdog+0x298/0x29c)
[ 78.152419] [<809260bc>] (dev_watchdog) from [<801995c0>] (call_timer_fn+0x3c/0x1a4)
[ 78.166200] [<801995c0>] (call_timer_fn) from [<80199820>] (expire_timers+0xf8/0x168)
[ 78.180364] [<80199820>] (expire_timers) from [<80199934>] (run_timer_softirq+0xa4/0x1c0)
[ 78.194954] [<80199934>] (run_timer_softirq) from [<80102370>] (__do_softirq+0x188/0x410)
[ 78.209607] [<80102370>] (__do_softirq) from [<8012d3f0>] (irq_exit+0xf8/0x134)
[ 78.220293] [<8012d3f0>] (irq_exit) from [<80183618>] (__handle_domain_irq+0x70/0xc4)
[ 78.234765] [<80183618>] (__handle_domain_irq) from [<8010219c>] (bcm2836_arm_irqchip_handle_irq+0x60/0xa8)
[ 78.251219] [<8010219c>] (bcm2836_arm_irqchip_handle_irq) from [<801019bc>] (__irq_svc+0x5c/0x7c)
[ 78.266635] Exception stack(0x81001ee8 to 0x81001f30)
[ 78.274964] 1ee0: 801092a4 00000000 40000093 40000093 ffffe000 8100616c
[ 78.289630] 1f00: 810061b4 00000001 00000001 8109cd8d 80cc1a74 81001f44 81000000 81001f38
[ 78.304380] 1f20: 00000000 801092a8 40000013 ffffffff
[ 78.312744] [<801019bc>] (__irq_svc) from [<801092a8>] (arch_cpu_idle+0x34/0x4c)
[ 78.326558] [<801092a8>] (arch_cpu_idle) from [<80ae3000>] (default_idle_call+0x40/0x48)
[ 78.341068] [<80ae3000>] (default_idle_call) from [<80157144>] (do_idle+0x134/0x174)
[ 78.355248] [<80157144>] (do_idle) from [<80157420>] (cpu_startup_entry+0x28/0x2c)
[ 78.369266] [<80157420>] (cpu_startup_entry) from [<80adcc34>] (rest_init+0xb8/0xbc)
[ 78.383454] [<80adcc34>] (rest_init) from [<80f01020>] (start_kernel+0x3b4/0x3c8)
[ 78.397396] ---[ end trace 19ce283ffed865af ]---
[ 78.405259] ERROR::assign_and_init_hc:1394: assign_and_init_hc: Failed to allocate memory to handle non-dword aligned buffer case
[ 78.405259]
[ 78.428274] ERROR::assign_and_init_hc:1394: assign_and_init_hc: Failed to allocate memory to handle non-dword aligned buffer case