Kernel panic during uart overrun #3226

Open
ashley-b opened this issue Sep 12, 2019 · 8 comments

@ashley-b

Description
I consistently see a kernel panic when running the serial port /dev/ttyAMA0 (PL011 device) at 1000000 baud with no flow control and a data usage greater than 60%.
As far as I can tell, it occurs when I do not read the data from user space quickly enough.
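
To put the load in numbers, here is a rough Python sketch of the incoming byte rate. It assumes 8N1 framing (10 bits on the wire per byte), which is not stated in this report, and the function name is only illustrative.

```python
def payload_bytes_per_second(baud, utilisation, bits_per_frame=10):
    """Estimate the received byte rate on a UART.

    bits_per_frame=10 assumes 8N1 framing (1 start + 8 data + 1 stop bit);
    that framing is an assumption, not something taken from the report.
    """
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0 and 1")
    return baud * utilisation / bits_per_frame


# 1000000 baud at 60% usage, as described in the report
rate = payload_bytes_per_second(1_000_000, 0.6)
print(f"about {rate:.0f} bytes per second to drain from user space")
```

Under that assumption the reader has to drain roughly 60 kB every second, so even a short stall in user space leaves a lot of unread data behind.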

Expected behaviour
I would expect data to be dropped and the hardware counters to show the overrun, e.g.:
cat /proc/tty/driver/ttyAMA
0: uart:PL011 rev2 mmio:0x3F201000 irq:81 tx:0 rx:121501502 brk:4 oe:15 bo:119400638 RTS|CTS|DTR
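
For anyone who wants to watch these counters from a script, here is a minimal Python sketch that parses a line in the format shown above. The function name parse_uart_stats is made up for illustration. Reading oe as hardware overrun errors and bo as buffer overruns is my understanding of the serial core output, not something stated in this thread.

```python
SAMPLE = ("0: uart:PL011 rev2 mmio:0x3F201000 irq:81 tx:0 rx:121501502 "
          "brk:4 oe:15 bo:119400638 RTS|CTS|DTR")


def parse_uart_stats(line):
    """Parse one port line of /proc/tty/driver/ttyAMA into a dict.

    key:value tokens become dict entries (decimal values as int, anything
    else as str); the trailing upper-case token is split into modem signals.
    """
    index, _, rest = line.partition(":")
    stats = {"index": int(index), "signals": []}
    for token in rest.split():
        key, sep, value = token.partition(":")
        if sep:
            stats[key] = int(value) if value.isdigit() else value
        elif token.isupper():
            stats["signals"] = token.split("|")
    return stats


stats = parse_uart_stats(SAMPLE)
print("overrun errors:", stats["oe"], "buffer overruns:", stats["bo"])
```

On a real system the lines would come from reading /proc/tty/driver/ttyAMA (usually as root) and skipping the header line. Comparing oe and bo between two reads shows whether data is still being dropped.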

System
I have seen this happening on a raspberry Pi2B and Pi3B+

cat /etc/rpi-issue

Raspberry Pi reference 2019-07-10
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 175dfb027ffabd4b8d5080097af0e51ed9a4a56c, stage2

vcgencmd version

Aug 15 2019 12:06:42 
Copyright (c) 2012 Broadcom
version 0e6daa5106dd4164474616408e0dc24f997ffcf3 (clean) (release) (start)

uname -a

Linux raspberrypi 4.19.66-v7+ #1253 SMP Thu Aug 15 11:49:46 BST 2019 armv7l GNU/Linux

[ 43.508528] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 43.512488] pgd = 4bef5981
[ 43.514451] [00000000] *pgd=00000000
[ 43.516421] Internal error: Oops: 80000007 [#1] SMP ARM
[ 43.518405] Modules linked in: sha256_generic cfg80211 rfkill 8021q garp stp llc bcm2835_codec(C) snd_bcm2835(C) bcm2835_v4l2(C) v4l2_mem2mem snd_pcm bcm2835_mmal_vchiq(C) videobuf2_dma_contig raspberrypi_hwmon v4l2_common videobuf2_vmalloc hwmon videobuf2_memops videobuf2_v4l2 snd_timer videobuf2_common snd videodev media vc_sm_cma(C) uio_pdrv_genirq uio fixed ledtrig_netdev ip_tables x_tables ipv6
[ 43.526397] CPU: 0 PID: 44 Comm: kworker/u8:1 Tainted: G C 4.19.66-v7+ #1253
[ 43.528616] Hardware name: BCM2835
[ 43.529741] Workqueue: events_unbound flush_to_ldisc
[ 43.530882] PC is at (null)
[ 43.532026] LR is at uart_throttle+0x118/0x124
[ 43.533158] pc : [<00000000>] lr : [<805870ac>] psr: 20000013
[ 43.534306] sp : b9fb3de0 ip : 00000001 fp : b9fb3dfc
[ 43.535442] r10: b764f200 r9 : 00000000 r8 : b4c8c88d
[ 43.536572] r7 : b9362400 r6 : b764f200 r5 : b96b1840 r4 : 00000024
[ 43.537719] r3 : 00000000 r2 : 00000004 r1 : 00000000 r0 : b96b1840
[ 43.538836] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 43.539953] Control: 10c5383d Table: 34da406a DAC: 00000055
[ 43.541092] Process kworker/u8:1 (pid: 44, stack limit = 0xbd776f43)
[ 43.542263] Stack: (0xb9fb3de0 to 0xb9fb4000)
[ 43.543427] 3de0: 80586f94 00000000 b764f200 b764f260 b9fb3e1c b9fb3e00 80570468 80586fa0
[ 43.545736] 3e00: 00000079 00000001 bc349000 bc34b000 b9fb3e84 b9fb3e20 8056e1d8 805703f8
[ 43.548036] 3e20: b764f274 80c92ad8 8014adf4 55555556 ba351e40 80830938 80958488 00000000
[ 43.550327] 3e40: bc34b000 bc349000 00000008 b4c8c895 00000000 00000000 80d07fc0 00000008
[ 43.552690] 3e60: 00000000 b4c8c88d b9362400 b4c8c88d b96a5008 00000000 b9fb3e9c b9fb3e88
[ 43.555180] 3e80: 8056e58c 8056db84 00000001 b9fb3e98 b9fb3eb4 b9fb3ea0 805712bc 8056e574
[ 43.557798] 3ea0: 00000008 00000008 b9fb3ed4 b9fb3eb8 80571dc0 80571298 b4c8c800 b96a5000
[ 43.560523] 3ec0: b96a5014 b96a5004 b9fb3efc b9fb3ed8 80571864 80571d84 b9f49000 b96a5004
[ 43.563383] 3ee0: b9c16400 b9c26000 00000000 b96a5008 b9fb3f34 b9fb3f00 8013bf0c 805717b8
[ 43.566296] 3f00: 40000093 b9c16400 b9c16400 b9c16400 b9f49014 b9c16400 b9c16418 80d03d00
[ 43.569277] 3f20: 00000088 b9f49000 b9fb3f7c b9fb3f38 8013c250 8013bda8 b9fb3f5c 00000000
[ 43.572298] 3f40: 80d03d00 80d03d00 80d8ecfa b9fb2038 b9fb3f7c b9d072c0 b9f39700 00000000
[ 43.575316] 3f60: b9f49000 8013c1f4 b9d072dc b9d35e74 b9fb3fac b9fb3f80 8014253c 8013c200
[ 43.578337] 3f80: 80104378 b9f39700 80142404 00000000 00000000 00000000 00000000 00000000
[ 43.581347] 3fa0: 00000000 b9fb3fb0 801010ac 80142410 00000000 00000000 00000000 00000000
[ 43.584356] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 43.587377] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 43.590406] [<805870ac>] (uart_throttle) from [<80570468>] (tty_throttle_safe+0x7c/0x80)
[ 43.593429] [<80570468>] (tty_throttle_safe) from [<8056e1d8>] (n_tty_receive_buf_common+0x660/0x9f0)
[ 43.596474] [<8056e1d8>] (n_tty_receive_buf_common) from [<8056e58c>] (n_tty_receive_buf2+0x24/0x2c)
[ 43.599511] [<8056e58c>] (n_tty_receive_buf2) from [<805712bc>] (tty_ldisc_receive_buf+0x30/0x6c)
[ 43.602562] [<805712bc>] (tty_ldisc_receive_buf) from [<80571dc0>] (tty_port_default_receive_buf+0x48/0x64)
[ 43.605632] [<80571dc0>] (tty_port_default_receive_buf) from [<80571864>] (flush_to_ldisc+0xb8/0xe8)
[ 43.608711] [<80571864>] (flush_to_ldisc) from [<8013bf0c>] (process_one_work+0x170/0x458)
[ 43.611786] [<8013bf0c>] (process_one_work) from [<8013c250>] (worker_thread+0x5c/0x5a4)
[ 43.614854] [<8013c250>] (worker_thread) from [<8014253c>] (kthread+0x138/0x168)
[ 43.617919] [<8014253c>] (kthread) from [<801010ac>] (ret_from_fork+0x14/0x28)
[ 43.619495] Exception stack(0xb9fb3fb0 to 0xb9fb3ff8)
[ 43.621042] 3fa0: 00000000 00000000 00000000 00000000
[ 43.624058] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 43.627035] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 43.628548] Code: bad PC value
[ 43.630064] ---[ end trace f3e49c8e83a2d265 ]---

@t-beckmann

I can confirm this issue. The NULL dereference happens in https://github.com/raspberrypi/linux/blob/rpi-4.19.y/drivers/tty/serial/serial_core.c#L711, where port->ops->throttle(port) is called while the throttle pointer is NULL.

An ad hoc fix that skips the call when the pointer is NULL does avoid the kernel panic, but the UART buffers then seem to fill up far too quickly. I assume this quick fix effectively disables flow control, so some other part of the kernel drops data, i.e. individual characters are lost when reading the serial port.

My assumption is that the uart_port data structures changed somewhere else in the kernel and the Raspberry Pi UART driver code was not updated to match. However, I have not found where the root cause of the issue actually is.

@pelwell
Contributor

pelwell commented Feb 19, 2020

I added throttle/unthrottle methods in the last month - have you tried with the current 4.19 kernel?

@t-beckmann

Cheers! I think you are referring to this commit: 63739af

We have a custom build for an embedded device in place, so we need to apply your changes individually. I will give it a try and report back later today.

@pelwell
Contributor

pelwell commented Feb 19, 2020

Yes, that's the one. It's been soak-tested with random pauses at the sender and receiver and no data loss, so it should be good.

@t-beckmann

I can confirm that the kernel input overrun is no longer reported, now that hardware flow control is engaged by the commit mentioned above.
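
For completeness, a minimal Python sketch of how a user-space program might request RTS/CTS on the port through termios. This is an assumption about the application side, not part of the commit. It also assumes Linux (termios.B1000000 and termios.CRTSCTS are not available everywhere), and the helper names are made up.

```python
import termios


def with_hw_flow_control(attrs, baud=termios.B1000000):
    """Return a copy of a tcgetattr() list with RTS/CTS enabled.

    attrs is [iflag, oflag, cflag, lflag, ispeed, ospeed, cc] as returned
    by termios.tcgetattr(). Software flow control (XON/XOFF) is cleared.
    """
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = attrs
    cflag |= termios.CRTSCTS | termios.CLOCAL | termios.CREAD
    iflag &= ~(termios.IXON | termios.IXOFF)
    return [iflag, oflag, cflag, lflag, baud, baud, list(cc)]


def enable_hw_flow_control(fd):
    """Apply the settings above to an already opened serial port."""
    attrs = termios.tcgetattr(fd)
    termios.tcsetattr(fd, termios.TCSANOW, with_hw_flow_control(attrs))
```

Usage would look like fd = os.open("/dev/ttyAMA0", os.O_RDWR | os.O_NOCTTY) followed by enable_hw_flow_control(fd). The RTS and CTS lines also have to be wired and routed to pins for this to have any effect.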

@wsgcm

wsgcm commented Jun 1, 2020

I would also like to use my Raspberry Pi serial port and am getting this same uart_throttle panic, in Linux 4.19.57+, after receiving just 4KB of data on /dev/ttyAMA0.

Is there a simple and clean way to obtain and install the new PL011 driver that solves this problem? Can I simply replace a binary file somewhere, or do I need to install linux kernel tools and write rmmod/insmod scripts and stuff?

I apologize that I don't know everything about the Linux kernel. I have written a few networking drivers, but not made Linux into a life's passion. I'm sure someone would understand this perspective. So I would appreciate some advice on how one installs a driver that is packaged into the system like this.

@6by9
Contributor

6by9 commented Jun 1, 2020

Update your kernel to the latest released version.
The problem was originally reported against 4.19.66, so it also affects the older 4.19.57.

Run apt update followed by apt dist-upgrade, as the fix has also been released through the normal Raspbian updates. This will update all other packages as well, so if you want to retain older versions of other packages, read up on updating individual packages with apt instead.

@pelwell
Contributor

pelwell commented Jun 1, 2020

The fix went into downstream builds of 4.19.97 and later.
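
A quick way to check a given system against that version, sketched in Python. It only compares the numeric part of the release string (as printed by uname -r) with 4.19.97, so it treats any higher version number as "later", and it cannot tell whether a custom build actually carries the patch. The function names are illustrative.

```python
import platform
import re

FIXED_IN = (4, 19, 97)  # first downstream 4.19 release with the fix, per this thread


def kernel_version_tuple(release):
    """Turn a release string such as '4.19.66-v7+' into (4, 19, 66)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def has_throttle_fix(release):
    """True if the release number is at or above FIXED_IN."""
    return kernel_version_tuple(release) >= FIXED_IN


if __name__ == "__main__":
    running = platform.release()
    print(running, "has fix" if has_throttle_fix(running) else "needs update")
```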
