GPIO UART(S) dropping characters #1017
The fact that this seems to affect both UARTs equally makes me suspect that this isn't a UART hardware or driver problem. Remember that both UARTs have been extensively tested as part of the Bluetooth support on the WiFi Pis - I found and fixed at least two bugs in the PL011/ttyAMA0 driver. If you drop the GPU RAM to 16MB you will be selecting the …
Start your testing with the best UART: the PL011 that appears as ttyAMA0. On a WiFi Pi you will first need to relieve it of its Bluetooth duties by adding …
You can also narrow down the contributing factors by running the full Raspbian in console mode; see the Boot options in the …
I've tried dtoverlay=pi3-miniuart-bt (the other UART) as well, with no change. I've also run full Raspbian in console mode, again with no change.

The transmitter was originally an Atmel MCU. To take it out of the loop, I've since been using a plain jumper across the TX/RX lines and watching the characters come back. The original application used Python, but for the loopback echo I've been using minicom and screen to access the port, and both give similar results.

Here are some tests I ran by pasting a line and watching it echo back. The number of characters lost is very predictable, but curiously it differs between minicom and screen. I could not get any character loss at all at 2400 baud! Here are the results. Open them in a fixed-width font editor. The first line of each test is the original line sent, followed by the lines where characters were missing.

I have also swapped between a Pi 3 and a 3+, with no change. I've swapped multiple power supplies, and even went as far as unplugging all USB devices and the HDMI screen in case something was interfering.

I've been able to narrow down only a few things. I've never seen an issue with any version of Stretch Lite, or with any full Stretch version prior to 2018-04-18-raspbian-stretch. It all seems to start with the full version of 2018-04-18-raspbian-stretch. I tested ALL full versions of Stretch, from the first (2017-08-16) to the current one. Also, an older version that does not have the problem starts having it as soon as it is updated.

It could still be my setup, sure, but I'm running out of things to test. Thank you for any help you can provide.
An update: on the newest kernel I was able to get the full UART to work reliably by adding gpu_mem=8 to /boot/config.txt, with arm_freq and arm_freq_min locked at 600. Changing GPU memory to 64 but keeping arm_freq and arm_freq_min at 600 caused it to fail again. I switched back and forth between these configs several times to confirm the results. This is on a 3+. Detailed results:

Linux raspberrypi 4.14.52-v7+ #1123 SMP Wed Jun 27 17:35:49 BST 2018 armv7l GNU/Linux
#NOT dropping characters, using full UART and locking arm_freq, 8 MB GPU
#dropping characters, using mini UART and locking arm_freq, 8 MB GPU
#dropping characters, using full UART but free arm_freq, 8 MB GPU
#dropping characters, using full UART, locked arm_freq, but 64 MB GPU
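When switching between configurations like these, it helps to record exactly which settings each run used. The following is a minimal Python sketch for that. The function names and the choice of keys are our own. It also deliberately ignores config.txt conditional sections such as [pi4], so treat it as a logging aid rather than a faithful copy of the firmware's parser.

```python
# Settings this thread toggles while bisecting the problem (our selection).
KEYS_OF_INTEREST = ("enable_uart", "dtoverlay", "gpu_mem", "arm_freq", "arm_freq_min")


def parse_config_txt(text: str) -> dict[str, list[str]]:
    """Collect key=value lines from config.txt text.

    Repeated keys (dtoverlay often is) keep every value, in file order.
    Simplification: [section] filter lines are skipped, not applied.
    """
    settings: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings.setdefault(key, []).append(value)
    return settings


def summarise(settings: dict[str, list[str]]) -> dict[str, list[str]]:
    """Pick out the UART-related settings; an empty list means the firmware default applies."""
    return {key: settings.get(key, []) for key in KEYS_OF_INTEREST}
```

For example, `summarise(parse_config_txt(open("/boot/config.txt").read()))` could be printed at the start of each test log.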
You didn't answer my question about flow control, but I suspect the answer would be no. I spend quite a bit of time controlling a Pi (usually a 3 or 3+) via the UART at 115200 baud, and I haven't noticed any corruption, so I constructed a text file containing many thousands of lines of your test string and, running in a shell over the UART, I typed:
then pasted the text from my PC's clipboard. Once it had finished dribbling across I analysed the results:
i.e. all 21760 lines were received correctly. Next (OK - if I'm honest, the second time around, after it failed the first time) I disabled the console on ttyAMA0. Then I connected a patch cable from TX to RX. In one GUI terminal window I ran:
and in another I ran:
After the second, outbound cat completed I stopped the first with Ctrl-C. Comparing the two files I found they were identical:
Repeating the test at 921600 baud resulted in a single cluster of 8 dropped bytes, which isn't very surprising without flow control. At this point I suggest you repeat my tests with …
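Clusters like that 8-byte gap can be located without eyeballing a diff: Python's standard difflib can align the sent and received captures. Below is a rough sketch; the helper is our own and was not used by anyone in this thread. SequenceMatcher becomes slow on captures much beyond a few hundred kilobytes. On highly repetitive data (the same test line over and over) it may also report shifted offsets, though the run lengths stay meaningful, so compare long captures in chunks.

```python
from difflib import SequenceMatcher


def dropped_runs(sent: bytes, received: bytes) -> list[tuple[int, int]]:
    """Return (offset in sent, length) for each stretch of sent bytes that never arrived.

    "replace" opcodes are reported too, so corrupted bytes show up alongside
    pure drops; with pure loss, only "delete" opcodes should occur.
    """
    matcher = SequenceMatcher(None, sent, received, autojunk=False)
    runs = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            runs.append((i1, i2 - i1))
    return runs
```

Usage: `dropped_runs(open("out.txt", "rb").read(), open("in.txt", "rb").read())`, where the file names are placeholders for the two captures.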
Correct, no flow control was used. I repeated your cat tests and still see dropped characters. I even re-imaged a disk with a brand new 2018-06-27-raspbian-stretch and repeated them, making only minimal changes. I repeated them again with the WiFi off, in case that changed anything, with similar results. All at 115200 baud, as you did above. You are using a full desktop release of Raspbian, correct? I don't have any issues with any of the Lite (non-desktop) versions, so I repeated your test on a Stretch Lite copy as well. The result was identical files with no dropped characters.

=======================================================
enable_uart=1
pi@raspberrypi: …
===================================
brand new image of 2018-06-27 Stretch full
enable_uart=1
pi@raspberrypi: …
===================================
3B+, WiFi enabled, full clean Raspbian 2018-06-27, booted to desktop, same modifications as you, 115200 and 921600 - all give correct results. If I stress it enough it will drop data - that's what flow control is for - but I don't see anything untoward.
I'm seeing this too: sending even one byte more than the hardware buffer size (8/16 bytes) to the Pi seems to be hit or miss with current Raspbian with the desktop kernel.
You may well be seeing a problem, but there is no such thing as a "desktop kernel" - all configurations of Raspbian use the same kernel (the one that matches the hardware). If you have a test script that provokes the data loss then please post it here.
Any Lite version has no problems. I've tried everything with my hardware that I can imagine: 3 power supplies, 2 different Pis, USB boot, SD card boot, removing HDMI and all USB peripherals, turning off WiFi. I'm running out of ideas on my own. I have no script. I repeated the tests with your simple cat commands on a brand new 2018-06-27-raspbian-stretch with minimal changes, as stated in my comment above; that is as close to a repeatable, simple test as I have done. I guess if there is nothing else to try, we just wait and see if anyone else reports anything. We have one more so far (burtyb). Thanks!
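A test script was asked for above, so here is a rough single-process Python equivalent of the two-terminal cat test. It writes a payload to the port and, at the same time, reads the echo back through the TX/RX jumper. This is a sketch under assumptions. The function name and parameters are ours, and the port is assumed to be configured already (raw mode, baud rate set, for example with stty). Like the cat test, it uses no flow control, so heavy load can still overflow the FIFOs.

```python
import os
import select
import time


def loopback(write_fd: int, read_fd: int, payload: bytes,
             idle_timeout: float = 0.5, chunk: int = 256,
             max_seconds: float = 60.0) -> bytes:
    """Send payload on write_fd while draining read_fd; return everything read.

    With TX jumpered to RX, pass the same descriptor twice. Stops once the
    payload is fully written and nothing has arrived for idle_timeout seconds.
    """
    for fd in {write_fd, read_fd}:
        os.set_blocking(fd, False)  # never stall on a full buffer while data waits to be read
    received = bytearray()
    sent = 0
    start = last_activity = time.monotonic()
    while True:
        want_write = [write_fd] if sent < len(payload) else []
        readable, writable, _ = select.select([read_fd], want_write, [], 0.05)
        if readable:
            try:
                data = os.read(read_fd, 4096)
            except BlockingIOError:
                data = b""
            if data:
                received += data
                last_activity = time.monotonic()
        if writable:
            try:
                sent += os.write(write_fd, payload[sent:sent + chunk])
                last_activity = time.monotonic()
            except BlockingIOError:
                pass  # output buffer full; try again on the next pass
        now = time.monotonic()
        if sent < len(payload) and now - start > max_seconds:
            raise TimeoutError(f"only {sent} of {len(payload)} bytes written")
        if sent >= len(payload) and now - last_activity > idle_timeout:
            return bytes(received)
```

On the Pi this might be used as `fd = os.open("/dev/ttyAMA0", os.O_RDWR | os.O_NOCTTY)` followed by `got = loopback(fd, fd, data)` and a comparison of `got` with `data`.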
Have you tried switching to powersave mode to make sure the clocks are not causing a problem? You just need to run:

echo "powersave" | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
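The same change can be scripted for every core. This also makes it easy to confirm which governor is active during a test run. Below is a minimal Python sketch. The helper names are hypothetical, and the `root` parameter exists only so the logic can be exercised against a fake directory tree. As far as we know, the Pi's cores share one frequency policy, so writing cpu0 alone, as in the command above, should be enough. Writing every core is harmless.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")
GOVERNOR_GLOB = "cpu[0-9]*/cpufreq/scaling_governor"


def set_governor(governor: str, root: Path = CPU_ROOT) -> list[Path]:
    """Write `governor` to each CPU's scaling_governor file (needs root on a real Pi)."""
    written = []
    for path in sorted(root.glob(GOVERNOR_GLOB)):
        path.write_text(governor + "\n")
        written.append(path)
    return written


def current_governors(root: Path = CPU_ROOT) -> dict[str, str]:
    """Map cpuN to the governor currently in effect."""
    return {path.parent.parent.name: path.read_text().strip()
            for path in sorted(root.glob(GOVERNOR_GLOB))}
```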
Gordon, I just tried powersave, with no change on my side. Thanks for the suggestion.
Regards,
OK, can you list the exact minimum sequence of steps needed to reproduce the problem, assuming you just loop back the UART on the GPIO connector? List which pins you connect and anything you change in config.txt from the clean Raspbian download (including a link). Are you using stty to set the baud rate and parameters?
Gordon
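For reference, the serial parameters can also be set without stty, through Python's termios module. This sketch approximates `stty -F /dev/ttyAMA0 115200 raw -echo -crtscts`: 8N1, no echo, no line editing or CR/LF translation, and hardware flow control off unless requested. It is our own approximation, not a bit-for-bit copy of what stty sets. The speed table covers only the rates mentioned in this thread.

```python
import termios

# Only the rates that come up in this thread; extend as needed.
BAUD = {
    2400: termios.B2400,
    19200: termios.B19200,
    115200: termios.B115200,
    460800: termios.B460800,
    921600: termios.B921600,
}


def configure_raw(fd: int, baud: int, rtscts: bool = False) -> None:
    """Put a serial port into raw 8N1 mode at `baud` (KeyError for unlisted rates)."""
    speed = BAUD[baud]
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = termios.tcgetattr(fd)
    # No break/parity marking, no CR/NL mangling, no XON/XOFF software flow control.
    iflag &= ~(termios.IGNBRK | termios.BRKINT | termios.PARMRK | termios.ISTRIP
               | termios.INLCR | termios.IGNCR | termios.ICRNL
               | termios.IXON | termios.IXOFF)
    oflag &= ~termios.OPOST  # send bytes exactly as written
    # No echo, no line editing, no signal characters.
    lflag &= ~(termios.ECHO | termios.ECHONL | termios.ICANON | termios.ISIG | termios.IEXTEN)
    # 8 data bits, no parity, 1 stop bit; enable the receiver, ignore modem lines.
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CSTOPB)
    cflag |= termios.CS8 | termios.CREAD | termios.CLOCAL
    if rtscts:
        cflag |= termios.CRTSCTS
    else:
        cflag &= ~termios.CRTSCTS
    cc[termios.VMIN] = 1   # read() returns as soon as one byte is available
    cc[termios.VTIME] = 0
    termios.tcsetattr(fd, termios.TCSANOW, [iflag, oflag, cflag, lflag, speed, speed, cc])
```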
OS: 2018-06-27-raspbian-stretch.img

Created a file named foo.txt with 20,000 or so lines of this: …

In one terminal, run: …

In another terminal, run: …

The following works well for seeing bad lines: …

Hopefully you can confirm this; otherwise I'm packing up, driving 100 miles east and trying this again.
Regards,
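The exact commands in the steps above did not survive, so the following is for illustration only. This Python sketch builds a foo.txt-style file of repeated lines and reports the received lines that differ. TEST_LINE is a hypothetical placeholder for the original test string, and the helper names are ours.

```python
# Hypothetical stand-in: the actual test string from this thread was not preserved.
TEST_LINE = "The quick brown fox jumps over the lazy dog 0123456789"


def make_test_text(n_lines: int, line: str = TEST_LINE) -> str:
    """Build the contents of a foo.txt-style file: the same line repeated n_lines times."""
    return (line + "\n") * n_lines


def report_bad_lines(received: str, expected: str = TEST_LINE, limit: int = 10) -> list[str]:
    """Describe up to `limit` received lines that differ from `expected`."""
    reports = []
    # splitlines() also copes with the CRLF line endings a terminal may add.
    for number, line in enumerate(received.splitlines(), start=1):
        if line == expected:
            continue
        reports.append(f"line {number}: {len(line)} of {len(expected)} chars: {line!r}")
        if len(reports) >= limit:
            break
    return reports
```

For example, write `make_test_text(20000)` to foo.txt before the run, then print `report_bad_lines(...)` on the captured file afterwards.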
Is that the minimum set of changes required to reproduce the problem? Do you really need to change gpu_mem to trigger the problem? |
The dtoverlay line switches Bluetooth to the mini-UART, making the more capable UART available for other applications. |
Although I don't think that's necessary to reproduce the problem... |
We want to test ttyAMA0 because it should be the UART least likely to drop data, and we can't test ttyAMA0 if Bluetooth is using it, so using one of the -bt overlays is a necessary step to give this issue the highest priority. |
Gordon, the setup above was my latest minimal test, not necessarily the most minimal one. I forced GPU_MEM just to give it a value; stock fails too. I enabled ssh because I use it, and I don't believe it has any effect on the results. I've likely tried both a jumper and a jumper wire, but I don't remember which I used for specific tests. We started testing the mini UART for whatever reason and just went with it; they both seem to behave the same.
burtyb, can you tell us more about your issue? Does it match what I'm seeing, where the problem goes away with any of the Lite versions?
Regards,
Meh, I meant the current kernel on desktop Raspbian. But as the stance seems to be "use hardware flow control" (where I agree it works), I'll just remove the connector from future revisions of my HATs.
We're not saying there isn't a problem here, but I followed the instructions and didn't see it myself. Hardware flow control is the only sure way to get reliable transfers over a UART, although the -rt kernel may do better under moderate load.
Full Raspbian does install extra daemons that also run in console mode, such as bluealsa.
I have the same issue: running at 921k, USB handles it fine, but ttyAMA0 loses data.
I also have this issue. I shorted GPIO14/15 with a jumper on an RPi 3 and used this serial test tool: https://github.com/cbrake/linux-serial-test

./linux-serial-test -s -e -p /dev/ttyAMA0 -b 921600
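For readers without that tool to hand, the idea behind this kind of test can be sketched in a few lines of Python. The sender transmits a byte counter that increments modulo 256, and the receiver reports every place where the next byte is not the one expected. This is loosely modelled on how linux-serial-test reports errors ("expected XX, got YY"), as far as we understand it. It is not that tool's code, and the names are ours.

```python
def counter_pattern(start: int, length: int) -> bytes:
    """Bytes counting upwards modulo 256, beginning at `start`."""
    return bytes((start + i) & 0xFF for i in range(length))


class CounterChecker:
    """Check a received byte stream against an incrementing counter.

    After a mismatch it resynchronises on the byte that actually arrived,
    so one burst of lost bytes produces one error rather than thousands.
    """

    def __init__(self) -> None:
        self.expected = None  # next expected value; unknown until the first byte arrives
        self.count = 0        # total bytes seen so far
        self.errors = []      # (byte index, expected, got) tuples

    def feed(self, chunk: bytes) -> None:
        for value in chunk:
            if self.expected is not None and value != self.expected:
                self.errors.append((self.count, self.expected, value))
            self.expected = (value + 1) & 0xFF
            self.count += 1
```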
This is a very high error rate. At 921600 baud with no flow control I get an error (usually a cluster of 8 dropped bytes) every few million iterations on 3B+ and 4B (3B should be no different). Are you sure you don't have "console=ttyAMA0" or "console=serial0" in /boot/cmdline.txt?
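That check is easy to automate. Below is a hedged sketch that only reads the text and does not edit the file. The set of device names is our assumption about what matters on a Pi: ttyAMA0 for the PL011, ttyS0 for the mini UART, and the serial0/serial1 aliases.

```python
# Device names a Pi kernel command line might use for a GPIO UART console (our list).
GPIO_UART_NAMES = {"ttyAMA0", "ttyS0", "serial0", "serial1"}


def gpio_uart_consoles(cmdline: str) -> list[str]:
    """Return GPIO UART devices claimed by console= entries in a kernel command line."""
    found = []
    for token in cmdline.split():
        if token.startswith("console="):
            device = token.split("=", 1)[1].split(",", 1)[0]  # drop a ",115200" speed suffix
            if device in GPIO_UART_NAMES:
                found.append(device)
    return found
```

For example, `gpio_uart_consoles(open("/boot/cmdline.txt").read())` returns an empty list when no console is attached to those UARTs.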
I have noticed that the error rate increases when the CPU gets hot, but the mechanism is currently unknown.
Locking the ARM clock at a lower speed:
has a greater effect on the error rate, so the increase in errors/dropped bytes appears to be caused by the reduced ARM speed rather than the increased temperature.
The output has a very stable baud rate, and there is no data corruption, so any losses are in the emptying (or not) of the RX FIFO.
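Some rough numbers help when reasoning about how quickly the RX FIFO must be emptied. With 8N1 framing, each byte costs 10 bit times. The sketch below therefore converts a baud rate into bytes per second and into the time a FIFO of a given depth takes to fill. The 16-byte depth is only an illustrative assumption (the thread mentions 8/16 bytes). If the RX interrupt fires at a partial fill level, the time left to service it is shorter still.

```python
def bytes_per_second(baud: int, bits_per_frame: int = 10) -> float:
    """UART throughput; 10 bits per frame = start bit + 8 data bits + stop bit (8N1)."""
    return baud / bits_per_frame


def fifo_fill_time_us(baud: int, fifo_bytes: int, bits_per_frame: int = 10) -> float:
    """Microseconds for `fifo_bytes` back-to-back frames to arrive at `baud`."""
    return fifo_bytes / bytes_per_second(baud, bits_per_frame) * 1e6


for baud in (19200, 115200, 921600):
    print(f"{baud:>7} baud: {bytes_per_second(baud):8.0f} B/s, "
          f"16-byte FIFO fills in {fifo_fill_time_us(baud, 16):7.1f} us")
```

A 16-byte FIFO takes roughly 8.3 ms to fill at 19200 baud, about 1.4 ms at 115200, and only about 174 µs at 921600. At the highest rate, any delay in servicing the UART therefore matters far more.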
Would anyone like to try a real-time kernel? Is thermal throttling causing the problem? Does suddenly dropping the CPU speed leave too little time to move data out of the FIFO, so that data is lost? Looking at the errors above, there is something we need to note:

Error, count: 3931, expected 63, got 6b
63 0110 0011
Error, count: 73746, expected 22, got 23
22 0010 0010
........

The data was captured by the IO, but with a lot of bit flips. I'm interested in capturing the data line waveform for further analysis :)
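With an incrementing-counter test pattern, an "expected X, got Y" line can be read two ways. It may be a bit flip, where Y is X XOR some mask, or dropped bytes, where Y is X + n modulo 256. This small Python helper (our own naming) computes both readings, so they can be compared side by side.

```python
def compare_error(expected: int, got: int) -> dict[str, object]:
    """Two readings of one 'expected X, got Y' error from an incrementing-counter test."""
    xor = expected ^ got
    return {
        "xor_bits": f"{xor:08b}",                  # bits that would have had to flip
        "bits_flipped": bin(xor).count("1"),
        "bytes_skipped": (got - expected) & 0xFF,  # bytes lost if the counter jumped ahead (mod 256)
    }


for expected, got in ((0x63, 0x6B), (0x22, 0x23)):
    print(f"expected {expected:02x}, got {got:02x}: {compare_error(expected, got)}")
```

Both quoted errors fit either reading: 0x63 to 0x6b is one flipped bit or 8 skipped bytes, and 0x22 to 0x23 is one flipped bit or 1 skipped byte. The 8-byte skip, however, matches the clusters of 8 dropped bytes reported earlier in this thread. That is consistent with loss rather than corruption.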
The error turns out to be that data is being written into a full TX FIFO, causing it to be dropped. There is some optimising logic in there that assumes, not unreasonably, that it is always safe to write into the FIFO after the TX interrupt has fired. For some reason this appears to not be the case. The simple fix is to disable the optimisation and check the FIFO level before every write - I've left a board running this test overnight to see how effective it is - but we ought to be able to do better.
The overnight test lost 16 bytes (or possibly pairs of bytes) in total out of nearly 5 billion bytes sent, so writing to a full FIFO is definitely the cause of the data loss. This morning I found the failure mechanism - the RX interrupt handler releases the lock, allowing a transmit thread on another core to jump in and fill the FIFO before the TX interrupt handler tries to write its half-a-FIFO of data (assuming there is any left to write). A fix has been pushed to rpi-4.19.y - you can read the details here: raspberrypi/linux@9bf5cd2
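The race described above can be illustrated with a toy model. This is a Python simulation, not kernel code. The 16-byte depth, the interrupt firing at half full, and all the names are assumptions chosen for illustration. It compares two refills. The optimised refill writes half a FIFO's worth after the TX interrupt on the assumption that the space is there. The other refill checks the fill level first, in the spirit of the simple fix mentioned above.

```python
from dataclasses import dataclass


@dataclass
class TxFifo:
    """Toy TX FIFO: bytes written while it is full are silently lost."""
    capacity: int = 16  # illustrative depth only
    level: int = 0
    dropped: int = 0

    def free(self) -> int:
        return self.capacity - self.level

    def write(self, n: int) -> None:
        accepted = min(n, self.free())
        self.level += accepted
        self.dropped += n - accepted


def tx_interrupt_refill(fifo: TxFifo, pending: int, check_level: bool) -> int:
    """Consume bytes from `pending`; return how many the driver believes it sent."""
    if check_level:
        n = min(pending, fifo.free())          # level-checked: only write what actually fits
    else:
        n = min(pending, fifo.capacity // 2)   # optimised: assume half the FIFO is free
    fifo.write(n)
    return n


def simulate_race(check_level: bool) -> int:
    """The TX interrupt fires at half full, but another core fills the FIFO first."""
    fifo = TxFifo(level=8)
    fifo.write(fifo.free())  # the other core tops the FIFO up while the lock is released
    tx_interrupt_refill(fifo, pending=100, check_level=check_level)
    return fifo.dropped


print("optimised refill drops", simulate_race(check_level=False), "bytes")
print("level-checked refill drops", simulate_race(check_level=True), "bytes")
```

In this model the optimised path hands bytes to a FIFO that has no room for them. The driver counts those bytes as sent, which is exactly how data vanishes without any error being raised.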
kernel: i2c: bcm2835: Set clock-stretch timeout to 35ms
See: raspberrypi/linux#3064
kernel: xhci: add quirk for host controllers that don't update endpoint DCS
See: raspberrypi/linux#3060
kernel: tty: amba-pl011: Make TX optimisation conditional
See: #1017
kernel: overlays: Add real parameters to the rpi-poe overlay
kernel: overlays: Correct gpio-fan gpio flags for 4.19
See: raspberrypi/linux#2715
kernel: overlays: i2c-gpio: Fix the bus parameter
See: raspberrypi/linux#3062
kernel: overlays: Rename pi3- overlays to be less model-specific
See: raspberrypi/linux#3052
firmware: dispmanx: Fix handling of disable_overscan to not disable it totally
See: raspberrypi/linux#3059
firmware: power: Enable/disable H264 and ISP clocks with domain
firmware: arm_loader: arm_64bit=0 should disable loading of kernel8.img
firmware: dt-blob: CM has no activity LED
I ran into RX FIFO overruns recently at 460800 baud, pumping about 16 kByte/s in bursts of 16 to 23 bytes. I noticed the overruns by calling ioctl(mcu_uart, TIOCGICOUNT, &icount). However, this only happened on 4.19.58 with RT-PREEMPT.
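The same overrun check can be done from Python through fcntl.ioctl. This is a sketch under assumptions. The ioctl number 0x545D and the serial_icounter_struct layout come from the Linux UAPI headers as we understand them, not from this thread, and the helper names are ours. As far as we know, `overrun` counts hardware RX overruns reported by the UART, while `buf_overrun` counts data the tty layer could not buffer. Not every driver fills in every field.

```python
import fcntl
import struct

# From the Linux UAPI headers (asm-generic/ioctls.h); Python's termios module
# may not export it, so it is hard-coded here. Verify against your kernel.
TIOCGICOUNT = 0x545D

# struct serial_icounter_struct: eleven int counters followed by int reserved[9].
ICOUNT_FIELDS = ("cts", "dsr", "rng", "dcd", "rx", "tx",
                 "frame", "overrun", "parity", "brk", "buf_overrun")
ICOUNT_STRUCT = struct.Struct("20i")


def parse_icount(raw: bytes) -> dict[str, int]:
    """Decode the ioctl buffer; zip() stops after the named fields, skipping reserved[]."""
    return dict(zip(ICOUNT_FIELDS, ICOUNT_STRUCT.unpack(raw)))


def read_icount(fd: int) -> dict[str, int]:
    """Fetch the driver's error counters for an open serial port."""
    raw = fcntl.ioctl(fd, TIOCGICOUNT, bytes(ICOUNT_STRUCT.size))
    return parse_icount(raw)


def icount_delta(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Counter increases between two snapshots, e.g. taken before and after a test run."""
    return {name: after[name] - before[name] for name in before}
```

Taking `read_icount(fd)` before and after a transfer and passing both snapshots to `icount_delta` shows whether overruns occurred during that run.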
Later versions of Raspbian (around 4.14.34-v7+ and beyond) seem to drop received characters on the serial UART, approximately 4 characters every second or two at 19200 baud. The Lite version has no problems; they show up only on the full desktop versions. Dropping GPU memory to 16 MB or below also seems to solve the problem. More details at this post:
https://www.raspberrypi.org/forums/viewtopic.php?t=217702
Thank you!