-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race-condition in spi-bcm2835.c when performing an SPI transfer in interrupt mode #2754
Comments
Lukas Wunner is currently working on a patch series for spi-bcm2835 (upstream), which seems to contain a fix for this issue: |
Thanks, that does indeed look like the same issue. |
Are you prepared to wait until Lukas's patch set is accepted upstream? We can back-port it as soon as it gets merged somewhere. |
Absolutely, there's no hurry on my part. |
@niklasekstrom Btw test feedback about Lukas patch series is always welcome. |
The patch series and some additional clean up by Lukas has been applied to next. |
I'm closing this issue since a solution has been committed: 4de3f63#diff-d640347721b0bd06f8ffd76f755bdf74 |
Hello, |
That patch has been in firmware builds since 4.14.92 on January 9th. Run |
oh great, thank you! |
Run |
dc4 ? I get this, the version is ok, but no dc4 :) [ 0.000000] Linux version 4.14.98-v7+ (dom@dom-XPS-13-9370) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1200 SMP Tue Feb 12 20:27:48 GMT 2019 |
No that's fine - dom=dc4, and the version is new enough, so you're good to go. |
Perfect, thank you very much !! |
While I agree that this was resolved in 4.14.98, the issue seems to have resurfaced in 4.19.66... has anyone else experienced this? |
(I initially reported this bug to the Raspberry pi forum, https://www.raspberrypi.org/forums/viewtopic.php?f=44&t=214648&p=1321017#p1321017, but repeat it here for completeness.)
The bug is present in the spi-bcm2835.c SPI driver for Raspberry PI. Occasionally when the driver is reading data using an interrupt transfer, the last byte, or even more rarely the last couple of bytes, are not transferred from the hardware RX FIFO buffer to the receive buffer in memory by the driver.
To find the problem I did the following, using a Raspberry PI 3 (should work on other models that use the same driver). I have an SPI memory connected to the Raspberry, and I wrote a small test program that repeatedly does the following cycle: write a sequence of bytes of random length to a random address, and then read back that sequence of bytes and verify that the bytes read back are the same as those written. Almost all of the time the byte string is read back correctly, but occasionally the last byte was not correctly read back.
My analysis is as follows. I noticed a pattern in the length of the sequence of bytes that the failing write/read cycle had. The length of a failing transfer was always between 67 bytes and 95 bytes. I looked in the source code for the driver (https://github.com/raspberrypi/linux/blob/rpi-4.14.y/drivers/spi/spi-bcm2835.c) and found that a transfer can be done in one of three modes, depending on the length of the transfer: (1) polling, (2) interrupt, and (3) DMA. The code to determine which mode is used is in bcm2835_spi_transfer_one(); note that the lower boundary of 67 bytes depends on the SPI speed which for me is 20 MHz on a RPi3 (400 MHz core clock). All the transfers that were failing were running in the interrupt mode.
I then looked in the bcm2835_spi_interrupt() interrupt handler, and there is a race-condition. At line 154 (https://github.com/raspberrypi/linux/blob/rpi-4.14.y/drivers/spi/spi-bcm2835.c#L154) bytes are drained from the RX FIFO. Then on line 159 there is a check to see if the transfer is complete (the DONE bit is set in the CS register). However, if the transfer is found to be completed on line 159 there is still a (small) chance that there is data in the RX FIFO that arrived at a time in between when lines 154 and 159 were executed.
Fix: By making the following change to the driver, it works correctly after that:
line 159: if (bcm2835_rd(bs, BCM2835_SPI_CS) & BCM2835_SPI_CS_DONE) { // this line is unchanged
line 160: if (bs->rx_len) // this line is added
line 161: bcm2835_rd_fifo(bs); // this line is added
I will make a patch and submit a pull request (I'm not well-versed in GitHub, so hopefully I do this in the correct order).
I noticed this issue, #2200, that could possibly be caused by the same bug.
The text was updated successfully, but these errors were encountered: