Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race-condition in spi-bcm2835.c when performing an SPI transfer in interrupt mode #2754

Closed
niklasekstrom opened this issue Nov 14, 2018 · 15 comments

Comments

@niklasekstrom
Copy link

(I initially reported this bug to the Raspberry pi forum, https://www.raspberrypi.org/forums/viewtopic.php?f=44&t=214648&p=1321017#p1321017, but repeat it here for completeness.)

The bug is present in the spi-bcm2835.c SPI driver for Raspberry PI. Occasionally when the driver is reading data using an interrupt transfer, the last byte, or even more rarely the last couple of bytes, are not transferred from the hardware RX FIFO buffer to the receive buffer in memory by the driver.

To find the problem I did the following, using a Raspberry PI 3 (should work on other models that use the same driver). I have an SPI memory connected to the Raspberry, and I wrote a small test program that repeatedly does the following cycle: write a sequence of bytes of random length to a random address, and then read back that sequence of bytes and verify that the bytes read back are the same as those written. Almost all of the time the byte string is read back correctly, but occasionally the last byte was not correctly read back.

My analysis is as follows. I noticed a pattern in the length of the sequence of bytes that the failing write/read cycle had. The length of a failing transfer was always between 67 bytes and 95 bytes. I looked in the source code for the driver (https://github.com/raspberrypi/linux/blob/rpi-4.14.y/drivers/spi/spi-bcm2835.c) and found that a transfer can be done in one of three modes, depending on the length of the transfer: (1) polling, (2) interrupt, and (3) DMA. The code to determine which mode is used is in bcm2835_spi_transfer_one(); note that the lower boundary of 67 bytes depends on the SPI speed which for me is 20 MHz on a RPi3 (400 MHz core clock). All the transfers that were failing were running in the interrupt mode.

I then looked in the bcm2835_spi_interrupt() interrupt handler, and there is a race-condition. At line 154 (https://github.com/raspberrypi/linux/blob/rpi-4.14.y/drivers/spi/spi-bcm2835.c#L154) bytes are drained from the RX FIFO. Then on line 159 there is a check to see if the transfer is complete (the DONE bit is set in the CS register). However, if the transfer is found to be completed on line 159 there is still a (small) chance that there is data in the RX FIFO that arrived at a time in between when lines 154 and 159 were executed.

Fix: By making the following change to the driver, it works correctly after that:

line 159: if (bcm2835_rd(bs, BCM2835_SPI_CS) & BCM2835_SPI_CS_DONE) { // this line is unchanged
line 160: if (bs->rx_len) // this line is added
line 161: bcm2835_rd_fifo(bs); // this line is added

I will make a patch and submit a pull request (I'm not well-versed in GitHub, so hopefully I do this in the correct order).

I noticed this issue, #2200, that could possibly be caused by the same bug.

@lategoodbye
Copy link
Contributor

Lukas Wunner is currently working on a patch series for spi-bcm2835 (upstream), which seems to contain a fix for this issue:
https://patchwork.kernel.org/patch/10673619/

@niklasekstrom
Copy link
Author

Thanks, that does indeed look like the same issue.

@pelwell
Copy link
Contributor

pelwell commented Nov 15, 2018

Are you prepared to wait until Lukas's patch set is accepted upstream? We can back-port it as soon as it gets merged somewhere.

@niklasekstrom
Copy link
Author

Absolutely, there's no hurry on my part.

@lategoodbye
Copy link
Contributor

@niklasekstrom Btw test feedback about Lukas patch series is always welcome.

@lategoodbye
Copy link
Contributor

The patch series and some additional clean up by Lukas has been applied to next.

@niklasekstrom
Copy link
Author

I'm closing this issue since a solution has been committed: 4de3f63#diff-d640347721b0bd06f8ffd76f755bdf74

@larkolab
Copy link

Hello,
I'm facing this issue, and it is quite blocking for me at the moment.
I've seen that there is a patch to try here: https://patchwork.kernel.org/patch/10673619/
Can anyone explain how to compile the patched driver and use it on my Pi?
Thank you so much !

@pelwell
Copy link
Contributor

pelwell commented Feb 14, 2019

That patch has been in firmware builds since 4.14.92 on January 9th. Run sudo rpi-update to install a pre-built kernel to test.

@larkolab
Copy link

oh great, thank you!
how can I check that I'm running the proper version of the driver?

@pelwell
Copy link
Contributor

pelwell commented Feb 14, 2019

Run dmesg | grep "Linux version". If the message includes dc4 and the version is 4.14.92 or greater then you should have the fix.

@larkolab
Copy link

dc4 ?

I get this, the version is ok, but no dc4 :)

[ 0.000000] Linux version 4.14.98-v7+ (dom@dom-XPS-13-9370) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1200 SMP Tue Feb 12 20:27:48 GMT 2019

@pelwell
Copy link
Contributor

pelwell commented Feb 14, 2019

No that's fine - dom=dc4, and the version is new enough, so you're good to go.

@larkolab
Copy link

Perfect, thank you very much !!
And my issue seems to be solved, with the testing done so far ! :)

@bhjelstrom
Copy link

While I agree that this was resolved in 4.14.98, the issue seems to have resurfaced in 4.19.66... has anyone else experienced this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants