Arasan SD host fifo overrun causes corruption

## Description

Under specific (currently unknown) timing conditions, writing to the SD card can fail and corrupt the filesystem because the FIFO full flag is sampled incorrectly across two different clock domains.

## Information

With specific SD cards (I've got one from popcornmix) there is an error that always happens when writing long continuous sequences of data. The error was diagnosed using my SD card protocol analyser (which I wrote using some very cool and very special hardware!). It shows that the data written to the SD card was invalid and that whole sectors of data are missing from the output.

This was done using a piece of test code that directly opened a partition on the SD card and wrote 1 MiB of a pseudo-random sequence (PRS). The protocol analyser then checked the write data to make sure the least significant bit was correct (currently I've only made it sample one data bit). An error was found in the data written to the card from the Arasan module.
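As a rough illustration of the kind of test data described above, here is a minimal sketch of a reproducible PRS generator. The generator actually used in the test is not stated in this issue; the sketch assumes a 32-bit xorshift, and the names `prs_next` and `prs_fill` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed generator: Marsaglia's 32-bit xorshift (13, 17, 5).
 * The state must never be zero, or the sequence stays at zero. */
uint32_t prs_next(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill buf with len bytes of the sequence that starts from seed.
 * The same seed always produces the same bytes, so the expected
 * data can be regenerated later for comparison. */
void prs_fill(uint8_t *buf, size_t len, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u; /* avoid the all-zero state */

    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t)(prs_next(&state) >> 24); /* use the top byte */
}
```

Calling `prs_fill` with a 1 MiB buffer and a fixed seed would give a block that can be written to the partition and regenerated on another machine for checking.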

I then checked the data actually written to the SD card by plugging it into my Linux box and dumping it to a file, and finally did a binary comparison between the data read from the card and the pseudo-random sequence we were meant to write. There were clear errors in the write data stream: whole sectors of data were missing.
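The binary comparison can be done per sector so that the damaged sectors are easy to locate. The following is only a sketch of that idea, not the tool that was actually used; `count_bad_sectors` is a hypothetical name and the 512-byte sector size is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u /* assumed SD sector size */

/* Compare two buffers sector by sector and return how many sectors differ.
 * If first_bad is not NULL and at least one sector differs, the index of
 * the first differing sector is stored there. A trailing partial sector
 * (len not a multiple of SECTOR_SIZE) is ignored. */
size_t count_bad_sectors(const uint8_t *expected, const uint8_t *actual,
                         size_t len, size_t *first_bad)
{
    size_t bad = 0;

    for (size_t off = 0; off + SECTOR_SIZE <= len; off += SECTOR_SIZE) {
        if (memcmp(expected + off, actual + off, SECTOR_SIZE) != 0) {
            if (bad == 0 && first_bad != NULL)
                *first_bad = off / SECTOR_SIZE;
            bad++;
        }
    }
    return bad;
}
```

Here `expected` would hold the regenerated PRS and `actual` the image dumped from the card.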

MEMORY -> DMA -> FIFO -> ARASAN MODULE -> SDCard

The above shows the flow of data from memory to the SD card. I know that the PRS was correct in memory, and I get an interrupt from the DMA module when it has finished transmitting the block of data, so I know that it was written successfully into the FIFO. But the data is not being written out to the SD card, and we're missing blocks as small as a single sector and up to 1024 bytes at a time (the FIFO is 1024 bytes x 2).

So the only possible issue here is that the DREQ signal output from the ARASAN module to the DMA controller is wrong. This is the signal that tells the DMA whether there is any space for it to write data. But there is no window size feedback, so (as in the classical network model) it is possible for us to overflow that FIFO.
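To show why a late flow-control signal is enough to lose data, here is a toy cycle-based model. It is not a model of the real hardware: the FIFO depth, the drain rate and the name `simulate_overrun` are all illustrative assumptions. The writer pushes one word per cycle whenever the full flag it sees is clear, but it sees the flag as it was `latency` cycles earlier.

```c
#define FIFO_DEPTH   8u  /* illustrative depth, not the real FIFO size */
#define MAX_LATENCY 16u
#define HIST_LEN    (MAX_LATENCY + 1u)

/* Return the number of words lost over `cycles` cycles when the writer
 * observes the full flag `latency` cycles late and the reader removes
 * one word every `drain_period` cycles. */
unsigned simulate_overrun(unsigned latency, unsigned drain_period,
                          unsigned cycles)
{
    unsigned char full_history[HIST_LEN] = {0}; /* true flag per cycle */
    unsigned fill = 0, lost = 0;

    if (latency > MAX_LATENCY)
        latency = MAX_LATENCY;
    if (drain_period == 0)
        drain_period = 1;

    for (unsigned t = 0; t < cycles; t++) {
        /* Record the true flag, then read the delayed copy. */
        full_history[t % HIST_LEN] = (fill == FIFO_DEPTH);
        unsigned char seen_full =
            full_history[(t + HIST_LEN - latency) % HIST_LEN];

        if (!seen_full) {
            if (fill < FIFO_DEPTH)
                fill++;
            else
                lost++; /* write into a FIFO that is really full */
        }

        /* The slower side drains one word every drain_period cycles. */
        if (t % drain_period == drain_period - 1 && fill > 0)
            fill--;
    }
    return lost;
}
```

In this model a latency of zero never loses a word, while any non-zero latency loses words as soon as the FIFO first fills, which matches the kind of failure described above.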
## Workaround

To confirm the issue I first switched to PIO mode. This can be done fairly easily by making the dmaable() function in sdhci-bcm2708.c always return 0, so that DMA is not used at all.

Unfortunately there is a further bug that affects PIO mode. The STATUS register takes a number of SD clock cycles to clock the FIFO status through, so when we fill the FIFO in sdhci_transfer_pio we cannot believe the value of SDHCI_SPACE_AVAILABLE. It will take a number of cycles to come through to that register (and for some reason it may not ever reset...), so we also need to add an unconditional break to this function.
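The effect of the unconditional break can be sketched with a small user-space model. This is not the kernel code: `fake_host`, `fake_host_tick` and `fake_transfer_pio` are hypothetical stand-ins, the two-block FIFO capacity is illustrative, and the model simply assumes that the status bit is only refreshed between calls.

```c
#include <stdbool.h>

#define SPACE_AVAILABLE 0x1u /* stand-in for the SDHCI_SPACE_AVAILABLE bit */
#define FIFO_BLOCKS     2u   /* illustrative capacity, in blocks */

struct fake_host {
    unsigned status;    /* last value latched into the status register */
    unsigned fifo_used; /* blocks currently held in the FIFO */
    unsigned blocks;    /* blocks the driver still has to write */
    unsigned overruns;  /* blocks written while the FIFO was full */
};

/* Model time passing between calls: the card drains one block and
 * the status register catches up with the real FIFO state. */
void fake_host_tick(struct fake_host *h)
{
    if (h->fifo_used > 0)
        h->fifo_used--;
    h->status = (h->fifo_used < FIFO_BLOCKS) ? SPACE_AVAILABLE : 0u;
}

/* Model of the PIO write loop. The status value is stale for the whole
 * call, so looping on it keeps writing after the FIFO is full. With
 * break_after_one set, only one block is written per call. */
void fake_transfer_pio(struct fake_host *h, bool break_after_one)
{
    while (h->blocks > 0 && (h->status & SPACE_AVAILABLE)) {
        if (h->fifo_used < FIFO_BLOCKS)
            h->fifo_used++;
        else
            h->overruns++; /* the stale status said there was space */
        h->blocks--;
        if (break_after_one)
            break; /* the workaround: never trust the bit twice */
    }
}
```

In this model, writing one block per call avoids the overrun, but it also means every block costs a full round trip through the status check, which is where the throughput goes.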

This is only a first-stab workaround because it makes the SD card throughput poor! We'd prefer not to do this if possible!

## Commit

Not yet committed.  Need a solution that works for writing and reading without losing too much performance!

