Fixes race condition in event wait logic of SDMMC driver. #2989
Force-pushed from 54d3d9d to a4e26a6.
Yes, here is the issue: #1138
Force-pushed from d5b147f to aa0259e.
I will test this today.
@antmerlino - At first I thought we had lost 100 KB/s, but I think it is card wear...
MCU: STM32H7[4|5]xxx, rev. Y
The SD card then dies.
master (ish):
sd_bench
INFO [sd_bench] Using block size = 4096 bytes, sync=0
INFO [sd_bench]
INFO [sd_bench] Testing Sequential Write Speed...
INFO [sd_bench] Run 0: 545.97 KB/s, max write time: 16 ms (= 250.00 KB/s), fsync: 5 ms
INFO [sd_bench] Run 1: 544.51 KB/s, max write time: 14 ms (= 285.71 KB/s), fsync: 4 ms
INFO [sd_bench] Run 2: 537.77 KB/s, max write time: 31 ms (= 129.03 KB/s), fsync: 5 ms
INFO [sd_bench] Run 3: 544.27 KB/s, max write time: 14 ms (= 285.71 KB/s), fsync: 4 ms
INFO [sd_bench] Run 4: 544.19 KB/s, max write time: 15 ms (= 266.67 KB/s), fsync: 5 ms
INFO [sd_bench] Avg : 543.34 KB/s
nsh> sd_bench
INFO [sd_bench] Using block size = 4096 bytes, sync=0
INFO [sd_bench]
INFO [sd_bench] Testing Sequential Write Speed...
INFO [sd_bench] Run 0: 545.07 KB/s, max write time: 15 ms (= 266.67 KB/s), fsync: 4 ms
INFO [sd_bench] Run 1: 544.49 KB/s, max write time: 16 ms (= 250.00 KB/s), fsync: 5 ms
INFO [sd_bench] Run 2: 545.58 KB/s, max write time: 13 ms (= 307.69 KB/s), fsync: 4 ms
INFO [sd_bench] Run 3: 545.90 KB/s, max write time: 13 ms (= 307.69 KB/s), fsync: 4 ms
this pr:
sd_bench
INFO [sd_bench] Using block size = 4096 bytes, sync=0
INFO [sd_bench]
INFO [sd_bench] Testing Sequential Write Speed...
INFO [sd_bench] Run 0: 544.48 KB/s, max write time: 15 ms (= 266.67 KB/s), fsync: 4 ms
INFO [sd_bench] Run 1: 539.10 KB/s, max write time: 30 ms (= 133.33 KB/s), fsync: 5 ms
INFO [sd_bench] Run 2: 549.72 KB/s, max write time: 13 ms (= 307.69 KB/s), fsync: 6 ms
INFO [sd_bench] Run 3: 548.32 KB/s, max write time: 13 ms (= 307.69 KB/s), fsync: 6 ms
INFO [sd_bench] Run 4: 546.97 KB/s, max write time: 14 ms (= 285.71 KB/s), fsync: 4 ms
INFO [sd_bench] Avg : 545.72 KB/s
nsh> sd_bench
INFO [sd_bench] Using block size = 4096 bytes, sync=0
INFO [sd_bench]
INFO [sd_bench] Testing Sequential Write Speed...
ERROR [sd_bench] Write error
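For readers comparing the runs above: the figure in parentheses after each "max write time" appears consistent with the block size divided by the slowest single write. The helper below is only a sketch of that arithmetic; the function name is hypothetical and this is not taken from the sd_bench source.

```c
#include <assert.h>

/* Hypothetical helper, not part of sd_bench: throughput in KB/s
 * (1 KB = 1024 bytes) implied by writing one block in write_ms
 * milliseconds.
 */

double max_write_kbps(unsigned block_bytes, unsigned write_ms)
{
  assert(write_ms > 0);
  return ((double)block_bytes * 1000.0 / 1024.0) / (double)write_ms;
}
```

With the 4096-byte block size used in these runs, a 16 ms write works out to 250 KB/s and a 31 ms write to roughly 129 KB/s, matching the logged values.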
@antmerlino - there are also some CI failures that need looking at.
@antmerlino MCU: STM32F76xxx, rev. Z - passes.
@antmerlino MCU: STM32F42x, rev. 3 - passes.
Force-pushed from aa0259e to 922208d.
Yeah, still trying to get them fixed.
mmcsd:Remove CONFIG_ARCH_HAVE_SDIO_DELAYED_INVLDT
stm32h7:sdmmc remove CONFIG_ARCH_HAVE_SDIO_DELAYED_INVLDT
stm32f7:sdmmc remove CONFIG_ARCH_HAVE_SDIO_DELAYED_INVLDT
stm32f7:sdmmc WRITE COMPLETE prevent false triggers
stm32h7:sdmmc WRITE COMPLETE prevent false triggers
While testing PR apache#2989 on the H7, I noticed that the cards were staying in 1-bit mode. The root cause was that the SCR read path was using DMA without an invalidate. This was caused by CONFIG_ARCH_HAVE_SDIO_DELAYED_INVLDT, but the sdmmc driver did not use the delayed invalidate, nor would it work on 8 bytes. The driver fully supported dcache management on runt buffers, but the #ifdef CONFIG_ARCH_HAVE_SDIO_DELAYED_INVLDT blocked it. Reviewing the PR that added CONFIG_ARCH_HAVE_SDIO_DELAYED_INVLDT, it may have been valid at the time, but now that the dcache operations have been fixed it is not necessary and offers no benefit.
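To illustrate why an 8-byte DMA read needs care with the data cache, here is a minimal sketch of the address arithmetic involved. It assumes a 32-byte cache line (the Cortex-M7 D-cache line size) and takes "runt buffer" to mean a buffer that does not start and end on cache-line boundaries; that reading, and every name below, is an assumption for illustration and not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u   /* Assumed D-cache line size in bytes */

struct line_span
{
  uintptr_t start;       /* First byte of the first cache line touched */
  uintptr_t end;         /* One past the last byte of the last line touched */
};

/* Hypothetical helper: the whole-cache-line range that covers a buffer. */

struct line_span cache_lines_covering(uintptr_t addr, size_t len)
{
  struct line_span span;
  uintptr_t mask = ~(uintptr_t)(CACHE_LINE - 1u);

  assert(len > 0);
  span.start = addr & mask;
  span.end   = (addr + len + CACHE_LINE - 1u) & mask;
  return span;
}

/* Hypothetical helper: true if the buffer shares a cache line with
 * neighbouring data, so a plain invalidate would also discard bytes
 * that do not belong to the buffer.
 */

bool is_runt_buffer(uintptr_t addr, size_t len)
{
  struct line_span span = cache_lines_covering(addr, len);
  return span.start != addr || span.end != addr + len;
}
```

An 8-byte SCR buffer can never fill a 32-byte line, so under these assumptions it is always a runt buffer, whereas an aligned 512-byte block is not.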
@antmerlino please fix the conflicting files.
@acassis - after the rebase this is not ready to be merged, as we still need to resolve the issue reported above. I plan to retest this week.
@antmerlino
Debugging it now....
This change makes it so that the timeout is set as part of the SDIO_WAITENABLE call instead of the SDIO_EVENTWAIT call. By doing so, you eliminate all opportunity for a race condition.
stm32h7:sdmmc Check if busy ended early
Force-pushed from 922208d to 8632a38.
@antmerlino - Changes added - thank you for your guidance! Rebased on master, squashed and force pushed.
H7 - Now working
@antmerlino - Thank you again!
LGTM.
Background
After some configuration changes, namely switching to tickless mode, I started to hit the DEBUGASSERT here:
https://github.com/apache/incubator-nuttx/blob/master/arch/arm/src/stm32f7/stm32_sdmmc.c#L1460
Basically, the SDMMC driver timed out when it did not expect to, since wkupevents and waitevents were both zero. Further debugging showed that the event did finish and the timeout should not have occurred.
After @davids5 and I debugged further, we found 2 issues:
Summary
This change fixes a race condition that was noted in a comment in the code. The fundamental issue was that the watchdog was started after the transfer had already been started. Instead of passing in a timeout and starting the watchdog inside SDIO_EVENTWAIT, we can simply do that in SDIO_WAITENABLE.
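The ordering problem can be sketched with a toy model. This is not the driver code: the struct, the event flag, and the function names are all hypothetical, and the model only tracks whether a watchdog is left armed once the wait has already ended, which is the situation that would later produce an unexpected timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not the NuttX driver. */

#define EV_TRANSFERDONE 0x01u

struct sim_dev
{
  unsigned waitevents;   /* Events the caller is still waiting for */
  bool     wdog_armed;   /* True while the timeout watchdog is running */
};

/* Completion interrupt: ends the wait and cancels the watchdog. */

void sim_complete_irq(struct sim_dev *dev)
{
  assert(dev != NULL);
  dev->waitevents = 0;
  dev->wdog_armed = false;
}

/* Old ordering: the watchdog is armed inside the wait call, after the
 * transfer has been started.  Returns true if a watchdog is left running
 * with nothing left to wait for.
 */

bool sim_old_order_leaves_stale_wdog(bool completes_before_wait)
{
  struct sim_dev dev = { 0, false };

  dev.waitevents = EV_TRANSFERDONE;   /* Wait enable, no timeout yet */

  /* The transfer is started here. */

  if (completes_before_wait)
    {
      sim_complete_irq(&dev);         /* A fast completion wins the race */
    }

  dev.wdog_armed = true;              /* Event wait arms the watchdog */

  if (!completes_before_wait)
    {
      sim_complete_irq(&dev);         /* Normal completion during the wait */
    }

  return dev.wdog_armed && dev.waitevents == 0;
}

/* New ordering: the watchdog is armed in the wait-enable step, before the
 * transfer starts, so a completion at any later point cancels it.
 */

bool sim_new_order_leaves_stale_wdog(void)
{
  struct sim_dev dev = { 0, false };

  dev.waitevents = EV_TRANSFERDONE;
  dev.wdog_armed = true;              /* Armed before the transfer starts */

  sim_complete_irq(&dev);             /* Completion, early or late */

  return dev.wdog_armed && dev.waitevents == 0;
}
```

In this model the old ordering leaves a stale watchdog only when the completion arrives before the wait call, while the new ordering never does.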
Impact
The SDMMC driver interface has changed slightly: the timeout parameter moves from the SDIO_EVENTWAIT call to the SDIO_WAITENABLE call.
Because the SDIO_WAITENABLE call starts the watchdog, the caller must ensure SDIO_CANCEL is called if it is not going to go on to call SDIO_EVENTWAIT.
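A caller following this rule might look roughly like the sketch below. The fake_* functions are hypothetical stand-ins that only mimic the shape of the new interface (timeout passed at wait-enable time); they are not the real SDIO macros or their signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the NuttX SDIO interface. */

struct fake_dev
{
  bool wdog_armed;   /* True while the timeout watchdog is running */
  bool fail_setup;   /* Simulates an error between enable and wait */
};

/* Stand-in for the wait-enable step, which now also arms the watchdog. */

void fake_waitenable(struct fake_dev *dev, unsigned events,
                     unsigned timeout_ms)
{
  (void)events;
  dev->wdog_armed = (timeout_ms > 0);
}

/* Stand-in for the cancel step: stops the watchdog. */

void fake_cancel(struct fake_dev *dev)
{
  dev->wdog_armed = false;
}

/* Stand-in for the event wait: returns the wake-up event, watchdog stopped. */

unsigned fake_eventwait(struct fake_dev *dev)
{
  dev->wdog_armed = false;
  return 0x01u;
}

/* Caller pattern: any early exit after wait-enable must cancel first. */

int fake_read_block(struct fake_dev *dev)
{
  assert(dev != NULL);

  fake_waitenable(dev, 0x01u, 100u);

  if (dev->fail_setup)
    {
      fake_cancel(dev);   /* Do not leave the watchdog running */
      return -1;
    }

  return fake_eventwait(dev) != 0 ? 0 : -1;
}
```

The point of the pattern is that every path out of the function leaves the watchdog stopped, whether or not the wait was reached.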
Testing
I have tested this on an STM32F765VI-based platform and it addresses the race condition.
Kudos to @davids5 for helping debug this one!