Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disk: sdhc: fix SPI clock setup #26319

Closed

Conversation

vaussard
Copy link
Contributor

It was noticed that the SPI SDHC driver has a very slow read/write speed when accessing the SD card on nrf52. The investigation shows that the driver has 2 main problems:

  • It keeps using the initial clock setup, because it does not pass a new configuration to spi_context_configured() but only change the frequency in the existing configuration. Since this is not detected as a new configuration, the old clock speed is used in the data phase.
  • The clock speed is hardcoded to 4MHz, even if the specification allows a clock up to 25MHz in default speed mode.

These issues are not specific to nrf52, since spi_context_configured() is used on most other SPI drivers. This is probably the same issue reported on STM32: #22906.

These problems are fixed by declaring a second SPI configuration for the default speed mode, and taking the clock frequency from the spi-max-frequency DTS property (limited to 25MHz).

Here are some test results. Note that the nrf52 SPI driver has coarse prescalers, so the real clock speed is 250kHz when selecting 400kHz, and 16MHz when selecting 25MHz. Other platforms will probably give different results. Tested with a SanDisk 16GB microSD.

master (250 kHz) Fix 1 (4 MHz) Fix 2 (16 MHz)
read (4 MB) 27 kB/s 181 kB/s 364 kB/s
write (10 MB) 4.6 kB/s 29 kB/s 41 kB/s

The throughput is increased by 10 times when using these patches.

@EminYagmahan
Copy link

EminYagmahan commented Jun 20, 2020

I just tried your suggested changes on the STM32 B-L475-IOT device . However, I had to rename some identifiers back to the default values to be able to compile the project and repeat the read and write tests for the sd card.

...
- data->spi = device_get_binding(DT_BUS_LABEL(DT_INST(0, zephyr_mmc_spi_slot)));
+ data->spi = device_get_binding(DT_INST_0_ZEPHYR_MMC_SPI_SLOT_BUS_NAME);
...

Nevertheless, here are my results:

Test setup:

  • Device: STM32 B-L475-IOT
  • SPI: 20 MHz
  • SD Card: 16GB microSD Samsung Class 6

Results:

R/W Without Fix With Fix
Read 1MB 36,0 KByte/s 227,5 KByte/s
Write 1MB 27,6 KByte/s 75,6 KByte/s
Read 5MB 36,7 KByte/s 227,4 KByte/s
Write 5MB 28,1 KByte/s 79,5 KByte/s

Great Fix! With identical measurement setup the write rate is almost 3x faster and read rate is 6x faster.

@vaussard
Copy link
Contributor Author

Great, thanks for testing @EminYagmahan ! That is already a good start. I would have expected a better throughput when reading from your STM32L4, since my nrf52 was around 350 kB/s for a SPI frequency of 16 MHz. A couple of ideas:

  • Maybe your SPI driver do not select exactly 20 MHz but a lower clock due to the prescaler ? You could enable this LOG_DBG to see which frequency is selected by the STM32 driver:
    LOG_DBG("Installed config %p: freq %uHz (div = %u),"
    .
  • Is the SPI driver configured to use DMA, IRQ or polling mode ? CONFIG_SPI_STM32_DMA should give the best performance, but I never used it my-self
  • Or maybe it is just the SD card which is different. Mine is clearly underperforming in write speed, so maybe I will look into multiblock write once my pet project is done

I just tried your suggested changes on the STM32 B-L475-IOT device . However, I had to rename some identifiers back to the default values to be able to compile the project and repeat the read and write tests for the sd card.

...
- data->spi = device_get_binding(DT_BUS_LABEL(DT_INST(0, zephyr_mmc_spi_slot)));
+ data->spi = device_get_binding(DT_INST_0_ZEPHYR_MMC_SPI_SLOT_BUS_NAME);
...

Ok strange. I did not changed the data->spi = ... line my-self. Looking at the git blame, this was changed in March by commit b2f13cc. Since this PR is based on the latest master, this is maybe a problem on your setup compiled with the latest Zephyr ?

CI has failures, but it looks like other PRs have the same failures. So it does not help... I compiled samples/subsys/fs/fat_fs/ for olimexino_stm32 (STM32F1), which also enables this driver. It worked fine.

@EminYagmahan
Copy link

Very nice indeed. I had already forgotten that I was on an older version. Tomorrow I will port my project to the latest version and report if there are any differences.
I found another SD card and prepared it for testing. This one is a SanDisk 16GB Edge Class 4 SD card. The results were as followed:

R/W With Fix
Read 5MB 218,4 KByte/s
Write 5MB 78,4 KByte/s
  • Maybe your SPI driver do not select exactly 20 MHz but a lower clock due to the prescaler ? You could enable this LOG_DBG to see which frequency is selected by the STM32 driver:

I had already checked them in advance. 20MHz in operation and 315kHz during init are the maximum I can achieve with my prescalers. This is also the output of spi_ll_stm32.c as well as the oscilloscope.

  • Is the SPI driver configured to use DMA, IRQ or polling mode ? CONFIG_SPI_STM32_DMA should give the best performance, but I never used it my-self

I tried earlier to run the same project via DMA but even here there are no differences in read and write rates. The data transfer will probably not be the bottleneck at these low rates.

@EminYagmahan
Copy link

I have just repeated my measurements with the latest master branch. The results are the same. Compiling works fine. 👍

@FRASTM
Copy link
Collaborator

FRASTM commented Jun 22, 2020

@ EminYagmahan could you also please confirm if this also fixes #22906
or if more tests/investigation is required

@EminYagmahan
Copy link

@FRASTM the problem has indeed been partially resolved. With this fix the performance is much improved. Nevertheless, it is possible to get even more performance boosts with the extension of the API, if instead of Single Block Write/Read, the Multi Block Write/Read command could be supported. Actually my issue addresses that part of the read and write commands.

@nzmichaelh
Copy link
Collaborator

Hi @vaussard could you talk with, say, @carlescufi about the change? This might be working around a bug in the SPI driver(s), and it could be that the SPI drivers should be fixed instead.

Specifically, include/drivers/spi.h doesn't say what to do if the same spi_config is passed to spi_transcieve(). The SAM0, SiFive, litespi, gecko, and others properly reconfigure the port if the contents of the config change. However, the nrfx and stm32 drivers, and any others that use spi_context_configured, skip reconfiguration if the address is the same.

The header file doesn't say which is right. My vote is that the SAM0 etc drivers are right and the others should be fixed.

vaussard added 3 commits June 23, 2020 21:05
It was noticed (on nrf52) that the SPI clock is not increased after the
initialization of the SD card, keeping the initial speed of 400 kHz (or
even less depending on the SPI driver, i.e. 250 kHz on nrf52). This
heavily impacts the transfer speed.

This is due to most (if not all) SPI drivers calling
spi_context_configured(), which detects if a new configuration is
provided. This function only checks the pointer to the configuration
structure, since struct spi_context does not keep a full copy.
Simply updating the frequency is not enough to be detected as a new
configuration.

The solution is simple: provide physically different configurations, so
that the SPI driver will notice the change and really apply it.

Signed-off-by: Florian Vaussard <florian.vaussard@gmail.com>
Default speed mode allows a maximum clock speed of 25 MHz according to
the SD Specifications Part 1 Physical Layer, so there is no reason to
limit the clock speed to 4 MHz.

Signed-off-by: Florian Vaussard <florian.vaussard@gmail.com>
Some host or setup might not allow to run at full speed. Use the
frequency specified in the "spi-max-frequency" property of the SDHC
node.

Signed-off-by: Florian Vaussard <florian.vaussard@gmail.com>
@vaussard vaussard force-pushed the master_fix-spi-sdhc branch from bf9f1a5 to 8f7db05 Compare June 23, 2020 19:06
@vaussard
Copy link
Contributor Author

I have just repeated my measurements with the latest master branch. The results are the same. Compiling works fine.

Thanks for testing again @EminYagmahan . If your SPI clock is at 20MHz, then I cannot really explain why you have a slower read than me. But it is already a good improvement.

@ EminYagmahan could you also please confirm if this also fixes #22906
or if more tests/investigation is required

Yes there is some room for improvement @FRASTM but this PR is already fixing the issue with the SPI bus. That is a first step. I might have a look into multi-block read/write in a couple of weeks, but currently the throughput is enough for me and I have to finish a small side project first.

This might be working around a bug in the SPI driver(s), and it could be that the SPI drivers should be fixed instead.

Yes @nzmichaelh the API documentation is not clear about that point. So either we fix the doc, either we fix the non-conforming drivers (and I can drop my first commit from this PR and probably also remove spi_context_configured()). I have no strong opinion on that.

But what I can see: on all the SPI drivers (17), 14 are calling spi_context_configured(). And... the SAM0 is among them. If I refer to the code, SAM0 has the same behaviour as STM32/nrf52/etc. Have a look: https://github.com/zephyrproject-rtos/zephyr/blob/master/drivers/spi/spi_sam0.c#L78.

Here are the 3 drivers that do not call spi_context_configured(): spi_gecko.c, spi_litespi.c and spi_sifive.c. They are a minority.

Any thought on that @carlescufi ? BTW do you know why no reviewer was assigned to this PR ? If I refer to CODEOWNER, disk access is managed by @JunYangNXP. Maybe @tbursztyka would like also to comment the SPI part.

@vaussard
Copy link
Contributor Author

Hello. Is there anybody interested in doing a review of this fix ? :-) It looks like other people are having similar issues (see for example #26513), so it would be nice to find a solution. Otherwise people will keep bumping into this trivial issue.

@@ -40,6 +48,17 @@ struct sdhc_spi_data {

DEVICE_DECLARE(sdhc_spi_0);

static inline const struct spi_config *sdhc_get_spi_cfg(const struct sdhc_spi_data *data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: each function in this file starts with 'sdhc_spi_'. Update the new functions to match like 'sdhc_spi_get_spi_cfg'

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need these if the pointer is maintained rather than the enum.

struct device *cs;
uint32_t pin;
gpio_dt_flags_t flags;

enum sdhc_spi_cfg current_spi_cfg;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional: change this to 'struct spi_config *spi_cfg'. This means you can drop the enum and the helper functions which makes the code easier to read.

data->cfg.slave = DT_REG_ADDR(DT_INST(0, zephyr_mmc_spi_slot));
/* SPI config for initialization (default) */
sdhc_select_spi_cfg(data, SDHC_SPI_INIT_CFG);
data->spi_cfgs[SDHC_SPI_INIT_CFG] = (struct spi_config){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with this style, but I haven't seen it used much in the rest of Zephyr. People seem to prefer one line per member when this is inside a function.

No need to do anything unless someone else comments.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this style reads more naturally here, and is consistent with use like

var = (struct k_whatever)K_WHATEVER_INITIALIZER();

@@ -795,8 +798,12 @@ static int sdhc_spi_init(struct device *dev)
.slave = DT_REG_ADDR(DT_INST(0, zephyr_mmc_spi_slot)),
};
/* SPI config for default speed */
if (spi_max_frequency > SDHC_SPI_DEFAULT_SPEED) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a MAX? Shouldn't the driver always take the speed from DeviceTree if it is supplied?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the commiter took the name out of the dts property (see line 789 above), but it is definitely coming from dts here.

@nzmichaelh
Copy link
Collaborator

Hello. Is there anybody interested in doing a review of this fix ? :-) It looks like other people are having similar issues (see for example #26513), so it would be nice to find a solution. Otherwise people will keep bumping into this trivial issue.

Hi @vaussard I don't have CODEOWNERS but the change LGTM.

I don't know Zephyr's style, but consider also logging a tracking bug for the identity issue with the spi_config and add a TODO to the code. That way it'll be easier to find and clean up if/when the spi_config issue is decided.

Copy link
Collaborator

@rick1082 rick1082 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on nRF5340, confirmed the SCLK will change to higher frequency after SDHC initialized

/* Clock speed used after initialisation */
#define SDHC_SPI_SPEED 4000000
#define SDHC_SPI_DEFAULT_SPEED MHZ(25)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

25 should come from somewhere. Not sure if it should come from DT_* or Kconfig. Last resort is having it as a function parameter.

struct sdhc_spi_data {
struct device *spi;
struct spi_config cfg;
struct spi_config spi_cfgs[SDHC_CFG_COUNT];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These structures are filled only with values that are known at the build time, so they could be extracted to the constant config_info structure for the device (which is currently not used at all; see the NULL in the DEVICE_AND_API_INIT call) and use static initialization. This would save some RAM and flash.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed; there should be a file-local type and object like:

struct spi_sdhc_config {
   struct spi_config init_cfg;
   struct spi_config oper_cfg;
} sdhc_spi_cfg_0;

with a static const instance that's passed via DEVICE_AND_API_INIT.

Then in this structure there should just be:

    const struct spi_config *spi_cfg;

that is assigned to either &config->init_cfg or &config->oper_cfg at the points where disk->status is changed to UNINIT or OK.

(just as @nzmichaelh recommended, now that I reach that comment.)

Copy link
Collaborator

@pabigot pabigot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting:

This is due to most (if not all) SPI drivers calling
spi_context_configured(), which detects if a new configuration is
provided. This function only checks the pointer to the configuration
structure, since struct spi_context does not keep a full copy.
Simply updating the frequency is not enough to be detected as a new
configuration.

It makes sense, but if it's not a documented behavior that really needs to be fixed. I'll look into that.

Relevant here, I think this can be simplified as described.

@@ -23,13 +23,21 @@ LOG_MODULE_REGISTER(sdhc_spi, CONFIG_DISK_LOG_LEVEL);
#if !DT_NODE_HAS_STATUS(DT_INST(0, zephyr_mmc_spi_slot), okay)
#warning NO SDHC slot specified on board
#else
enum sdhc_spi_cfg {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need this it would be less confusing and have less impact on potential namespace collisions if the enum name and the tag prefixes were consistent. Could you make this:

enum sdhc_spi_cfg {
	SDHC_SPI_CFG_INIT,
	SDHC_SPI_CFG_DEFAULT_SPEED,
	/* --- */
	SDHC_SPI_CFG_COUNT,
};

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any expectation that uses will require more configurations than the one used during device startup, and the one used during device operation? I'm going to assume "no", so subsequent comments will be from that perspective.

struct sdhc_spi_data {
struct device *spi;
struct spi_config cfg;
struct spi_config spi_cfgs[SDHC_CFG_COUNT];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed; there should be a file-local type and object like:

struct spi_sdhc_config {
   struct spi_config init_cfg;
   struct spi_config oper_cfg;
} sdhc_spi_cfg_0;

with a static const instance that's passed via DEVICE_AND_API_INIT.

Then in this structure there should just be:

    const struct spi_config *spi_cfg;

that is assigned to either &config->init_cfg or &config->oper_cfg at the points where disk->status is changed to UNINIT or OK.

(just as @nzmichaelh recommended, now that I reach that comment.)

@@ -40,6 +48,17 @@ struct sdhc_spi_data {

DEVICE_DECLARE(sdhc_spi_0);

static inline const struct spi_config *sdhc_get_spi_cfg(const struct sdhc_spi_data *data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need these if the pointer is maintained rather than the enum.

@@ -136,8 +155,8 @@ static int sdhc_spi_tx(struct sdhc_spi_data *data, const uint8_t *buf, int len)
};

return sdhc_spi_trace(data, 1,
spi_write(data->spi, &data->cfg, &tx), buf,
len);
spi_write(data->spi, sdhc_get_spi_cfg(data), &tx),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be come data->spi_cfg

@henrikbrixandersen
Copy link
Member

@vaussard Please rebase on master and fix the merge conflict.

@lowlander
Copy link
Collaborator

Shouldn't the SPI dirvers simply be fixed instead of trying to work around the "pointer compare" problem by having two buffers with almost the same data (just the speeds are different) to just change the speed of the SPI bus. This driver will not be the only one falling in that trap, user applications will also have this problem. I think this PR is not a real solution for bug #28093 and it should be fixed in the SPI drivers.

@henrikbrixandersen
Copy link
Member

Shouldn't the SPI dirvers simply be fixed instead of trying to work around the "pointer compare" problem by having two buffers with almost the same data (just the speeds are different) to just change the speed of the SPI bus. This driver will not be the only one falling in that trap, user applications will also have this problem. I think this PR is not a real solution for bug #28093 and it should be fixed in the SPI drivers.

The drivers in their current form fulfil the contract of the SPI driver API, so they cannot really be "fixed". A change to the SPI API contract could be considered, of course. I do believe, however, that not many Zephyr SPI device drivers has the need for changing the SPI SCL frequency. The SPI SDHC driver is the exception here, and the exception can be easily handled by using two different configuration structs.

@lowlander
Copy link
Collaborator

A change to the SPI API contract could be considered, of course.

I believe the SPI API docu was updated (to point out this problem) not that long ago, which sounds to me the API contract has been adapted to the broken implementations. :-)

Of course even if the SPI implementation would be fixed this PR still works, nothing forbids you from using several different config buffers.

@henrikbrixandersen
Copy link
Member

henrikbrixandersen commented Sep 7, 2020

I believe the SPI API docu was updated (to point out this problem) not that long ago

That is true, but he underlying spi_context_configured() utility function (which most SPI controller driver use) was introduced by @tbursztyka in 19b36ae over 3 years ago. The API documentation has contained a note about which parameters in struct spi_config can be changed between calls since 7f4378e. The note about pointer comparison was added more recently in 7a8a4c9.

@tbursztyka
Copy link
Collaborator

This is made to avoid re-configuring the SPI controller each and every time the same config is being used for transactions. (thus the spi_context_configured() ). Reconfiguring the controller is costly and as long af the same spi_config gets the hand to make transaction there is no point to reconfigure the controller.
The underlying idea also was to make spi_config constant, to save a bit of memory.

Anywoy if you want to switch clock: you need another spi_config.
Drivers which are re-configuring at every transceive are bogus.

@lowlander
Copy link
Collaborator

This is made to avoid re-configuring the SPI controller each and every time the same config is being used for transactions. (thus the spi_context_configured() ). Reconfiguring the controller is costly and as long af the same spi_config gets the hand to make transaction there is no point to reconfigure the controller.

If in the normal case it makes no sense to reconfigure the controller the spi_config should not have been part of those API calls, and there just should have been a spi_configure(struct spi_config*) call so the user can reconfigure things when they want to change something like the speed.

The underlying idea also was to make spi_config constant, to save a bit of memory.

I also understand the idea of trying to save some memory, but since things like cs_hold are part of the struct users can not make it const and so (like the sdhc driver) they end up making two structs in RAM to deal with it. So the memory saved in the SPI driver ends up being memory wasted somewhere else.

Anywoy if you want to switch clock: you need another spi_config.
Drivers which are re-configuring at every transceive are bogus.

I think it is a very error prone API, but if there apparently is no possibility to fix it, at least the documentation should be updated with for example an usage example and not just some obscure "a pointer compare may be used" comment. For example it is unclear if setting the cs_hold will be honored or not if it is changed in an existing struct. Same with the cs pointer, does that "a pointer compare" apply to that in case a user wants to switch CS while he has more than 1 device?

But anyway, it would be nice if this PR could be rebased and land in master (is that even possible with the 2.4.rc1 being out?).

@erwango
Copy link
Member

erwango commented Sep 11, 2020

@vaussard, do you plan to continue this PR?

@github-actions
Copy link

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Nov 11, 2020
@github-actions github-actions bot closed this Nov 25, 2020
@richardbarlow
Copy link
Contributor

I would like to take up the mantle of getting this pull request into shape and getting it merged. What would be the best way of going about this? Since this is a pull request from @vaussard's fork, I will not have the ability to modify the commits in question. Shall I create a new pull request with the changes rebased onto master and change to using a config struct and pointer switching?

@erwango
Copy link
Member

erwango commented Mar 24, 2021

@richardbarlow It would be nice indeed. The easiest is to create a new PR.

@jfischer-no
Copy link
Collaborator

#32996 is bug report for the issue this PR attempted to fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.