flash: Add program/erase parallelism support for STM32F4x #67919

duda-patryk · 2024-01-22T15:33:22Z

The implementation uses the same approach as STM32F1x.

Program/erase speed can be set by setting 'write-block-size' flash property to 1, 2, 4 or 8.

de-nordic

LGTM, but I have a question:
I can see that the size of write is decided on write at run-time, at line 114, as the size is passed there.
Wouldn't it be most efficient for the driver to select the largest possible write size?
For example user is presented with write-block-size == 2, so every buffer address and size needs to be two bytes aligned, but the same buffer may be 4 or 8 bytes aligned which means that for the driver it would be most efficient to select write by that size.

duda-patryk · 2024-01-22T18:06:12Z

Wouldn't it be most efficient for the driver to select the largest possible write size?

Max write size depends on MCU voltage.

every buffer address and size needs to be two bytes aligned

Unaligned buffer is handled with memcpy() in 224-226. The compiler emits very efficient code - for 4 byte write, it's just 4 instructions (logical or with logical shift). The size and flash offset need to be aligned as it is in flash_stm32f1x.c.

de-nordic · 2024-01-23T13:24:50Z

Wouldn't it be most efficient for the driver to select the largest possible write size?

Max write size depends on MCU voltage.

Hm. Interesting, could not find the info in STM32F4 docs, but not a platform specialist here.

every buffer address and size needs to be two bytes aligned

Unaligned buffer is handled with memcpy() in 224-226. The compiler emits very efficient code - for 4 byte write, it's just 4 instructions (logical or with logical shift). The size and flash offset need to be aligned as it is in flash_stm32f1x.c.

You have two alignments here: source and destination; memcpy aligns source buffer, you could also use here UNALIGNED_GET .

What I mean by changing write alignment is that it is probably most efficient, although I do not know the platform, to write by the highest possible chunk.
For example when you have 'write-block-size` == 2, then it is expected that user will give you offset and size of buffer aligned to 2;
but notice that buffer aligned to 2 will, if the length is >4, at some point align to 4 and even to 8, if size is >8.
Because you control the size, by which the data is being copied it chunks, at run-time, you can change it as you wish.
For example N length buffer, aligned at 2, may be copied by sizes 2 + 4 + N * 8 + 4 + 2, or 2 + (N + 1) * 8 + 2.

duda-patryk · 2024-01-23T15:14:04Z

Wouldn't it be most efficient for the driver to select the largest possible write size?

Max write size depends on MCU voltage.

Hm. Interesting, could not find the info in STM32F4 docs, but not a platform specialist here.

It's in the STM32F412 reference manual RM0402 Rev 6, 3.5.2 Program/erase parallelism.

You have two alignments here: source and destination; memcpy aligns source buffer, you could also use here UNALIGNED_GET

Is UNALIGNED_GET preferred over memcpy()?

What I mean by changing write alignment is that it is probably most efficient, although I do not know the platform, to write by the highest possible chunk.

Yes, it would be the most efficient, but in this case, the chunk size depends on applied voltage, so the write-block-size property is a maximum chunk size (we trust that the user read the manual).

Fun fact:
While configuring number of flash wait states (3.4.1 Relation between CPU clock frequency and Flash memory read time), Zephyr silently assumes that the voltage is between 2.7V - 3.6V. I haven't checked it, but I suspect that Zephyr won't boot when low voltage is applied to the MCU.

de-nordic · 2024-01-23T15:22:38Z

Wouldn't it be most efficient for the driver to select the largest possible write size?

Max write size depends on MCU voltage.

Hm. Interesting, could not find the info in STM32F4 docs, but not a platform specialist here.

It's in the STM32F412 reference manual RM0402 Rev 6, 3.5.2 Program/erase parallelism.

Thanks, now I can see this. Very interesting.

You have two alignments here: source and destination; memcpy aligns source buffer, you could also use here UNALIGNED_GET

Is UNALIGNED_GET preferred over memcpy()?

Check what will give you better results. I suspect that UNALIGNED_GET will give you smaller footprint, smaller stack usage and quicker execution as it does not require doing all the function call/return stuff.

What I mean by changing write alignment is that it is probably most efficient, although I do not know the platform, to write by the highest possible chunk.

Yes, it would be the most efficient, but in this case, the chunk size depends on applied voltage, so the write-block-size property is a maximum chunk size (we trust that the user read the manual).

Nice to know I am not the only one.

Fun fact: While configuring number of flash wait states (3.4.1 Relation between CPU clock frequency and Flash memory read time), Zephyr silently assumes that the voltage is between 2.7V - 3.6V. I haven't checked it, but I suspect that Zephyr won't boot when low voltage is applied to the MCU.

I guess user has to check that before sending board to the field.

de-nordic

LGTM.
I have left comment on possible usage of `UNALIGNED_GET, but that is up to the committer to validate.

de-nordic · 2024-01-23T15:25:08Z

You probably want to update copyrights.

duda-patryk · 2024-01-23T16:01:23Z

LGTM. I have left comment on possible usage of `UNALIGNED_GET, but that is up to the committer to validate.

Clang compiles it to the same instructions. It's just a decision between

value = UNALIGNED_GET((flash_prg_t *)data + i);

and

memcpy(&value, (const uint8_t *)data + i * sizeof(flash_prg_t), sizeof(flash_prg_t));

I must admit that UNALIGNED_GET looks nice, but let the code owner decide 😄

You probably want to update copyrights.

Google is already in copyrights. Should I change the year?

de-nordic · 2024-01-23T16:04:14Z

I must admit that UNALIGNED_GET looks nice, but let the code owner decide 😄

ST people decide then.

You probably want to update copyrights.

Google is already in copyrights. Should I change the year?

Yes, date but that is up to your company policy.

erwango · 2024-01-25T10:41:16Z

ST people decide then.

UNALIGNED_GET is already in use in some others STM32 flash drivers, and personally I like it, so indeed, would be a good move.

The implementation uses the same approach as STM32F1x. Program/erase speed can be set by setting 'write-block-size' flash property to 1, 2, 4 or 8. Signed-off-by: Patryk Duda <patrykd@google.com>

duda-patryk · 2024-01-25T12:00:11Z

Changed memcpy() to UNALIGNED_GET().

erwango

LGTM, Thanks.
Though I have a concern on the documentation of this configurability and how one can guess this is actually a tunable parameter when reading the binding.

I'm not blocking since I can see that this is already the case on STM32F1.

Option I see is to add per series variants of st,stm32-nv-flash with new write-block-size prop defined as an enum with proper description.

duda-patryk requested review from erwango, ABOSTM and FRASTM as code owners January 22, 2024 15:33

zephyrbot added area: Flash platform: STM32 ST Micro STM32 labels Jan 22, 2024

zephyrbot requested review from de-nordic, Desvauxm-st, gautierg-st and GeorgeCGV January 22, 2024 15:33

zephyrbot assigned erwango Jan 22, 2024

duda-patryk requested a review from niedzwiecki-dawid January 22, 2024 15:34

de-nordic reviewed Jan 22, 2024

View reviewed changes

de-nordic previously approved these changes Jan 23, 2024

View reviewed changes

flash: Add program/erase parallelism support for STM32F4x

d5eadcf

The implementation uses the same approach as STM32F1x. Program/erase speed can be set by setting 'write-block-size' flash property to 1, 2, 4 or 8. Signed-off-by: Patryk Duda <patrykd@google.com>

duda-patryk dismissed de-nordic’s stale review via d5eadcf January 25, 2024 11:57

duda-patryk force-pushed the stm32f4x_flash_parallelism branch from 34feeff to d5eadcf Compare January 25, 2024 11:57

de-nordic approved these changes Jan 25, 2024

View reviewed changes

erwango approved these changes Jan 25, 2024

View reviewed changes

carlescufi merged commit 73396f9 into zephyrproject-rtos:main Jan 25, 2024
17 checks passed

erwango mentioned this pull request Jan 25, 2024

stm32: Document user configurable write-block-size #68126

Closed

de-nordic mentioned this pull request Jan 26, 2024

RFC: Introduce explicit_erase capability in the Flash API to support memory devices that may not require erase #67687

Closed

fabiobaltieri mentioned this pull request Aug 7, 2024

Collaborator nomination for duda-patryk #76782

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

flash: Add program/erase parallelism support for STM32F4x #67919

flash: Add program/erase parallelism support for STM32F4x #67919

duda-patryk commented Jan 22, 2024

de-nordic left a comment

duda-patryk commented Jan 22, 2024

de-nordic commented Jan 23, 2024

duda-patryk commented Jan 23, 2024

de-nordic commented Jan 23, 2024

de-nordic left a comment

de-nordic commented Jan 23, 2024

duda-patryk commented Jan 23, 2024

de-nordic commented Jan 23, 2024

erwango commented Jan 25, 2024

duda-patryk commented Jan 25, 2024

erwango left a comment •

edited

Loading

flash: Add program/erase parallelism support for STM32F4x #67919

flash: Add program/erase parallelism support for STM32F4x #67919

Conversation

duda-patryk commented Jan 22, 2024

de-nordic left a comment

Choose a reason for hiding this comment

duda-patryk commented Jan 22, 2024

de-nordic commented Jan 23, 2024

duda-patryk commented Jan 23, 2024

de-nordic commented Jan 23, 2024

de-nordic left a comment

Choose a reason for hiding this comment

de-nordic commented Jan 23, 2024

duda-patryk commented Jan 23, 2024

de-nordic commented Jan 23, 2024

erwango commented Jan 25, 2024

duda-patryk commented Jan 25, 2024

erwango left a comment • edited Loading

Choose a reason for hiding this comment

erwango left a comment •

edited

Loading