Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flash: Add program/erase parallelism support for STM32F4x #67919

Merged

Conversation

duda-patryk
Copy link
Collaborator

The implementation uses the same approach as STM32F1x.

Program/erase speed can be set by setting 'write-block-size' flash property to 1, 2, 4 or 8.

Copy link
Collaborator

@de-nordic de-nordic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I have a question:
I can see that the size of write is decided on write at run-time, at line 114, as the size is passed there.
Wouldn't it be most efficient for the driver to select the largest possible write size?
For example user is presented with write-block-size == 2, so every buffer address and size needs to be two bytes aligned, but the same buffer may be 4 or 8 bytes aligned which means that for the driver it would be most efficient to select write by that size.

@duda-patryk
Copy link
Collaborator Author

Wouldn't it be most efficient for the driver to select the largest possible write size?

Max write size depends on MCU voltage.

every buffer address and size needs to be two bytes aligned

Unaligned buffer is handled with memcpy() in 224-226. The compiler emits very efficient code - for 4 byte write, it's just 4 instructions (logical or with logical shift). The size and flash offset need to be aligned as it is in flash_stm32f1x.c.

@de-nordic
Copy link
Collaborator

Wouldn't it be most efficient for the driver to select the largest possible write size?

Max write size depends on MCU voltage.

Hm. Interesting, could not find the info in STM32F4 docs, but not a platform specialist here.

every buffer address and size needs to be two bytes aligned

Unaligned buffer is handled with memcpy() in 224-226. The compiler emits very efficient code - for 4 byte write, it's just 4 instructions (logical or with logical shift). The size and flash offset need to be aligned as it is in flash_stm32f1x.c.

You have two alignments here: source and destination; memcpy aligns source buffer, you could also use here UNALIGNED_GET .

What I mean by changing write alignment is that it is probably most efficient, although I do not know the platform, to write by the highest possible chunk.
For example when you have 'write-block-size` == 2, then it is expected that user will give you offset and size of buffer aligned to 2;
but notice that buffer aligned to 2 will, if the length is >4, at some point align to 4 and even to 8, if size is >8.
Because you control the size, by which the data is being copied it chunks, at run-time, you can change it as you wish.
For example N length buffer, aligned at 2, may be copied by sizes 2 + 4 + N * 8 + 4 + 2, or 2 + (N + 1) * 8 + 2.

@duda-patryk
Copy link
Collaborator Author

Wouldn't it be most efficient for the driver to select the largest possible write size?

Max write size depends on MCU voltage.

Hm. Interesting, could not find the info in STM32F4 docs, but not a platform specialist here.

It's in the STM32F412 reference manual RM0402 Rev 6, 3.5.2 Program/erase parallelism.

You have two alignments here: source and destination; memcpy aligns source buffer, you could also use here UNALIGNED_GET

Is UNALIGNED_GET preferred over memcpy()?

What I mean by changing write alignment is that it is probably most efficient, although I do not know the platform, to write by the highest possible chunk.

Yes, it would be the most efficient, but in this case, the chunk size depends on applied voltage, so the write-block-size property is a maximum chunk size (we trust that the user read the manual).

Fun fact:
While configuring number of flash wait states (3.4.1 Relation between CPU clock frequency and Flash memory read time), Zephyr silently assumes that the voltage is between 2.7V - 3.6V. I haven't checked it, but I suspect that Zephyr won't boot when low voltage is applied to the MCU.

@de-nordic
Copy link
Collaborator

Wouldn't it be most efficient for the driver to select the largest possible write size?

Max write size depends on MCU voltage.

Hm. Interesting, could not find the info in STM32F4 docs, but not a platform specialist here.

It's in the STM32F412 reference manual RM0402 Rev 6, 3.5.2 Program/erase parallelism.

Thanks, now I can see this. Very interesting.

You have two alignments here: source and destination; memcpy aligns source buffer, you could also use here UNALIGNED_GET

Is UNALIGNED_GET preferred over memcpy()?

Check what will give you better results. I suspect that UNALIGNED_GET will give you smaller footprint, smaller stack usage and quicker execution as it does not require doing all the function call/return stuff.

What I mean by changing write alignment is that it is probably most efficient, although I do not know the platform, to write by the highest possible chunk.

Yes, it would be the most efficient, but in this case, the chunk size depends on applied voltage, so the write-block-size property is a maximum chunk size (we trust that the user read the manual).

Nice to know I am not the only one.

Fun fact: While configuring number of flash wait states (3.4.1 Relation between CPU clock frequency and Flash memory read time), Zephyr silently assumes that the voltage is between 2.7V - 3.6V. I haven't checked it, but I suspect that Zephyr won't boot when low voltage is applied to the MCU.

I guess user has to check that before sending board to the field.

de-nordic
de-nordic previously approved these changes Jan 23, 2024
Copy link
Collaborator

@de-nordic de-nordic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
I have left comment on possible usage of `UNALIGNED_GET, but that is up to the committer to validate.

@de-nordic
Copy link
Collaborator

You probably want to update copyrights.

@duda-patryk
Copy link
Collaborator Author

LGTM. I have left comment on possible usage of `UNALIGNED_GET, but that is up to the committer to validate.

Clang compiles it to the same instructions. It's just a decision between

value = UNALIGNED_GET((flash_prg_t *)data + i);

and

memcpy(&value, (const uint8_t *)data + i * sizeof(flash_prg_t), sizeof(flash_prg_t));

I must admit that UNALIGNED_GET looks nice, but let the code owner decide 😄

You probably want to update copyrights.

Google is already in copyrights. Should I change the year?

@de-nordic
Copy link
Collaborator

I must admit that UNALIGNED_GET looks nice, but let the code owner decide 😄

ST people decide then.

You probably want to update copyrights.

Google is already in copyrights. Should I change the year?

Yes, date but that is up to your company policy.

@erwango
Copy link
Member

erwango commented Jan 25, 2024

ST people decide then.

UNALIGNED_GET is already in use in some others STM32 flash drivers, and personally I like it, so indeed, would be a good move.

The implementation uses the same approach as STM32F1x.

Program/erase speed can be set by setting 'write-block-size' flash
property to 1, 2, 4 or 8.

Signed-off-by: Patryk Duda <patrykd@google.com>
@duda-patryk
Copy link
Collaborator Author

Changed memcpy() to UNALIGNED_GET().

Copy link
Member

@erwango erwango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks.
Though I have a concern on the documentation of this configurability and how one can guess this is actually a tunable parameter when reading the binding.

I'm not blocking since I can see that this is already the case on STM32F1.

Option I see is to add per series variants of st,stm32-nv-flash with new write-block-size prop defined as an enum with proper description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants