Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Introduce explicit_erase capability in the Flash API to support memory devices that may not require erase #67687

Closed

Conversation

de-nordic
Copy link
Collaborator

@de-nordic de-nordic commented Jan 16, 2024

OK, so after battling with the no-erase idea I have finally settled to bring to this PR something I wanted to do with the ZSAI: I have brought in the capability structure and named first device characteristic, which is explicit_erase; this characteristic is set to true for all the Flash devices that require erase to allow random value write at previously written location.

There are now two Kconfig brought in FLASH_HAS_EXPLICIT_ERASE and FLASH_HAS_NO_EXPLICIT_ERASE; the existence of both may seem, superficially, redundant but actually serves purpose of allowing deciding, in code, whether execution path will be compiled on, depending on existence of devices that provide such capability.

I have removed the awful FLASH_EVAL_ERASE_REQUIREMENT macro, as it was very ambiguous and ugly thing that I could not even correctly name, according what it was doing.
Unfortunately I had to add one syscall flash_flatten and added one for convenience, flash_fill.
The flash_flatten syscall will either erase device, by calling Flash API erase callback provided by a driver, or will use flash_fill to fill a device, if driver does not provide erase callback, at specified offset and range, with erase_value - yes, all devices have to provide erase_value, at least for now; erase callback does not appear for devices that lack the capability, although devices that have hardware support can provide one, as a devoce may have more efficient way to fill device with erase_value, or the device actually provides hardware erase with optional use.
flash_flatten does not depend, at run-time, on explicit_erase capability of a device, but it does use the mentioned Kconfigs to cut out some code if not needed.
So how does the flash_flatten cooperates with the explicit_erase? Well, that is on user or subsystem maintainer to correctly use flash_erase or flash_flatten with aid of the Kconfigs and the explicit_erase capability.
Possible usage are as follows:

  • Code that will completely not work with non-explicit_erase devices: the code should react to lack of enabled devices with FLASH_HAS_EXPLICIT_ERASE capability at compile time; the code may need to also check in run-time for explicit_erase == true to error out, for example at init, to indicate to the user that this solution refuses to work with current configuration;
  • Code that will completely not work with program-erase devices: same as prior point but applied conditions check for opposite conditions;
  • Code that has different execution paths for program-erase and other devices: in this code user may use the Kconfig options to include different paths of execution depending on existence of devices, but if a code support multiple instance it also has to check the explicit_erase capability before selecting one of the paths; in this case user will probably use flash_erase in the program-erase path, with full understanding hot the code works and is impacted by the call; example of such code is support for Disk Access Flash driver, that may work on devices that do not require erase without extra steps.
  • Code has the same path for explicit_erase and non-program-erase paths: such code has probably been designed around program-erase capability of a device anyway, or is actually using flash_erase to scramble/remove data from device; n this case user will probably not use Kconfigs for anything, but should use flash_flatten instead flash_erase to make sure a device is erased or filled with erase_value to scramble data; example of such behaviour is MCUmgr command for erasing data from device.

It is worth to note that there is great change that the first point probably describes something that should be solved as described in point three, but the second point may rely on uniqueness of non-program-erase device, like random write at any point, that may not be solved in any other way described.

In this PR I have also provided flash_area_flatten function that does similar jobs as flash_flatten, but works with areas instead.

So in short what I have modified here is:

  • provided Kconfigs: FLASH_HAS_EXPLICIT_ERASE and FLASH_HAS_NO_EXPLICIT_ERASE;
  • provided device capability flag explicit_erase;
  • provided flash_fill and flash_flatten callbacks with tests;
  • provided flash_area_flatten with tests;
  • modified Flash Simulator to simulate non-program-erase devices and added tests for that;
  • modified Flash Sample to use the Kconfigs;
  • modified all drivers by adding explicit_erase = true capability to flash_parameters and set FLASH_HAS_EXPLICIT_ERASE in their Kconfigs, with exception to Amibq, Nordic MRAM and Nordic FRAM drivers, where I have used the FLASH_HAS_NO_ERASE and explicit_erase = false;
  • I have modified FCB and NVS drivers by use of flash_*_flatten instead of erase to make them work in non-program-erase devices;
  • modified Disk Flash driver with addition on code that will not execute erase on devices that do not require it;
  • replaced all other instance of flash_erase/flash_area_erase with flash_flatten/flash_area_flatten where needed.

As far as I remember now there is so far only one subsystem that is completely independent from erase and that is Disk Driver for flash, which erases because the device requires that rather than the software implementation of the driver.

Reason for naming the explicit_erase characteristic: while reading through various documents, including vendor papers on their Flash devices, this includes NXP, Infineon, ST, etc, I have found that there are many mentions regarding evaluation or modeling (or other smart words) of endurance of the program/erase process, also called P/E, so I found it quite descriptive and quite recognizable way to define a type of device.
Adding device capability also enables us to add, in the future, capability like write_block_ecc, uniform_page_layout, worm, erase_base_bit (actually there is only one bit needed to indicate erase_value, and so on, though question remains whether we will be able to use them in run-time.

Gathering QA:
QA no-erase-flash.pdf

Comment on lines 26 to 67
config FLASH_DEVICE_REQUIRES_ERASE
bool
help
Device requires erase before write.

config FLASH_DEVICE_HAS_ERASE
bool
default y if FLASH_DEVICE_REQUIRES_ERASE
help
Device has erase function. It does not mean that erase is required
by a device, it just means that it provides erase functionality.

config FLASH_DEVICE_HAS_NO_ERASE
bool
help
Device does not require erase.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this support multiple flash devices in the same board having different behaviors?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used to indicate what devices are on the board, the support depends on paths taken in code and checking run-time characteristic of a device called program-erase

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with a config to help with codesize and such. But I think we need names that make that clearer. Something that suggests a general capability, not that sounds like it applies to a specific device. Drivers that need explicit erase, when it isn't defined shouldn't be configurable.

Copy link
Collaborator

@bjarki-andreasen bjarki-andreasen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with adding the information that a "flash" device does not need to be erased before write, but I see no reason to add Kconfig options and use macros to get this information from the devicetree, we should just reuse (or optimistically redesign) the flash_get_parameters API with the new information

@@ -38,6 +38,26 @@ struct flash_pages_layout {
};
#endif /* CONFIG_FLASH_PAGE_LAYOUT */


#define FLASH_DEV_INFOWORD(dev) (((struct flash_driver_api *)dev->api)->infoword)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an inline function, and should copy the infoword to be compatible with userspace

__syscall int flash_get_infoword(const struct device *dev, struct *infoword);
...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Kconfig options are provided to optimize how decisions in code are taken. For example if you only have a device that requires erase, then you will only have a device that selects FLASH_DEVICE_REQUIRES_ERASE.
In such case the:

FLASH_ERASE_REQUIRED(dev)

evaluates always to true and this optimizes code, as now the code know that there is not device that does not require erase and will not check parameters.
In most cases users will have only one kind of device, in which case run-time checking of these is waste of clock and flash.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering we need to read the write_block_size and often erase_value anyway, if we have one structure with device info, the potential ROM saving from this compile time variable is not worth it IMO

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you find out that you only address devices that do not require erase, you are not going to need above parameters.
I have also planned to move these into infoword, as I did in ZSAI, but a this point I have decided to reduce approach to have some success at least in making this go further.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Syscall takes a lot of time, which makes picking infoword every time you need to check what you are working with, inconvenient. Also you are able to figure a lot at compile time, which means that you can already save clock and flash before run-time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I don't understand why we are using the flash API, if no_erase means that we don't need to erase, we don't need to deal with write alignment, and we are not dealing with pages, why not use the retention API?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I don't understand why we are using the flash API, if no_erase means that we don't need to erase, we don't need to deal with write alignment, and we are not dealing with pages, why not use the retention API?

Ah sorry, the write-block-size remains.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Syscall takes a lot of time, which makes picking infoword every time you need to check what you are working with, inconvenient. Also you are able to figure a lot at compile time, which means that you can already save clock and flash before run-time.

Accessing device info should always be done using a syscall. This is a very good reason not to add this information at the device level but instead at a higher level based upon the device type.

#define FLASH_ERASE_REQUIRED(dev) (FLASH_DEV_INFOWORD(dev).erase_required == 1)
#endif

#if IS_ENABLED(CONFIG_FLASH_DEVICE_HAS_ERASE) && IS_ENABLED(CONFIG_FLASH_DEVICE_HAS_NO_ERASE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need these macros? Just set the flag in the infoword and read it directly, its simple and specific to each device

Copy link
Collaborator Author

@de-nordic de-nordic Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an example, that is for ZSAI but the macros are basically taken from there, https://github.com/zephyrproject-rtos/zephyr/blob/2ad4cc60e6744c71b0fdbe923e4f7b3e2735d3e3/samples/storage/zsai/src/main.c
that shows how the macros are in use when code may be accessing both types of devices.

@@ -128,6 +148,11 @@ typedef int (*flash_api_read_jedec_id)(const struct device *dev, uint8_t *id);
typedef int (*flash_api_ex_op)(const struct device *dev, uint16_t code,
const uintptr_t in, void *out);

struct flash_api_infoword {
Copy link
Collaborator

@bjarki-andreasen bjarki-andreasen Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with adding this infoword, but if we are extending the API, we should add all required information here, like write_block_size and erase_value, then simply replace the flash_get_parameters() to intentionally break all existing drivers, since they will all need to now signal that they support and require erase.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accessing infoword by callback may be OK, but will require, probably, caching by user as calls are more expensive than direct memory access (i.e. checking bits in infoword).
That is why macros work better here, because they basically access the bits.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Macros are just less scalable since a single subsystem will likely interact with different "flash" devices with different properties, its adding many pathways to the code, compared to reading the info structure

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contracts for embeded devices are scaled by bucks you save for not paying for extra flash/storage, because this will be biggest area on die.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All products I have worked with have both an internal and external flash, these are pretty much always different in either write size, and pretty much always erase page layout. I see the scenario where there is no "flash" device which requires erase being an outlier

@Laczen
Copy link
Collaborator

Laczen commented Jan 23, 2024

I was unable to react during the Architecture Meeting so I will add my comments here:

I'm OK with adding a flag for devices that allow being overwritten, the flag would better be named allow_overwrite.

I don't think support for has-erase should be added. All flash devices should provide a erase function otherwise there would be no meaning of the "erase_value" parameter. This is of course the result of combining multiple device types under one type: all of them will need the combined properties. It would not be acceptable that it is needed to check a flag before a parameter can be used.

Be carefull when trying to optimize the code size for the driver, if this means that extra code needs to be added at the user level the end result will be negative. When comparing always use the user code + driver code and not just driver code.

My opinion remains that the combination of several device types should be done at a higher level (flash_map interface) as this would allow extending to disks, files, ... without forcing device types to a unified api and device structure.

@bjarki-andreasen
Copy link
Collaborator

bjarki-andreasen commented Jan 23, 2024

@Laczen @de-nordic

What is you opinion on evolving the flash API into something more flexible, in the following specific steps:

  1. Update the flash_parameter structure and API to provide more information (like requires_erase, supports_erase, write_block_size)
  2. Update in-tree flash drivers to configure the new flash_parameter structure (requires_erase = true, supports_erase = true)
  3. Update in-tree modules to use new flash_parameter structure

At this point, all modules and flash drivers behave exactly as they did before, there is just some new flags that are not used yet.

  1. New drivers which allow overwrite can then set the flags in their new flash_parameter structure
  2. Update modules to check flag and skip erase

At this point, the API supports essentially all memory types, and we can rename the API to something better. (we could even make the retained memory drivers implement the API at this point)

  • requires_erase = true -> flash
  • requires_erase = false and write_block_size > 1 -> ReRAM or what the newer technologies are called
  • requires_erase = false and write_block_size == 1 -> RAM, EEPROM

Its small changes to the existing API, the changes to in-tree flash drivers are nits, and we end up with a more flexible API. Flash devices will still be in the flash folder, implementing the same functions (but the newer flash_parameter structure).

We can achieve something similar with a higher level API and wrappers, but the mutated flash API seems like what that higher level API would look like anyway, given flash is the memory type with the most number of quirks :) That's what I gathered from reviewing the ZSAI PR at least.

@Laczen
Copy link
Collaborator

Laczen commented Jan 23, 2024

@bjarki-trackunit,
I agree that all can be done with small changes and I see no problem with making the changes, but do we end up where we want to be? E.g. do we want to question a device API to take the correct action (making 2 syscalls instead of one)?

Why are we adding these changes, there are new devices that do not require erase (but can) and have a expected life that far extends flash but it is not allowed to do an erase to make it compatible with the existing api -> we must be in serious lifetime problems with the present flash devices.

@bjarki-andreasen
Copy link
Collaborator

@Laczen

E.g. do we want to question a device API to take the correct action (making 2 syscalls instead of one)?

We already do, we need the write_block_size before write, and flash pages before erase. Besides, the application can cache the parameters before an operation like copying an image, so it would be many read/erase/write pr call to get params :)

@Laczen
Copy link
Collaborator

Laczen commented Jan 24, 2024

@de-nordic It is unclear how this PR will fit into the zephyr storage usage. Could you show what the end goal is: how would users that need storage interact with zephyr provided solutions.

@Laczen
Copy link
Collaborator

Laczen commented Jan 25, 2024

@de-nordic, I would like to advance on this subject. During the discussion about ZSAI it was clear that trying to put all storage devices under one driver would not be generally accepted.

I would propose the following: the flash driver would support devices that are not flash devices but are made compatible with a flash device by the definition of a erase-block layout (this can be 1 block for the entire device) and the implementation of a erase function.

To allow a higher level library to use the device at its best capabilities we add a properties flag to the flash_parameters:

  1. Flag that AND writes are allowed (rewriting the address with only changes from 1 to 0),
  2. Flag that the erase-block layout is "emulated", meaning a higher library can use a alternative block definition,
  3. Flag that the erase-function is "emulated", meaning that it does a write with the erase-value,

More flags could be added in a later stage.

This would not result in the smallest possible device definition, but it would allow higher level libraries to avoid unnecessary erases while not requiring any change to existing libraries.

@de-nordic
Copy link
Collaborator Author

@bjarki-trackunit

We can achieve something similar with a higher level API and wrappers, but the mutated flash API seems like what that higher level API would look like anyway, given flash is the memory type with the most number of quirks :) That's what I gathered from reviewing the ZSAI PR at least.

Yes, that is what I am trying to communicate all the time. In the end, if we even do some higher level wrapper, we would have to unwrap the information. Instead of wrapping: this is Flash, this is EEPROM, this is SRAM, we can just wrap the characteristics of devices: this requires erase, this does not, and this support (there are devices that write faster when you pre-erase).

What is you opinion on evolving the flash API into something more flexible, in the following specific steps:

Update the flash_parameter structure and API to provide more information (like requires_erase, supports_erase, >write_block_size)
Update in-tree flash drivers to configure the new flash_parameter structure (requires_erase = true, supports_erase = true)
Update in-tree modules to use new flash_parameter structure

Can do that, I am against usage of "overwrite", because most NOR flash devices allow "overwrite" when you just want to swap more 1 bits to 0. They do not do that when ECC is involved and it depends on structure of memory, because multiple writes to small area may hammer some other, independent, bit to change value.

#64732

  • requires_erase = true -> flash

  • requires_erase = false and write_block_size > 1 -> ReRAM or what the newer technologies are called

  • requires_erase = false and write_block_size == 1 -> RAM, EEPROM

This simplified association may not work, because, if you look at the table in #64732, you will notice that some EEPROM device have actually write-block-size != 1, but they emulate such by re-writing all the other bytes of the block. On the other side I have met ReRAM and MRAM devices that have and have not write alignment, and may or may not have ECC.
There is also EEPROM device (25AA512) that does not require but supports erase - works faster if you pre-erase write area.

That is also why it would not be possible to gather this info for higher level than do it at driver; if you do so, then in the end you have to then work by the characteristic rather than technology that provides it.

Additional thing, which we do not address now, is that the characteristics may change dynamically; for example here #67919 (comment) contributor points out that write-block-size depends of voltage applied to MCU.

That is why it is better to identify devices by features/limitations than by technology.

Every additional wrapper will be additional code to wrap and unwrap the info.

@bjarki-andreasen
Copy link
Collaborator

bjarki-andreasen commented Jan 27, 2024

@de-nordic

I agree entirely with your last comment, the association with property and memory type is just to reflect that they can be supported, not that the combination of properties should be used to infer the memory type. We should not need to care about that :)

To expand a bit on this, this would also mean that compile time macros would not be possible right? MCUBoot and other subsystems currently get the write block size directly from the devicetree, instead of using the API

Copy link
Collaborator

@Laczen Laczen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed change seems to be based on the assumption that the function of flash erase is just to prepare flash for writing and as such could be avoided for devices that do not need an erase.

This assumption is wrong, the function of flash erase is twofold:

  1. Prepare an area for writing,
  2. Put the area in a known state (the erase-value),

All subsystem that are using flash are using the above two conditions.

It would be possible for a subsystem to utilize the knowledge of the absence of the need to prepare the flash for writing but it will always need a method to put the area in a known state. The solution for devices that do not need an erase could simply be to provide a erase function that sets an area to a known state and to provide a flag that erase could be avoided when the subsystem does not need it.

This way all present subsystems can function without any change, but taking advantage of the device properties is still possible.

@de-nordic
Copy link
Collaborator Author

de-nordic commented Feb 6, 2024

The proposed change seems to be based on the assumption that the function of flash erase is just to prepare flash for writing and as such could be avoided for devices that do not need an erase.

This assumption is wrong, the function of flash erase is twofold:

1. Prepare an area for writing,

Not exactly true and depends on device. It actually prepares device for representing bits it is not able to change to from given state and on some devices it may aid in write speeds. On many devices you may keep trying to change N value bit to N multiple times without issues.

2. Put the area in a known state (the erase-value),

Software that assumes that will not work, that is true. But that is design consideration. For example some devices have no Bluetooth so running bluetooth stack on them may not actually work.
Your code may fit into some devices it will not fit into another - that is something you have to take into account during design.

All subsystem that are using flash are using the above two conditions.

That is actually not true.
Disk access used by FAT FS and EXT2, in Zephyr, does not care about known state, here known as erase_value, in any operation.
Stream flash uses erase_value only as a filler for the end of stream.

@@ -23,6 +23,16 @@ config FLASH_HAS_PAGE_LAYOUT
This option is enabled when the SoC flash driver supports
retrieving the layout of flash memory pages.

config FLASH_DEVICE_REQUIRES_ERASE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"none of the devices require erase"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be rather FLASH_ANY_FLASH_DEVICE_REQUIRES_ERASE or something like that, as a driver is expected to set that to indicate that such device, a device that requires erase prior to write, is compiled within system.

@carlescufi
Copy link
Member

carlescufi commented Feb 6, 2024

Architecture WG:

  • There is an agreement to add the "require_erase" flag to the flash API
  • @Laczen points out that the constructs used by the macros, such as if (FLASH_ERASE_REQUIRED(ctx->erase_required)) {
  • @de-nordic and @bjarki-trackunit counter that this is required in order to save flash, to ensure that no references to the function are stored at all
  • @Laczen mentions that the concept of pages is not useful anymore once you do not expose an erase operation

Copy link
Collaborator

@jfischer-no jfischer-no left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned about the constant effort to hide every little thing behind Kconfig, and the ambiguous names of Kconfig options. flash_erase() should be improved to not bother the driver user with unnecessary erase-required checks.

Comment on lines +41 to +60
/*
* The default block size is used for devices not requiring erase.
* It defaults to 512 as this is most widely used sector size
* on storage devices.
*/
#define DEFAULT_BLOCK_SIZE 512

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be obtained from flash API (flash/other-memory-device driver), regardless of whether erasure is required or not. Simplifying/not-change the code below to just:

	rc = flash_get_page_info_by_offs(ctx->info.dev, ctx->offset, &page);
	if (rc != 0) {
		LOG_ERR("Error %d while getting page info", rc);
		return rc;
	}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this could use the sector-size from dts when used on devices that do not require an erase (can overwrite).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may have no such information as paging, for devices that do not require erase you actually may have entire layout support disabled.
"Bigger" operating systems also have to make assumptions for devices that do not seem to be block devices, for example FreeBSD uses system wide definition of S_BLKSIZE, defined in sys/stat.h, set to 512, and that is what for example tmp_fs uses.
Disk Access has been provided to support ELM FAT driver from the beginning, so it is historically provided to support file systems that have not been directly designed for embedded, so we can probably make the same assumption as big systems do.

Comment on lines 234 to 249
if (FLASH_ERASE_REQUIRED(ctx->erase_required)) {
if (flash_erase(ctx->info.dev, ctx->cached_addr, ctx->page_size) < 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, it does not look nice, it should just be doing flash_erase_required(ctx->erase_required) in flash_erase() implementation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the macro and instead provided a flash_flatten function that either does erase or fills area with erase_value, depending on whether a device provides erase callback.
Now users are required to take a decision on calling flash_erase or flash_flatten, depending on the design of solution they are providing.

Comment on lines 26 to 67
config FLASH_DEVICE_REQUIRES_ERASE
bool
help
Device requires erase before write.

config FLASH_DEVICE_NOT_REQUIRES_ERASE
bool
help
Device does not require erase.

Copy link
Collaborator

@jfischer-no jfischer-no Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kconfig are also used with FLASH_ERASE_REQUIRED(condition) macro
that allows to optimize out code that would never be executed

I suggest writing human-readable code and having unambiguous Kconfig options. Before a new Kconfig option is introduced, the author also needs to do some soul-searching and feel the pain of maintenance overhead and deprecation (That also applies to me, of course.).
Also not clear from commit message why FLASH_ERASE_REQUIRED(condition) needs to be exposed at all and cannot be handled by the drivers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not going to deprecate them in the near feature as we will have to live with both types of devices.
The Kconfigs have been renamed now and the FLASH_ERASE_REQUIRED has been removed.

@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020-2022 Nordic Semiconductor ASA
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow Commit Message Guidelines
tests: drivers: flash: Modify driver to run with no-erase devices

static uint8_t erase_value;

static void *flash_driver_setup(void)
{
int rc;
uint8_t erase_value;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why driver user needs to know erase_value? Or what is the reason to introduce it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. this is test specific so that we could check whether we have change in buffer we used for reading.

@carlescufi
Copy link
Member

flash_erase() should be improved to not bother the driver user with unnecessary erase-required checks.

But the thing is that the caller needs to know if flash_erase() needs to be invoked or not, because its own algorithm will change based on that information. We cannot make this transparent to the caller that's the whole point

@Laczen
Copy link
Collaborator

Laczen commented Feb 6, 2024

All subsystem that are using flash are using the above two conditions.

That is actually not true. Disk access used by FAT FS and EXT2, in Zephyr, does not care about known state, here known as erase_value, in any operation. Stream flash uses erase_value only as a filler for the end of stream.

You are correct, the two systems that are using disk access do not care about the known state, but isn't the general idea that fatfs and ext2 on flash should be used only in very limited conditions ?

Having devices that do not require erase would enable the usage on fatfs and ext2 on these devices and I can only welcome that. BTW, also littlefs can be configured to use the disk access interface and with some finetuning (allowing smaller disk sector size) this would also be beneficial.

Regarding stream flash, this is only because it doesn't verify that the state is erased before writing, but really it should check this.

@de-nordic de-nordic marked this pull request as ready for review February 6, 2024 19:29
Test of file systems use flash_area_erase to erase device
to have a clear start for tests; switching to flash_area_flatten
allows them to do the same with devices that do not explicit
call to erase procedure before write.

Signed-off-by: Dominik Ermel <dominik.ermel@nordicsemi.no>
The commit replaces flash_area_erase with flash_area_flatten.
The function is used in to places:
 1) in image management commands IMG_MGMT_ID_UPLOAD
    and IMG_MGMT_ID_ERASE: to erase an image in secondary slot
    or to scramble trailer part of image, which could be misunderstood
    by MCUboot as valid image operation request;
 2) in command ZEPHYR_MGMT_GRP_BASIC_CMD_ERASE_STORAGE to
    erase/scramble data partition.

Signed-off-by: Dominik Ermel <dominik.ermel@nordicsemi.no>
The flash_area_erase is replaced with flash_area_flatten that
is also able to erase/scramble devices that do not require
explicit call to erase procedure before write.

Signed-off-by: Dominik Ermel <dominik.ermel@nordicsemi.no>
Use flash_area_flatten instead of flash_area_erase; this allows
to run tests on devices that do not require explicit erase
before write or do not provide the callback.

Signed-off-by: Dominik Ermel <dominik.ermel@nordicsemi.no>
The invocation of flash_area_erase, in boot_erase_img_bank,
has been replaced by flash_area_flatten.

Signed-off-by: Dominik Ermel <dominik.ermel@nordicsemi.no>
The commit replaces flash_area_erase with flash_area_flatten,
as it allows to emulate erase and scramble data stored
on devices that do not require explicit erase before write
or do not provide erase callback.

Signed-off-by: Dominik Ermel <dominik.ermel@nordicsemi.no>
The commit replaces flash_area_erase with flash_area_flatten,
as it allows to emulate erase and scramble data stored
on devices that do not provide erase callback.

Signed-off-by: Dominik Ermel <dominik.ermel@nordicsemi.no>
The flash_area_erase has been replaced with flash_area_flatten,
allowing code to work with devices that do not provide
erase callback.

Signed-off-by: Dominik Ermel <dominik.ermel@nordicsemi.no>
LittleFS is based on program-erase devices and require that
characteristics to work; the commit replaces flash_area_erase
with flash_area_flatten to allow LFS work on devices that
do not provide erase callback.

Signed-off-by: Dominik Ermel <dominik.ermel@nordicsemi.no>
Replace flash_area_erase with flash_area_flatten to allow FCB
Settings backend to work on devices that do not provide erase
callback.

Signed-off-by: Dominik Ermel <dominik.ermel@nordicsemi.no>
Replace usage of flash_area_erase with flash_area_flatten to
allow code to work with devices that do not provide erase
callback.

Signed-off-by: Dominik Ermel <dominik.ermel@nordicsemi.no>
The option should be selected by the driver.

Signed-off-by: Dominik Ermel <dominik.ermel@nordicsemi.no>
The device is based on ReRAM and does not require erase.
Erase has actually been emulated by write.
This commit removes the erase as device that does not support it
should not expose it.

Signed-off-by: Dominik Ermel <dominik.ermel@nordicsemi.no>
Shell will now work with devices that do not implement erase callback.

Signed-off-by: Dominik Ermel <dominik.ermel@nordicsemi.no>
Laczen added a commit to Laczen/zephyr that referenced this pull request Apr 2, 2024
flash_map is used by several subsystems to allow access to flash devices
in a unified way. Flash_map is by design directed and limited towards
flash devices and does not provide support for other non volatile
memories (e.g. eeprom).

nvmp (non volatile memory and partitioning) is proposed as an
alternative for flash_map that provides a path to solve many problems
that are caused by flash_map (e.g. zephyrproject-rtos#52395, the need to use flash_page
interface for block-size, the ability to override the flash page size
with a multiple, ...). The proposal is easily extended to other types
of memory that would use a dedicated driver solution (e.g. fram, mram,
rram that could have a driver similar to eeprom). It can also be
extended to disks and even files.

nvmp is an alternative for zephyrproject-rtos#67687 and zephyrproject-rtos#70828.

Signed-off-by: Laczen JMS <laczenjms@gmail.com>
Laczen added a commit to Laczen/zephyr that referenced this pull request Apr 2, 2024
flash_map is used by several subsystems to allow access to flash devices
in a unified way. Flash_map is by design directed and limited towards
flash devices and does not provide support for other non volatile
memories (e.g. eeprom).

nvmp (non volatile memory and partitioning) is proposed as an
alternative for flash_map that provides a path to solve many problems
that are caused by flash_map (e.g. zephyrproject-rtos#52395, the need to use flash_page
interface for block-size, the ability to override the flash page size
with a multiple, ...). The proposal is easily extended to other types
of memory that would use a dedicated driver solution (e.g. fram, mram,
rram that could have a driver similar to eeprom). It can also be
extended to disks and even files.

nvmp is an alternative for zephyrproject-rtos#67687 and zephyrproject-rtos#70828.

Signed-off-by: Laczen JMS <laczenjms@gmail.com>
.get_parameters = flash_ambiq_get_parameters,
#ifdef CONFIG_FLASH_PAGE_LAYOUT
.page_layout = flash_ambiq_pages_layout,
#endif
Copy link
Member

@fkokosinski fkokosinski Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aaronyegx, @RichardSWheatley - could you guys take a look at the changes in this PR in the context of the Ambiq flash driver? Thanks :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use this flash driver for Apollo4 MRAM and plan to add Apollo3 flash support which needs to erase before writing. It seems it will become incompatible after this change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be different device, so for the Apollo3 you can compile in the Flash and for Appollo4 you can compile the MRAM. Do you have any device that have both and you want to cover it with a single driver?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this removed from the Ambiq driver but not the Andes driver.

That Kconfig still has the page layout but ours has it removed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@de-nordic If we have added CONFIG_FLASH_HAS_EXPLICIT_ERASE to define if the device needs explicit erase or not, should we use this CONFIG_FLASH_HAS_EXPLICIT_ERASE to set the explicit_erase and include the flash_erase implementation instead of cleaning them up? As mentioned in my last comment, we will have Apollo3 flash support which need erase action soon and the erase driver (may only include some adaption) is still needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Driver implementation is up to you.
The flag is used by software to determine whether it has to do erase as required by the device itself, or sometimes use it for removal of data.
Most software plays better with things that do not require erase.
I will pull the change here to separate PR so we can discuss your options separately.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this removed from the Ambiq driver but not the Andes driver.

That Kconfig still has the page layout but ours has it removed?

That is why I need help from vendor representatives here. I do not have to remove these things, the question is how you want to approach it. You can still present MRAM/RRAM devices as Flash, they will just work less efficiently in software stack as it may still call erase even if not needed by device.
because the PR here prepares the subsystems to work with devices that have the erase requirement or not, change to any mram/rram can be done post this pr, in either direction, and subsystems will adapt.

@Laczen
Copy link
Collaborator

Laczen commented Apr 7, 2024

@de-nordic the proposed method using Kconfig will force drivers to be split up. Next to the Apollo example from @aaronyegx this will also happen on devices that are using the spi-hor driver to support mram/fram/... devices.
This can be avoided by defining a compatible zephyr,flash-no-erase that is added to the flash node compatibles. A driver that has this compatible could then simply remove the erase functionality.

@de-nordic
Copy link
Collaborator Author

@de-nordic the proposed method using Kconfig will force drivers to be split up. Next to the Apollo example from @aaronyegx this will also happen on devices that are using the spi-hor driver to support mram/fram/... devices. This can be avoided by defining a compatible zephyr,flash-no-erase that is added to the flash node compatibles. A driver that has this compatible could then simply remove the erase functionality.

Yes, they may have split up with common code shared among drivers, like ST does. Actually the same file may be used but it get ifdefed for the proper platform, which is not perfect either.
There was opposition to share the spi-nor driver for xram devices that is why we have evolved here. And actually it is good, because the spi-nor driver exactly shows what happens when generic driver starts to get specialized, and vendor specific features are added - a lot of ifdef cutting, not really something you can easily read through
The zephyr,flash-no-erase still does not allow us subsystem code optimizations, which I have explained several times already, because it is only 0-1 flag that states whether there is device that has the property, over the base feature set, but does not allow to figure out whether there are any without it.

Copy link

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Jun 17, 2024
@de-nordic
Copy link
Collaborator Author

Closing, the #71271 has been merged instead.

@de-nordic de-nordic closed this Jun 18, 2024
@de-nordic de-nordic deleted the no-erase-spi-nor-flash branch October 4, 2024 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Architecture Review Discussion in the Architecture WG required area: Flash DNM This PR should not be merged (Do Not Merge) Stale
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.