fix for the RWW errors while using Storage driver for internal flash on K64F #2150
Conversation
An unrelated observation: please note that the mbed coding style doesn't encourage the use of
@mbed-bot: TEST HOST_OSES=windows,linux
[Build 612]
#elif defined (__CC_ARM)
#define __RAMFUNC __attribute__ ((section(".data.ramfunc"), noinline))
#elif defined ( __ICCARM__ )
#define __RAMFUNC __ramfunc
Since you'll need these for most targets which support writing to internal flash, you might consider moving this to a common location. The current method for handling toolchain-specific attributes is to put them in toolchain.h.
moved __RAMFUNC to toolchain.h
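For reference, the consolidated definition in toolchain.h might look roughly like the sketch below. The section name and the fallback branch are assumptions pieced together from the diffs quoted in this thread, not the exact committed code.

```c
/* Sketch of a consolidated __RAMFUNC for toolchain.h.  The ".ramfunc"
 * section name follows the later revision of this PR; the exact name
 * must match whatever the target's linker map places into RAM. */
#ifndef __RAMFUNC
#if defined(__GNUC__) || defined(__clang__)    /* GCC and llvm/clang */
#define __RAMFUNC __attribute__ ((section(".ramfunc"), noinline))
#elif defined(__CC_ARM)                        /* ARM Compiler (armcc) */
#define __RAMFUNC __attribute__ ((section(".ramfunc"), noinline))
#elif defined(__ICCARM__)                      /* IAR */
#define __RAMFUNC __ramfunc
#else
#define __RAMFUNC                              /* fall back: execute from flash */
#endif
#endif

/* Example use: a flash-programming helper that must not fetch from flash.
 * The body is a stand-in, not the real FTFE command sequence. */
__RAMFUNC int program_word(volatile unsigned *dst, unsigned val)
{
    *dst = val;
    return (*dst == val) ? 0 : -1;
}
```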
@mbed-bot: TEST HOST_OSES=windows,linux
[Build 615]
 * startup sequence. */
#ifndef __RAMFUNC
#if defined(__GNUC__) || defined(__clang__) // GCC and llvm/clang
#define __RAMFUNC __attribute__ ((section (".data#"), noinline)) /* The '#' following ".data" needs a bit of
This is still a horrendous hack - the toolchain equivalent of SQL injection.
Does ".data.ramfunc", like you did later for ARMCC, really not work?
without this 'hack', there will be compiler warnings, as the comment indicates. Is that acceptable?
Have you exhausted the possibilities for a proper fix?
I recall from the last PR that ".data.ramfunc" complains because it expects ".data.*" sections to be non-executable.
Have you tried a completely new section name, like ".ramfuncs"? Would mean adding it to the linker map.
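Adding a dedicated input section to the linker map could look something like the fragment below (GNU ld syntax). The region name `m_data` and the `__data_start__`/`__etext` symbols follow the usual Kinetis GCC linker scripts; treat this as an illustrative sketch, not the actual K64F script.

```
.data : AT (__etext)
{
    . = ALIGN(4);
    __data_start__ = .;
    *(.data*)
    *(.ramfuncs)        /* RAM-resident code, copied in with initialised data */
    . = ALIGN(4);
    __data_end__ = .;
} > m_data
```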
".data.ramfunc" also produces a compiler warning from gcc
Actually, a completely separate name is the right thing to do for ARMCC too. You were previously worried about the issue of actually marking RAM sections executable. You do want some special naming to allow grouping together into an executable region if necessary.
Lumping this stuff in ".data.*" is just asking for it to end up in a non-executable region.
Code looks a lot clearer than the earlier attempt. The only remaining issue with the actual implementation is the "fool the compiler into accidentally outputting a comment symbol". My immediate response to seeing that is the desire to submit a GCC patch to stop you doing that. GCC's clearly at fault in not escaping or otherwise blocking the "#". Otherwise the implementation is fine - it's doing what is intended. So +1 for implementation if that can be cleaned up.

However, I do vote -1 overall, as I do not want any component in the system disabling interrupts for 5-115ms. It will break other components, and we're going to spend the next couple of years addressing issues arising from this breakage. I believe we should not be permitting and encouraging flash storage if block 1 is in use by the program in a reference design. And I think more engineers need input on this, as the submission of this patch potentially impacts every driver in the K64F system.
@kjbracey-arm please review
@mbed-bot: TEST HOST_OSES=windows,linux |
Regarding @kjbracey-arm's other comment about disabling interrupts: this change disables interrupts only for applications whose images are > 512KB in size. And in that case, it may be better for most applications to block interrupts briefly instead of suffering an arbitrary hard fault. I argue that this change should be allowed to go through as a safety guard for the benefit of the majority of users, who may not be affected by brief interrupt blackouts. For the rest, we need to put warnings in place in our documentation.
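The guard being described might be sketched as below. All names here are hypothetical stand-ins (the real driver uses CMSIS intrinsics and the Kinetis flash command sequence); the point is only that the interrupt blackout is conditional on the image actually sharing a bank with the operation.

```c
#include <stdint.h>

/* K64F program flash is two 512 KiB banks; bank 1 starts here. */
#define BLOCK1_START_ADDR 0x00080000u

/* Hypothetical stand-ins for __disable_irq()/__enable_irq() on target,
 * instrumented so the gating logic can be exercised on a host. */
static unsigned irq_blackouts;
static void disable_irqs(void) { irq_blackouts++; }
static void enable_irqs(void)  { }

/* Erase a sector in bank 1; disable interrupts only when the program
 * image extends past 512 KiB into that same bank, i.e. only when a
 * read-while-write error is actually possible. */
void erase_sector(uint32_t sector_addr, uint32_t image_end_addr)
{
    int image_shares_bank = (image_end_addr > BLOCK1_START_ADDR) &&
                            (sector_addr   >= BLOCK1_START_ADDR);
    if (image_shares_bank) {
        disable_irqs();   /* prevent fetches from the busy bank */
    }
    /* ... issue the erase command from RAM-resident (__RAMFUNC) code ... */
    if (image_shares_bank) {
        enable_irqs();
    }
}
```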
#if defined(__GNUC__) || defined(__clang__) // GCC and llvm/clang
#define __RAMFUNC __attribute__ ((section (".ramfunc"), noinline))
#elif defined (__CC_ARM)
#define __RAMFUNC __attribute__ ((section(".ramfunc"), noinline))
These are now identical, so no separate block required.
+1 for implementation now. Continued -1 for doing it at all.

Ultimately, it needn't be a choice between disabling interrupts and an arbitrary hard fault. The toolchain could restrict image size to 512K at build time, or the storage driver could return a polite error if it detects that the image size has encroached into its storage zone.

I agree that there is merit in enabling this for users that can cope with interrupt blackout. But I'm not happy with that happening silently and implicitly. I can already envisage the complaints: "when I enable debug, my serial driver starts dropping characters". Whoops, image size exceeded 512K.

My feeling is that there should be some sort of explicit control to permit this behaviour, so that the people who can't tolerate it have some way of avoiding accidentally triggering it. Conversely, they may possibly want to force the behaviour: they may want to test resilience against the interrupt loss when their image size doesn't currently exceed 512K, but they know it ultimately will.

Although I guess it is sort of semi-explicit - they do have to manually set the base address at the moment. I guess with sufficient warnings in the documentation... And the necessary toolchain interlock to stop the image size crossing over that base address.
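The "explicit control" being asked for could be as simple as a build-time opt-in flag. The macro names below are hypothetical, purely to illustrate the shape of such a permission slip:

```c
/* Hypothetical opt-in: the application must define this explicitly
 * (e.g. via a -D build flag) before the storage driver will block
 * interrupts for flash operations on the bank it executes from. */
#if defined(STORAGE_DRIVER_ALLOW_IRQ_BLACKOUT)
#define STORAGE_IRQ_BLACKOUT_PERMITTED 1
#else
#define STORAGE_IRQ_BLACKOUT_PERMITTED 0
#endif

/* Checked at the point of the flash operation; without the opt-in the
 * driver would return a polite error instead of disabling interrupts. */
int storage_can_erase_active_bank(void)
{
    return STORAGE_IRQ_BLACKOUT_PERMITTED;
}
```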
[Build 616]
I see no problem in disabling interrupts for any amount of time. If this breaks the application, then the application is too complex for the hardware and the user should upgrade to a different board anyway.
@mbed-bot: TEST HOST_OSES=windows,linux |
@@ -225,6 +225,21 @@
#endif
#endif

/* Define '__RAMFUNC' as an attribute to mark a function as residing in RAM.
 * Use of __RAMFUNC puts a function in the .data section--i.e. the
Micro-nit - ".data section" is now wrong. Exact details will depend on the toolchain/linker map. Better to just state that it will place the function into RAM.
fixed
Kind of agree with that. The catch is that we have to acknowledge that all our reference applications are now too complex for our main reference board.
@mbed-bot: TEST HOST_OSES=windows,linux |
If we can't fit our reference applications into 512 KiB of flash, we've got bigger problems! Once we get around to optimizing for size, this shouldn't be an issue. If the user wants all the bells and whistles, then they should be prepared to pay the price. If the user wants to use a complex API, like writing to flash, they must be prepared to handle the side-effects, like high latency during the flash operation, and I'm pretty sure that with proper documentation our users can handle it.
[Build 617] |
We are aiming at 512K release builds for our biggest stuff, but then we need 1M for the debug images.
The problem is that the components that can't cope with the latency (like radio drivers and SLIP drivers) don't get any direct say in when flash flushes get called. And some drivers may not be able to cope at all - I don't know if we've established that the Wi-Fi module can ever recover if a flush happens. There would need to be some sort of serious OS mechanism here - some sort of "suspend" call going round to drivers to tell them to prepare for a brief system shutdown. And I know for a fact that all our current applications are calling flush after every write. They're yet to take any defensive measures to even limit the frequency of them, let alone try to deal with the necessary work to co-operate with low-level drivers.
I think you just made my point. If our image is 1 MiB and it uses the flash for config storage, the K64F is not the right board to use, because the application will hardfault every time it writes to flash while executing from the second flash bank. Either get a different board or reduce the code size. The last place to put the blame is the flash driver.
Indeed. Or add a module with separate storage. The fundamental problem here is the attempt to use a bare FRDM-K64F as a "1MB development platform with non-volatile storage". It clearly is not suitable for that. And I'd be happy leaving it at that - just put a check in the flash driver or build system to guarantee that the program image and flash area are not sharing a block, and politely refuse if they do. That would suit me, and we then can go find a better platform. By making this change, we are implicitly suggesting we might be trying to limp on with this unsuitable platform.
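The "politely refuse" check suggested here could be sketched as below. The function and error names are hypothetical; on a real target the image end address would come from a linker symbol and the storage start from the configured MTD base.

```c
#include <stdint.h>

#define STORAGE_ERR_OVERLAP (-1)

/* Refuse to initialise the flash store if the program image has grown
 * into the flash area reserved for storage.  A real implementation
 * would also round to flash-block boundaries before comparing. */
int storage_check_no_overlap(uint32_t image_end_addr,
                             uint32_t storage_start_addr)
{
    if (image_end_addr > storage_start_addr) {
        return STORAGE_ERR_OVERLAP;  /* image and store share a block */
    }
    return 0;                        /* safe: image ends below the store */
}
```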
Kind of depends on whether you view this as an OS or not. Any OS will normally have basic expectations about expected interrupt latency. Typically most OSes expect something in the 100us-500us range. It's an implicit or explicit part of the API that drivers shouldn't do things to break that. Not blocking out other interrupt users is as expected as not scribbling over their RAM. I really don't think core drivers should be doing this, at least not without a very specific permission slip from the application saying "I realise this may break the system, and am willing to accept responsibility."
[Build 618]
I wouldn't call this a change. It is a bug fix. Without it your application will hardfault. At least now it will continue to run. That might lead to timing sensitive code doing things it wasn't intended to do, but that is a bug as well that should be fixed.
I wouldn't expect anything from an OS unless it was explicitly stated in the features. So far mbed OS doesn't provide any latency guarantees. |
+1 to adopt these changes. The changes are extremely beneficial to a wide range of system designs and applications:
Notwithstanding the valid points of @kjbracey-arm above, including:
I disagree with his -1 assessment overall:
I suggest that the issues described by @kjbracey-arm are the point of departure for:
Please let's not delay these fixes further, which were first initiated by Rohit ~6 days ago.
Using the SD card on the K64F. We could make it easy for our users to use that to store configuration data.
Indeed! I got a demo to build! 🎉
Not going to argue with the bulk of Simon's view. I'm certainly not "-2"ing this. But to comment on two points:
I don't think "low" is a fair characterisation. It's normal interrupt latency in the hundreds of microsecond range that all conventional systems reach, and is required for typical hardware to work properly. That range has remained basically constant for about 30 years. For example, the K64F serial port buffers are sized on the assumption of sub-millisecond IRQ response, and will overrun if that's exceeded.
I'm not sure who immediately benefits. We're the only team I'm aware of that has hit this problem, because some images exceeded 512K, and we're not intending to activate this patch, because we don't want to take the IRQ latency hit. From our point of view the response to the original issue is "can't fix - hardware limitation". We'll ensure CFStore is only activated on <512K images until we can find an alternative storage location. |
I think low is a fair characterization. You can't expect interrupt latency in the hundreds of microsecond range if you are on a busy system handling interrupts from multiple sources. The K64F serial port buffers are sized on the assumption that you use DMA on high-latency systems.
My team need this for our client. |
@kjbracey-arm If you don't accept the consequences of this change, you can choose to disable your use of configuration-store against K64F flash; it can still run out of RAM when needed. We can and will educate users regarding the consequences. There are no alternatives as long as we're using the K64F. I propose that we proceed with this so that others may benefit, and for the sake of bringing this issue to a resolution. We can take up the discussion about switching to other platforms elsewhere.
I'm not going to stop you proceeding with this. You're free to merge. I'm only declaring Gerrit's "-1 - I'd rather you didn't", not a "-2 - you shan't". The patch itself is sound - no remaining objections to the implementation as it stands. (Although checks for image/store overlap should still be added later as an enhancement). But there are other options for K64F:
Those three are equivalent to the solutions being used by our other non-K64F deployed self-updating platforms, where we have experienced the issue of interrupt latency on flash erase and know it breaks our time-critical frequency-hopping networks. Having this option for K64F will make it less likely that people explore those better options, no matter how many caveats we attach, because this is the "cheap and dirty" and transparent-seeming one. We're going to be asked to try to make it work - that's why I'm against the option existing, rather than just being totally neutral on it. Once this goes in, I just have to start -1/-2'ing applications that try to rely on it, rather than improving their system configuration. :)
@sg- Please review
fixes https://github.com/ARMmbed/configuration-store/issues/21
Read-while-write errors may occur when there is a read/fetch on a bank of
internal flash while an erase/program operation to the same bank is active.
The possibility of read-while-write errors depends on the application's span
of internal-flash.
In the trivial (and also the default) case when this boundary is set to lie at
the junction between the two banks of internal flash (i.e. BLOCK1_START_ADDR),
there is no possibility of read-while-write. In the more common case, when
the system doesn't reserve all of BLOCK1 for the flash driver, there exists
the possibility of read-while-write.
To avoid this problem for the common case (above), we enforce the following:
- erase/program operations on internal flash are made mutually exclusive with the
execution of arbitrary user-code which might access flash.
- the code driving the erase/program operations is executed
out of RAM. Refer to __RAMFUNC which allows for this.
We assume that DEVICE_STORAGE_CONFIG_HARDWARE_MTD_START_ADDR is the boundary between the application and this driver. This is used to make a compile-time choice to enforce the above-mentioned conditions.
@kjbracey-arm @simonqhughes @meriac
notify: @marcuschangarm