Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Clock Persistant. #595

Merged
merged 7 commits into from
Sep 18, 2021
Merged

Conversation

geekbozu
Copy link
Member

This is the start of a PR for adding some persistant storage in ram for things like the time. Step Count. Other items that would benefit from being kept through a soft reboot.
We accomplish this by declaring a .noinit section in ram which will not be cleared during initialization. We can then assign Global Variables to that section preventing them from being cleared during softboot.
We use a magic variable to determine if ram has been reset. We check to see if it matches the expected value and restore the system variables if it matches.
If it does not match we need to "backup" the current variables and then set the magic word. For now we just set the word and let the system init.

Currently only time is implemented.

Currently, with no chance of not requiring it, There is a bootloader modification to be done. Once we determine the size of this Persistent storage space and the changes to the linker script are solidified we can adjust the linker script for the bootloader and proceed to get that updated.

Also ran CLang format on the files...hope it did it right :)

@hubmartin
Copy link
Contributor

Works perfectly, thanks.
(Now I have no idea if my pinetime restarted :) Which is bad sometimes during develop :) )

Some ideas:

  • add __attribute__((section(".noinit"))); to some #define macro, for example __noinit
  • MAGIC_RAM - its a variable, so I would use clasical CamelCase. When I see this I expected it was a C define. Maybe add nonit to the name to better understand its meaning. like NoinitMagicVariable
  • 0xDEADBEEF is used few times, I would suggest a static constexpr for it with name for example NoinitMagicValue
  • Since soon we will have many global variables like BackUpTime, I guess it should have some common prefix, maybe also containing noinit. This way people would better understand by look of them.
  • In the place you set MAGIC_RAM = 0xDEADBEEF; shouldnt there also be clearing noninit area to zeroes? Because if you reset immediatelly, you get MAGIC_RAM but the data will be anything. It is better to have it in defined state.

@geekbozu
Copy link
Member Author

Thanks @hubmartin, glad to hear its working :D this PR is definitely still early in its implementation. I just wanted to get it listed and start gathering feedback on what it should encompass!

All great ideas I'm implement them in some fashion. I also want to move backing up the time outside of the DateTime class. It should not need to know about it. I think the system task is a better place for it...or somewhere not there :P

@hubmartin
Copy link
Contributor

After some more discussion in the chat, we came to the question - how do you findd out if your firmware rebooted? If we test cutting-edge branches and watch reboots, then even when you look at them you don't know it happened. This might be an issue to test stability of FW. What I suggest is also a reset counter in noninit section. Not sure how to reset it or where to show it, but for example you cech it every day to know the firmware did not crashed. What do you think?

@Riksu9000
Copy link
Contributor

You can see the uptime in SystemInfo.

use better linker constant names
Allow settings to stop naggin me that it has added assosiation types
@geekbozu
Copy link
Member Author

Ok a few new changes up for review
Most notably, We now clear noinit section if NoInit_MagicWord is bad.
I also moved backing up to the systemTask. Seems like a much more appropriate place for this.

....I did not find anywhere I liked to #define a ___noinit thing...so it stays long for now. Not a huge deal as this memory must be explicitly called on in order to be used anyway.

Also added it to the MCUBOOT linker file and tested it...because you know it is not going to work well if the segment is not defined there ;)

@geekbozu
Copy link
Member Author

geekbozu commented Aug 19, 2021

A note for me for later, the bootloader should be modified to clear this NVM area upon any recovery action. Be that an OTA/User revert/ recovery firmware.
Or maybe use the shorthash of the firmware (git hash) as the magic number? Then it would involve less bootloader change opinions?

@hubmartin
Copy link
Contributor

If you put all the noninit variables to the structure which will be the only variable in noninit (or class in C++?) and set packed attribute, then if you keep just adding elements to the end of structure it will be ok.
However you cannot move or remove some previous structure item. then if you add new item to the structure it will have zeroes as default for the first time.
I woudn't use git hast for magic number, because if I just create new branch, all data will be cleared but they will be valid.

@geekbozu
Copy link
Member Author

Not a bad idea but does not solve my concerns per se.
I love the idea of using a struct at minimum to help keep the MagicWord in a singular place. Packed sounds nice to keep the space usage appropriate.

My concern is more with if a data format changes on a Sealed watch how can we be sure to update them, we would need to either change the MagicValue OR have the boot loader clear them.

The major concern I have is this example.
In a new release we changed the date time format to use a different variable type. Upon loading this new firmware, Since there was no hard reboot, the hash is valid. The initializer now loads the backup values into memory. Corrupting the DateTime object.
This is the issue I want to make sure we never encounter on a sealed "consumer" unit.

The solutions to this I can see are.

  • Modify boot loader to clear NOINIT on any recovery action, OTA/Restore/Recovery.
  • Version the NOINIT, to the commit hash or some version number.

I do not like the idea of YET another version number to store, document and maintain, where using a hash "just works" for the grand majority of use cases... Except in development where the hash does not change and you need it to clear, Or as above you do not want it to clear. However in development.... we can probably use a build flag for this, but I would want to code it that it wont stick on MCUBOOT images.

My 100% concern here is protecting sealed watches from ever assuming that there is valid data there when there may not be. As developers we can assume and take risks that must not be present in a release build. Hence the idea of moving to a hash. Its updated often enough and commonly enough that any build using it will almost guarentee a wipe without requiring boot loader changes (which I'm honestly afraid of outside the linker-script change this requires). It also really should only effect developers actively working with values here. These items, in my opinion, are meant to be transient backups. They are only valid on boot, only to be written to except at boot. As such they are not scratch ram and the program should be able to load sane defaults anyway in the event the memory is corrupt.

So Hash pros are,

  • Agressive clearing of cache - ensures any new OS variant will most likely have the data cleared preventing bad data from being loaded.
  • Developer Agnostic - since its based on build information it is self maintained and needs no developer intervention to work as intended
  • saves a 4bytes ... we already include the hash ;)

Cons:

  • developers using this area for storage may need to deal with some hurdles to prevent clearing of the memory.
  • All updates or firmware changes will cause the data to be reset. (technically a pro as well?)

I also pondered out the idea of using the actual version string. EG "1.3.0" or similar. My concerns here became we love to make dev OTAs for our community members, This puts a lot of burden on us to make sure we do not merge a PR that breaks with damaged data, since our dev build did not increment the version number. I feel that is to much risk.

I also pondered using a manual version number like 0xDEADXXXX, This has the same concerns as above, where it is on us as a developer to maintain that...

So balancing all of that out, the hash seems like a very clean automated way, with no boot loader code changes, to handle keeping this memory region in a known state.

@ghost
Copy link

ghost commented Aug 20, 2021

Not a bad idea but does not solve my concerns per se.
I love the idea of using a struct at minimum to help keep the MagicWord in a singular place. Packed sounds nice to keep the space usage appropriate.

My concern is more with if a data format changes on a Sealed watch how can we be sure to update them, we would need to either change the MagicValue OR have the boot loader clear them.

The major concern I have is this example.
In a new release we changed the date time format to use a different variable type. Upon loading this new firmware, Since there was no hard reboot, the hash is valid. The initializer now loads the backup values into memory. Corrupting the DateTime object.
This is the issue I want to make sure we never encounter on a sealed "consumer" unit.

The solutions to this I can see are.

  • Modify boot loader to clear NOINIT on any recovery action, OTA/Restore/Recovery.
  • Version the NOINIT, to the commit hash or some version number.

I do not like the idea of YET another version number to store, document and maintain, where using a hash "just works" for the grand majority of use cases... Except in development where the hash does not change and you need it to clear, Or as above you do not want it to clear. However in development.... we can probably use a build flag for this, but I would want to code it that it wont stick on MCUBOOT images.

My 100% concern here is protecting sealed watches from ever assuming that there is valid data there when there may not be. As developers we can assume and take risks that must not be present in a release build. Hence the idea of moving to a hash. Its updated often enough and commonly enough that any build using it will almost guarentee a wipe without requiring boot loader changes (which I'm honestly afraid of outside the linker-script change this requires). It also really should only effect developers actively working with values here. These items, in my opinion, are meant to be transient backups. They are only valid on boot, only to be written to except at boot. As such they are not scratch ram and the program should be able to load sane defaults anyway in the event the memory is corrupt.

So Hash pros are,

  • Agressive clearing of cache - ensures any new OS variant will most likely have the data cleared preventing bad data from being loaded.
  • Developer Agnostic - since its based on build information it is self maintained and needs no developer intervention to work as intended
  • saves a 4bytes ... we already include the hash ;)

Cons:

  • developers using this area for storage may need to deal with some hurdles to prevent clearing of the memory.
  • All updates or firmware changes will cause the data to be reset. (technically a pro as well?)

I also pondered out the idea of using the actual version string. EG "1.3.0" or similar. My concerns here became we love to make dev OTAs for our community members, This puts a lot of burden on us to make sure we do not merge a PR that breaks with damaged data, since our dev build did not increment the version number. I feel that is to much risk.

I also pondered using a manual version number like 0xDEADXXXX, This has the same concerns as above, where it is on us as a developer to maintain that...

So balancing all of that out, the hash seems like a very clean automated way, with no boot loader code changes, to handle keeping this memory region in a known state.

I think this seems very sensible. I'm not sure the concern about data resetting is that valid for a developer device. I don't imagine you will be persisting lots of data to this memory area

@Avamander
Copy link
Collaborator

Corrupting the DateTime object. This is the issue I want to make sure we never encounter on a sealed "consumer" unit.

It won't have any drastic consequences though and can always be fixed by a companion. Good if avoided, but a minor thing nonetheless.

@Riksu9000
Copy link
Contributor

If we wanted to clear the noinit each time a DFU is flashed, we don't need a hash, we can just clear it when a DFU is flashed, and maybe also when a previous firmware version is rolled back to.

But maybe we don't want it to clear because we should eventually expect that the firmware doesn't require resetting ever and the only time a user will reset it is for a firmware update. If we clear the noinit in this case anyway, the noinit section is now doing nothing for users.

@geekbozu
Copy link
Member Author

geekbozu commented Aug 20, 2021

Corrupting the DateTime object. This is the issue I want to make sure we never encounter on a sealed "consumer" unit.

It won't have any drastic consequences though and can always be fixed by a companion. Good if avoided, but a minor thing nonetheless.

This was just an example, since the main firmware should be assuming that these values are correct upon load and not need to handle error checking....the date time class is not super likely to fail however I agree but not knowing what this is going to be extended to I feel like we should have some for of data validation. :)

If we wanted to clear the noinit each time a DFU is flashed, we don't need a hash, we can just clear it when a DFU is flashed, and maybe also when a previous firmware version is rolled back to.

But maybe we don't want it to clear because we should eventually expect that the firmware doesn't require resetting ever and the only time a user will reset it is for a firmware update. If we clear the noinit in this case anyway, the noinit section is now doing nothing for users.

I thought about that, I could not find a sane way other then forcing us to maintain yet another version string to allow it to persist through updates "risk" free. I just have the heavy fear that during development, and while the firmware is being constantly updated this can cause issues and subtle bugs that are easily avoided. I am also personally afraid to touch the boot loader more then necessary. As it is asking everyone to update is just barely worth it IMO. Right now the only changes just moves the stack...relatively inconsequential compared to writing more code!
I really would like this to become "set" so we can have it persist through updates and other actions, give that seamless it just works feeling.

I think this seems very sensible. I'm not sure the concern about data resetting is that valid for a developer device. I don't imagine you will be persisting lots of data to this memory area

Yeah I'm not at all concerned about developer units This is 100% for the consumer units to feel more polished. Its almost really just something to help hide crashes and panics by faking that its working properly after a restart ;)

All of this said I would love to come up with something that allows it to persist updates when appropriate. I just have yet to logic out a sane way todo so. No need to introduce bugs and risk due to poor variable initialization which is the ultimate fear :)

@hubmartin
Copy link
Contributor

Since I do not have any better solution, I would stick with with one of these options for magic word:

  • Use git hash - 100% safe, different firmware version will reset noninit.
  • Use version string like 1.3.0. But extract the 3 letters from the string and use that as magic word. This could have issues when some number becomes two digit in the string, but only one can because word has 4 letters space :) Maybe we should make variables MAJOR, MINOR, FIX as a separate numbers in CMake and use binary representation of them instead of string. (we use 3 bytes and every number could have range 0-255) This way we can clear noninit only when new release is released.

The complete clearing logic should be in the app, since many people (and me :)) do not have bootloader at all.

My feeling is that this PR has to mainly deal with random/forced reboots, which it perfectly does.

Now dynamically allocates noinit area size
stores it after the bss before the heap and stack.
@geekbozu
Copy link
Member Author

Made a change to not need bootloader changes. Little "riskier" but with the bootloader likely not changing much it should work out fine.

@geekbozu
Copy link
Member Author

geekbozu commented Sep 2, 2021

Added some comments for any developers planning to use this segment in the future.
Ultimately I voted against going crazier on the magic value, people using this segment should know what they are getting into and account for it accordingly. The notes explain to increment the version. It will be fine.

@JF002 this is ready to merge, No changes to the boot-loader segment are needed, The only thing left to add would be an offset to account for the bootloader I guess? If we need that I will add it.

@JF002 JF002 added this to the 1.5.0 milestone Sep 12, 2021
@JF002
Copy link
Collaborator

JF002 commented Sep 18, 2021

Thanks for this PR @geekbozu !
Thanks everyone for the discussions about the version of the format of that noinit area :)
I'm not sure we'll ever need to change the format of this area and/or add fields. I cannot think of any other value we would want to persist that we cannot store in the external flash memory.

@JF002 JF002 merged commit f556003 into InfiniTimeOrg:develop Sep 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants