Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't allocate flash to SPIFFS that is reserved by the system. #3260

Merged
merged 2 commits into from
Sep 1, 2020

Conversation

pjsg
Copy link
Member

@pjsg pjsg commented Aug 29, 2020

Fixes #3208

  • This PR is for the dev branch rather than for master.
  • This PR is compliant with the other contributing guidelines as well (if not, please describe why).
  • I have thoroughly tested my contribution.

This PR just subtracts the 20k reserved memory from the end of the flash before setting up the SPIFFS partition. This also explains why the contents of SPIFFS would get corrupted on the dev branch.

The downside of this change is that this will trigger a SPIFFS reformat (if you didn't have a fixed size spiffs). However, that is a good thing as it will no longer overwrite important stuff.

@pjsg pjsg requested a review from TerryE August 29, 2020 21:47
@marcelstoer marcelstoer added this to the Next release milestone Aug 30, 2020
@marcelstoer
Copy link
Member

Thanks Philip!

@TerryE
Copy link
Collaborator

TerryE commented Aug 30, 2020

My concern here is that this area hasn't been reserved or used by the system since SDK 2.x. Since SDK 3.0, the system used the areas defined in the PT. See user_main.c:L81-83. I did check at the time of testing this that SDK now uses the RF_CAL,PHY_DATA and SYSTEM_PARAMETER system partitions by erasing the old 5 pages and running a chip for a bit before checking again, but I might have missed something or some code path still uses the old locations. So this might be work validating.

However if I am right this fix won't fix anything. I've just checked Wemos D1 mini pro that I've been testing on for a few weeks using esptool --port /dev/ttyUSB0 --baud 460800 read_flash 0xfb000 0x05000 /tmp/x.bin and this area is still all 0xFFs.

So the bottom line is that I can't see how this change has anything to do with #3208, unless our code is still dicking around with the non PT locations. Sorry.

BTW w.e.f SDK 3.0, this area isn't reserved by the system so this title is misleading.

Copy link
Collaborator

@TerryE TerryE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not merge or restart using SDK 2.x features until we have bottomed this issue. Thanks.

@pjsg
Copy link
Member Author

pjsg commented Aug 30, 2020

@TerryE What was interesting is that when this happened, the old contents of the last 12k (I was only saving 12k) did include my SSID and password, and then, when it stopped working, the SSID and password was gone.

Also I I do wifi.sta.config({ssid="test ssid 1234", pwd="test pwd 1234"}) then those strings show up in the last 20k of flash.

Something is certainly writing to that area, and there is strong evidence for something reading from that area.

@NicolSpies
Copy link

@pjsg, as mentioned in #3148, I have been doing continuous ongoing testing related to the SPIFFS, I also started experiencing the problem of the failing WiFi connection. Both the failures (SPIFFS related corruption & disappearing files and WiFi connection) happened at least once every 24 hours.

I have implemented your suggested change, since then none of the long time present problems have been observed so far. I will report if it changes.

@TerryE
Copy link
Collaborator

TerryE commented Aug 31, 2020

@pjsg Philip We have 3 reserved system partitions: RF_CAL,PHY_DATA and SYSTEM_PARAMETER . We moved these to the "dead" region before the firmware, that is 0xB000 - 0xFFFF back when we did the 2.x to 3.0 upgrade; the pros and cons were debated then on the issue and PR. I accept that I made a mistake in not including WiFi connection tests in my testing at the time and so I missed the fact that the SDK / BIOS code for the WiFi fixed the SP are as the last 3 pages in the flash, so we need to accept and implement this move.

What I don't understand is why we should also move the other two back since we have validated that the SDK correctly uses the current versions.

My suggestion is that we only move the SP partition back. We also need to check the node.setpartitiontable() and nodemcu-partition.py code to make sure that these still work. We also need to check that this interplays well with the RFcal initialisation code ad the flash size reset.

I will prepare and push an update along these lines, so please post back here if you disagree and then we can open this debate for resolution, before doing the change.

@pjsg
Copy link
Member Author

pjsg commented Aug 31, 2020 via email

@galjonsfigur
Copy link
Member

I missed the fact that the SDK / BIOS code for the WiFi fixed the SP are as the last 3 pages in the flash, so we need to accept and implement this move.

It was very easy to miss, as Espressif stated in the first documentation of partition table that SYSTEM_PARAMETER partition has to be declared and can have custom address. But in "Notes" section of README they state that SYSTEM_PARAMETER partition has fixed address - very inconsistent.

@TerryE
Copy link
Collaborator

TerryE commented Sep 1, 2020

they probably screwed up the other areas as well. If those areas do seem to be mapped down low, then that is great

No, I don't think the other two are screwed up. These partitions are updated as per the PT, and there is no evidence that the corresponding high pages are touched. As @galjonsfigur points out, the documentation states: "SYSTEM_PARTITION_RF_CAL, SYSTEM_PARTITION_PHY_DATA and SYSTEM_PARTITION_SYSTEM_PARAMETER have to be defined in the partition table in application. Users can change their addresses according to practical needs." However, the README also states that this last must be at the "correct address" (the 3 three pages of flash)

Even if we do move the SP partition back to end of flash, we still need to keep the PT valid and my code assumes that (and also validates that) the individual partitions are in ascending address and are non-overlapping, so we need to resort and update the PT to avoid side-effects. I've made the changes and tested the load + node functions.

nodemcu-partition.py doesn't crash now. I am not really happy with this, but it is as good as it was. I think that it is better to get this patch in before the next drop. We can reconsider nodemcu-partition.py as a low priority.

@TerryE TerryE merged commit ba61110 into nodemcu:dev Sep 1, 2020
vsky279 pushed a commit to vsky279/nodemcu-firmware that referenced this pull request Nov 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants