[Core 2.7.0/2.7.1] ESP.restart() or ESP.reset() cause bootloop #7307
Comments
Unfortunately your sketch is working as expected here (using a d1 mini running latest git master). Are the IOs connected to something or left unconnected ? |
I tried again with cpu@160MHz, flash@80MHz, with "DOUT" and also "QIO". |
When testing with my code, I noticed that I needed to flash a 160Mhz image while booted into a 160Mhz image to make it die every time. At that point, reboots and reflashes caused boot loops until the unit was power cycled. If you flashed an 80Mhz image while booted into a 160Mhz image, it would die until power cycled, but could then be flashed again with either an 80Mhz or a 160Mhz image and would be fine until the next flash with either an 80Mhz or a 160Mhz image. Note that I'm not resetting the unit via a USB cable; I'm flashing via the /update URL of the basic web updating framework. |
Hi, I tried full flash erase and the problem is gone, now it can restart correctly. I guess something in the flash causing trouble. Thanks for your suggestion. |
What option did you use to erase the flash? Does this use esptool.py or esptool? @rev1204 - Do you get the same successful result after doing an OTA update when set to 160Mhz? I can flash via the USB port many times and it'll load fine - until I do an OTA with the CPU at 160Mhz, or the flash at 80Mhz. At that point, I get the boot loop again. Doing an OTA update (via the http interface and /update URL) will always cause a boot loop when running at 160Mhz. EDIT: Logs after doing an OTA via http://$ip/update:
|
For more debug, I turned on: Then built & flashed a 160Mhz image to the D1 Mini, then uploaded a 160Mhz image via the http updater. Below is the log of the entire first boot, then http update via the /update URL:
I believe the problem lies above somewhere - as after doing this OTA update, every I have also confirmed this same problem occurs when keeping the CPU to 80Mhz, but changing the FLASH to 80Mhz... |
Update @CRCinAU : |
@rev1204 what is your flash brand? or flashID number. |
@devyte I think I now get what causes the bootloop. It's because of SPIFFS. My guess is that it's because SPIFFS has now been deprecated. This is my code:
I build and upload using full flash erase. After that I use ESP Sketch Data Upload. It prints SPIFFS Mounted. After a restart, the ESP just bootloops. |
It's an XMC flash chip. edit: |
Looks like you guys are already zeroing in on it. Arduino/cores/esp8266/core_esp8266_main.cpp Line 338 in 4519db8
made things work as expected. At least with my short testing. |
@d-a-v |
@rev1204 Thanks for reporting! @CRCinAU Can you try with the proposed fix ? |
I can also verify that commenting out experimental::initFlashQuirks() does solve the problem. |
@d-a-v I saw the same issue yesterday, David. XMC chips on my D1 Mini modules, boot loop after ESP.restart() with a 160Mhz compile. Dropping the flash freq. to 26Mhz fixed it for me. Edit: it worked for me as well on 2 different boards I tested. I'll switch the ESP.restart() out for ESP.reset() and retest. |
@CRCinAU I haven't seen confirmation that you have an xmc flash. I think it's likely, but let's be sure. Please check your flashId and report it here. |
@devyte - I'm guessing it's the output of
|
That looks like a 1MB XMC flash, but I don't know where that trailing '8' comes from. It should be a 6-nybble or 3-byte word. |
I'm pretty sure it's an XM25QH32B - http://www.xmcwh.com/index.php?s=/cms/170.html |
That leading '14' normally means a 1MB part. The '58' is the specific model or version, and the '20' would be XMC. You're linking a 4MB part, which usually has '16' for the leading part of the ID. |
@Tech-TX I also puzzled at that number a bit. I think 1458208 is decimal which would be 0x164020. |
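The decimal/hex mix-up can be checked with a small host-side C++ sketch. It assumes the layout the comments above imply: the low byte of the ID is the manufacturer ('20' for XMC), the middle byte is the device type, and the top byte is log2 of the capacity in bytes ('16' for a 4MB part). The names `FlashId`, `decodeFlashId`, `flashSizeBytes` and `toHex` are made up for illustration and are not part of the ESP8266 core.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper type: the three bytes packed into a flash ID word.
struct FlashId {
    uint8_t vendor;        // low byte, e.g. 0x20 for XMC as discussed above
    uint8_t type;          // middle byte, model/type code
    uint8_t capacityCode;  // high byte, assumed to be log2(size in bytes)
};

// Split a 24-bit flash ID, given as a plain integer, into its three bytes.
inline FlashId decodeFlashId(uint32_t id) {
    return FlashId{
        static_cast<uint8_t>(id & 0xFF),
        static_cast<uint8_t>((id >> 8) & 0xFF),
        static_cast<uint8_t>((id >> 16) & 0xFF)};
}

// Size in bytes implied by the capacity code (assumption: size = 2^code).
inline uint32_t flashSizeBytes(const FlashId& f) {
    assert(f.capacityCode < 32);  // keep the shift within a 32-bit word
    return uint32_t{1} << f.capacityCode;
}

// Format the ID as six upper-case hex digits, the way flash IDs are usually quoted.
inline std::string toHex(uint32_t id) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "%06X", static_cast<unsigned>(id));
    return std::string(buf);
}
```

Feeding in the decimal value 1458208 gives the hex string 164020, which splits into capacity code 0x16 (4MB under the assumption above), type 0x40 and vendor 0x20.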
Gotcha. I've never seen anyone show a flash ID in decimal. :-) Patching the file in mhightower's post should fix you up. |
The code is just the following as a String:
I'm guessing the auto-conversion stuff just dumped it as DEC instead of HEX... |
So everyone's in the same boat then, awesome. |
No, they weren't. Stock 40Mhz. But here's the weird thing... I had re-enabled i.e. It has just properly self reset, runs properly, self resets properly again, runs again...
Edit: btw, if it means anything, I just noticed after the power cycle the boot mode changed from (3,0) to (3,6). I'm not sure what the second digit means though? Code
|
Interestingly enough, if I set ie:
if I recompile with
Is QIO supposed to work on these flash chips? It seems going by the data sheet that its supported? Oh, and to remove all doubt, this is the output from the logging on my web server when an update is served to the D1 Mini via https... The Header md5 is the one supplied by the D1 Mini in the http header, the File is the calculated md5 of the firmware on disk. EDIT: I note this post that kind of asks the same question, but never answered: |
@CRCinAU I honestly don't know what happened there... I left it (Wemos D1 Mini) running some different code while opening an issue about something else, and just tried again, and now the 160Mhz compiled code crashes as expected, with the flash speed at either 40 or 80 Mhz. Otherwise I was going to suggest bumping your flash memory speed up as well as the CPU speed. The only consistent success I've had with the Wemos D1 Mini thus far has been to comment out that And I don't know why QIO worked that time with it, because it doesn't now either. 😕 Not knowing how the chip is wired, I don't even know if it should have worked, or whether it fails because of some other issue, since QIO mode needs two extra connections between the ESP8266 and the SPI flash chip which aren't always connected, and even if they are, often to the wrong pins, etc. |
It seems like I can use
|
I might be wrong here, but if I flash with esptool.py with the If I flash with If I'm interpreting this correctly, once you flash once as QIO, it'll always use QIO. If you flash as DIO, it'll always use DIO. Is this correct? |
One meal, and one preprocessor @CRCinAU I can replicate that behaviour also... A previously 'QIO' mode flashed device will not change to 'DIO' mode if flashed OTA. I immediately flashed the same code over serial, and it then switched. Reading some of what is mentioned in the documentation for esptool, it mentions
these being 'correct mode, frequency and size settings'. And then goes on to mention
This leads me to believe (as well as something else mentioned in relation to XMC in the linked issue above) that the bootloader is responsible for setting the flash frequency, and AFAIK OTA doesn't touch the bootloader. I also just saw this in a map of the ESP8266 memory layout:
|
The XMC chips support QIO, but some of the Wemos D1 boards they are on are not wired for QIO, depending on the version and if they are genuine or not. I found no guaranteed way to find out other than trying it (or getting out a multimeter and chip datasheets). I found the XMC chips to be generally unstable at 40 and 80 MHz on the D1 boards with random crashes etc. Adding SPIFFS to a sketch made it much worse - this is what initFlashQuirks() was supposed to solve by having the XMC chips boost their output drive from the default 75% to 100%. The XMC chips are supposed to be good to something like 105MHz from memory, but I'm assuming that requires 100% output drive. Also, as far as I can see, the bootloader does not set the flash speed, which is always 20MHz when the bootloader runs. The flash speed is most likely set early on in the SDK init. OTA updates DO update the bootloader (just tested it), however you will not see the updated bootloader on the boot that installs the update because it's not been installed yet... One thought - if ESP.reset() resets the chip in the same way as the reset button - clearing the instruction cache and forcing code to be re-read from flash whereas ESP.restart() merely jumps to the boot address leaving the cache intact, that might explain the difference in behaviour... |
Looking at the disassembly of _SPICommand() - the function initFlashQuirks() ultimately uses to do its magic - it looks like the optimiser is messing with the code and ignoring the PRECACHE_ATTR directive to prevent re-ordering code blocks... which is bad because some of the critical code is moved outside the precached() area. It's also ignoring at least one volatile and optimising it away... if it's ignoring others, bad things will happen here. Either of these things will cause a crash. Has anything changed in the optimisation settings generally? |
Nothing has changed in years as far as I know. You can use a |
@ChocolateFrogsNuts Thanks for the reminder about the first run after OTA updates, which I did try to account for by never taking the first reset result. I still don't think OTA Anyway, I just tried again, thinking I would see if
i.e. when the Wemos D1 Mini was previously running an 80Mhz CPU / 80Mhz Flash in DIO mode ...
... and is then flashed with 160Mhz CPU / 80Mhz QIO code OTA, even after several restarts and even a hard reset, it still reports ...
... but writing via serial yields the correct flash mode, no CRC error, but a different MD5 signature - even though I deliberately uploaded the same binary that was used for OTA ...
Code
|
Ok - so I'm glad you mentioned this - as I thought I was going a bit insane from lack of sleep - as I was building with I ended up using I wonder if esptool.py was patching the binary - or something was going on behind the scenes to make the checksum between the flashed file and the copied file be different... Driving me nuts. |
I just patched the boot loader to print what it's doing with the first 4 bytes of flash (where ESP.getFlashChipMode() reads the mode from). |
So do I understand this correctly - if it's programmed via USB as QIO, any further OTA will remain in QIO mode - although you can change the frequency via an OTA. If you program via USB as DIO, it will stay DIO if you flash via OTA, and you can't change it to QIO without flashing via USB again with QIO mode set? This would more or less match up with what I discovered last night. I believe it also means that if you mismatch QIO / DIO between flash and build environment, you end up with a mismatched MD5 when checking for OTA... |
That seems to be the case - although the .bin file generated seems to have the correct information in it, so I'm not sure what's happening to it in between... I'm beginning to suspect something in the upload process, but I'm not familiar with how that works at the PC end. |
Aha! The Updater class messes about with the flash mode (Updater.cpp:349) |
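A rough host-side illustration of the behaviour described here, not the actual Updater.cpp code: it assumes the flash mode lives in the third byte of the image header (index 2), with the mode values used by the ESP8266 image format (0 = QIO, 1 = QOUT, 2 = DIO, 3 = DOUT). The function name `preserveFlashMode` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Flash-mode values as assumed for the ESP8266 image header.
enum FlashMode : uint8_t { QIO = 0, QOUT = 1, DIO = 2, DOUT = 3 };

// Assumed position of the mode byte: the third byte of the image.
constexpr std::size_t kFlashModeOffset = 2;

// Hypothetical stand-in for what the updater is described as doing:
// overwrite the mode byte of the incoming image with the mode the
// device is already running. Returns true if the image was changed.
inline bool preserveFlashMode(std::vector<uint8_t>& image, uint8_t currentMode) {
    if (image.size() <= kFlashModeOffset) {
        return false;  // too short to contain a header
    }
    if (image[kFlashModeOffset] == currentMode) {
        return false;  // already matches, nothing to patch
    }
    image[kFlashModeOffset] = currentMode;
    return true;
}
```

If a device running in DIO receives an image built for QIO, the copy written to flash then differs from the file on the server by exactly that one byte, so the device keeps reporting DIO and any hash taken over the stored image no longer matches the built binary.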
@earlephilhower I seem to remember you mentioning something about keeping the previous mode with OTA, and that it was on purpose. Do you remember any details here? |
Why? I couldn't say as it was done before my time. My guess is that the idea was to be able to distribute a single binary image to a bunch of systems in the field with diverse flash configs (i.e. a product whose initial run used vendor AAA, but later moved to vendor BBB for cost reasons). It's something that probably should be undone in 3.0, but changing it would be a breaking change. |
Ah yes, that was exactly the original reason: to be able to distribute a single binary to devices with potentially different modes. Otherwise it would be necessary to maintain binaries per mode. And you do not want to change the mode of a remote device. |
When you think about how even the Wemos D1 mini boards have several versions that do or don't support various flash modes, it makes perfect sense to not allow a mode change OTA. |
I would prefer to see it made an option, rather than completely blocking it off. Maybe something like |
... or in my case where I did want to change the mode OTA :) It was more of a pain having to go around to each device and program it via USB to make the change happen. Having to disassemble various cases, mounts etc to do so. I'd agree that a much more sane approach would be to have a value which protects by default, but also allows it to happen - and if the user wants to, they can change from DIO -> QIO via OTA. |
PR #7317 now fixes experimental::initFlashQuirks() - which is what we were originally discussing here :) |
Probably the easiest thing to do would be have a flag to force the mode change - say set bit 7 of the mode byte. That way it could be set by a separate utility when required, plus the old updater/bootloader would still ignore the mode byte thus not writing an invalid mode, but once you sent the new bootloader OTA it would respect the forced mode change on the next update, maintaining backward compatibility. |
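The bit-7 idea could look roughly like the sketch below. Everything in it is hypothetical: no such flag exists in the current updater or bootloader, and the names `kForceModeFlag` and `resolveFlashMode` are invented for illustration.

```cpp
#include <cstdint>

// Proposed (not existing) marker: bit 7 of the mode byte means
// "apply the mode from this image instead of keeping the current one".
constexpr uint8_t kForceModeFlag = 0x80;

// Decide which mode byte to write to flash during an update.
// Without the flag the device keeps its current mode (today's behaviour
// as described in this thread); with the flag the image's mode wins.
inline uint8_t resolveFlashMode(uint8_t incomingModeByte, uint8_t currentMode) {
    if (incomingModeByte & kForceModeFlag) {
        // Strip the flag so only a plain mode value is ever stored.
        return static_cast<uint8_t>(incomingModeByte & 0x7F);
    }
    return currentMode;
}
```

Because the stored byte never carries the flag, an updater that does not know about it would still overwrite the byte with the current mode, which is the backward-compatibility point made above.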
Does this change the calculated MD5? I noticed that when I was uploading via esptool.py, the binary I generated and copied to the OTA http server had a different MD5 than the one the D1 Mini was saying it had in its header. I spent a lot of time trying to figure out what was going on there - and I still don't quite understand it... |
The esp calculates the md5 of the whole binary as stored from the beginning of flash. So yes, the md5 will be different because the third byte of the OTA update is different to the third byte stored in flash. |
This would explain why copying the same binary to the OTA server would cause a reflash OTA again. I wouldn't think this is an expected behaviour - but then the question becomes, how could that be resolved? :| |
The only real way would be to store the md5 of what was downloaded for comparison with the server's version, or to update based on a version number rather than md5. |
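Both options can be sketched as a small decision helper. This is an assumption-laden illustration: the `UpdateRecord` struct, its fields, and the two functions are hypothetical, and where the record would be persisted on the device is left open.

```cpp
#include <string>

// Hypothetical record the device would keep after a successful OTA.
struct UpdateRecord {
    std::string downloadedMd5;  // MD5 of the file as served, saved before any byte is patched
    unsigned version;           // firmware version number of the installed image
};

// Option 1: compare the server's MD5 with the MD5 of what was downloaded,
// rather than with an MD5 recomputed over the image stored in flash.
inline bool needsUpdateByMd5(const UpdateRecord& rec, const std::string& serverMd5) {
    return rec.downloadedMd5 != serverMd5;
}

// Option 2: ignore hashes and update only when the server has a newer version.
inline bool needsUpdateByVersion(const UpdateRecord& rec, unsigned serverVersion) {
    return serverVersion > rec.version;
}
```

Either check avoids the repeated reflash described above, because neither depends on the contents of the image after it has been written to flash.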
Basic Infos
Platform
Settings in IDE
Problem Description
With Core 2.7.0 and 2.7.1, ESP.restart() and ESP.reset() cause the ESP to bootloop (the ESP keeps restarting and never starts the program again). Unfortunately the ESP didn't print any exception. In the debug messages below, the program ran the first time after uploading. After the first restart, the ESP stops printing to serial and keeps restarting forever at a 1 sec interval. This does not happen in Core 2.6.3.
MCVE Sketch
Debug Messages