Bootloader timeout #468

dpetican · 2023-08-08T17:48:05Z

dpetican
Aug 8, 2023

I'm using DxCore 1.5.6. I know that it seems to be deprecated, since it has been removed from the Arduino boards manager. But I have quite a few production boards using this (and earlier) core versions, and we are at critical-bugfix-only level with this version of our software and boards. I still need to program some new boards before we move on to something new.

I use an ESP32 to program the AVR128DB32 remotely OTA. Generally we have not had any problems, but lately I've noticed that roughly 2 out of every 10 boards have issues programming the AVR.

Currently I initially flash the boards using the default bootloader entry condition and a startup time of 8ms. Subsequently, when flashing a new binary, the program on the ESP32 waits ~100ms after resetting the AVR before trying to sign on to the bootloader. I suspect this is my most serious problem. But I'm a little confused as to why it works (seemingly) consistently with some boards and not others.

I just realized that with DxCore 1.4.10 (previous production version) the board menu options were different. Specifically I used: Bootloader timeout = 1s

I read the bootloader documentation and am confused as to which menu option (in Arduino) I use to get a 1 sec bootloader timeout. The descriptions in the documentation do not match the language in the boards menu. There are actually 7 options in the 1.5.6 core. The most relevant seem to be:

default
external (reset pin)
external(reset pin), slow
external reset only

Does the default bootloader entry condition provide the 1 sec timeout?

Then there is this statement from the bootloader documentation:

"This was added in 1.6.8, however the bootloader binaries were no good. You need the versions from DxC 1.6.8."

I'm confused about whether the bootloader binaries are good in DxCore 1.5.6 and which option I would use to get a 1 sec bootloader timeout. I can't use an 8 sec timeout. Modifying the ESP32 code to shorten the delay after reset is an option, and I will probably do that anyway. 100ms was arbitrary. Thanks.

SpenceKonde · 2023-08-09T19:40:13Z

SpenceKonde
Aug 9, 2023
Maintainer

Thanks I'll correct the documentation. Which document was that in?

It's the 1.5.8 binaries that are bad, the 1.5.9 ones have all been recompiled. TWICE to fix issues in 1.5.9.

All versions of the bootloader prior to 1.5.9 are defective. The entry conditions are all incorrect. I don't know what they are, I just know that the makefile wasn't passing any entry condition parameters. That bug came in when the long list of entry conditions was added. But there was another bug in there that had to be fixed too.

2 replies

dpetican Aug 10, 2023
Author

Here: https://github.com/SpenceKonde/DxCore/blob/master/megaavr/extras/Ref_Optiboot.md

dpetican Aug 10, 2023
Author

All versions of the bootloader prior to 1.5.9 are defective.

Did you mean back to the very first DxCore version, or just 1.5.8 <= binaries < 1.5.9.something?

dpetican · 2023-08-10T11:59:12Z

dpetican
Aug 10, 2023
Author

I'm still wondering though:

Are the bootloader binaries in DxC 1.5.6 okay?
Since the bootloader timeout menu option was removed after 1.4.10, which bootloader entry condition do I select in 1.5.6 to get the 1 sec timeout I require?

BTW where is the Startup Time menu option documented and what does it do?

2 replies

dpetican Aug 10, 2023
Author

Okay I see where the bootloader timeout is:

So I am using the same timeout as I was in the previous core. But I do think I have this figured out now. Testing my hypothesis now.

dpetican Aug 10, 2023
Author

I thought it might have been the startup time = 64ms. Changing it to 8ms as I had in DxC 1.4.10 didn't work. [edit] At this point after rereading my OP I'm not sure what startup time I was originally using with DxC 1.5.6

dpetican · 2023-08-10T19:31:25Z

dpetican
Aug 10, 2023
Author

This is even weirder. So I flashed the bootloader with 8 sec timeout and a startup time of 64ms. I was then able to OTA flash the AVR once. Subsequent OTA flashes failed to signon to the bootloader. I can consistently repeat this by flashing the bootloader again.

This suggests to me that after I flash a valid program, the bootloader is not subsequently waiting for a command on the serial line. With a 1 sec timeout I could see how my ESP32 program might be the problem, but with an 8 sec timeout? That's an eternity.

I apologize for asking this yet again but I need to eliminate variables and I can't find the documentation. What does the startup time actually do?

I guess I may have to revert to the 1.4.10 core. It seems like the only logical solution at this point.

1 reply

dpetican Aug 10, 2023
Author

Reverting to 1.4.10 core did not work. Actually by that I meant just using the bootloader from that version. I also tried the 1.5.9 bootloader. In that case I downloaded the zip and used Microchip Studio to program the bootloader. Same problem.

Now I have to figure out how to manually install 1.5.6, which can't be installed using the boards manager since I reverted to 1.4.10.

SpenceKonde · 2023-08-10T21:43:43Z

SpenceKonde
Aug 10, 2023
Maintainer

Any bootloaders from between the removal of the timeout menu and 1.5.9 are bad, and I do not know what (if anything) the entry conditions are, because there was a problem with ParseOptions.mk: there was no EntryRequire option set up there, so nothing was making it to the compiler. And I didn't realize until someone reported it, since I usually program via UPDI.

V 1.4.10 core is a loser. I don't think I fixed the scary serial bug (no RX on USARTs 2 through 5, and RX on those UARTs corrupted UART1's buffer) until 1.5.0. (nobody ever reported it - I saw it by chance!).

BUT- THE 1.5.9 bootloader absolutely 100% should work. This is a show stopper for the release if it does not, seeing as fixing the bootloaders was a major objective!

@technoblogy - Haven't you tested the latest 1.5.9 bootloader binaries and found them to work?

Startup time sets..... the time between power on and the start of code execution, so you can delay the start of code execution if you know the power supply doesn't rise quickly or doesn't rise smoothly. I didn't previously have a menu for it (it can generally be left at default, or set to 64 if you think your power supply is shit but it's also battery operated so you can't just use BOD). Someone complained about that because they had some crapola power supplies that they were using in something. I don't recall what kind of supply it was, but the performance was awful, maybe power constraints or something, and his parts wouldn't work if he didn't use a longer SUT. (Actually, this would have had to be mTC, because his core issue was that he wished he could have brownout detection, but he couldn't because he was running on batteries and that was too much wasted power. But 8ms wasn't long enough during the power supply ramp to make it from VPOR to Vmin_stable(Freq, Temp), so it would get off on the wrong foot at the start of its code execution, misexecuting the first bunch of instructions and ending up hung or otherwise impaired.)

Another person complained about not being able to set up the WDT fuse. That's why we have that menu. (I mean, it doesn't impede reprogramming.) We're down to the point where the only fuse settings we don't expose are CRC (we don't support a method of generating the CRC, and there are more chips where it's busted than where it works) and the lockbits (you need to do that yourself, since it needs to be done in a separate operation).

(I've been making steady progress on a UPDIProg() library that, claiming a USART and a GPIOR, will configure the USART for single wire half duplex, even parity, 2 stop bits, and talk to a UPDI target like that. It can have whatever you want as the source of the data, with whatever protocol you want to write the front end for: loading from SPI flash chips, or small images could even be loaded from the chip's own flash, etc.) I've already got the hardware for a programmer that would use that, a librarified version of ArduinoAsISP, and take as input a custom protocol (python script) that sends data to the programmer in great big blocks to minimize USB latency. There is no USB latency for the much faster paced call and response messaging between the programmer and the target, since that's being conducted by hardware UARTs. It will be able to write the data either to the target or to an onboard SPI flash. Finally, it will be able to xerox a connected chip into its onboard flash. The flash is indexed by an external EEPROM, but it's an 8 MByte flash (2048 x 4k blocks). I'd store each app or image as a number of blocks equal to that needed to fit its whole flash, or 1 more than that if EEPROM is being written. 1k and 2k parts would be special cased to take only one block either way (you want to allocate by block because that's what you erase by).
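The block allocation described above can be sketched as plain arithmetic. This is only an illustration, not code from the planned library: `blocksForImage` and the constants are hypothetical names, and it assumes "its whole flash" means the flash size of the target chip rather than the size of the sketch.

```cpp
#include <cstdint>

// Figures from the post above: 8 MByte SPI flash, erased in 4k blocks.
constexpr uint32_t kBlockBytes  = 4096;
constexpr uint32_t kFlashBytes  = 8UL * 1024UL * 1024UL;
constexpr uint32_t kTotalBlocks = kFlashBytes / kBlockBytes;  // 2048

// Blocks reserved for one stored image (hypothetical helper).
// target_flash_bytes is assumed to be the flash size of the target chip.
inline uint32_t blocksForImage(uint32_t target_flash_bytes, bool with_eeprom) {
    // 1k and 2k parts are special cased: one block either way.
    if (target_flash_bytes <= 2048) {
        return 1;
    }
    // Round up to whole blocks, since erasing works on whole blocks.
    uint32_t blocks = (target_flash_bytes + kBlockBytes - 1) / kBlockBytes;
    return with_eeprom ? blocks + 1 : blocks;
}
```

Under these assumptions a 128k part such as the AVR128DB32 would take 32 blocks, or 33 with an EEPROM image.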

What I want to do is have a library that will have methods like writePage(uint8_t* data, uint16_t len, uint32_t address), which would check what the current address pointer is and, if we don't know it or it isn't what address is, set the address pointer and then write the page; chipErase(); chipEraseKey() (erases with the chip erase key, to erase even a locked chip); enterProgmode(); beginFlash() (start writing the flash by setting the NVM command; this mode is exited by anything that writes other memory types or resets the chip); writeEEPROM(uint8_t* data, uint16_t address); writeFuse(uint8_t fuse, uint8_t value); readfuse(uint8_t fuse); eraseEEPROM();
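A rough sketch of the writePage() pointer check described above, to show why tracking the pointer saves traffic on sequential pages. Everything here is hypothetical: `FakeLink` is a stand-in that only records calls, `PageWriter` and `invalidatePointer()` are invented names, and the sketch assumes the target's pointer advances by the number of bytes written.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the UPDI link; it only records what was sent.
struct FakeLink {
    std::vector<uint32_t> pointer_sets;  // every address loaded into the pointer
    size_t bytes_written = 0;
    void setPointer(uint32_t address) { pointer_sets.push_back(address); }
    void writeBlock(const uint8_t* /*data*/, uint16_t len) { bytes_written += len; }
};

class PageWriter {
public:
    explicit PageWriter(FakeLink& link) : link_(link) {}

    void writePage(const uint8_t* data, uint16_t len, uint32_t address) {
        // Reload the pointer only if we do not know it or it points elsewhere.
        if (!pointer_known_ || pointer_ != address) {
            link_.setPointer(address);
            pointer_ = address;
            pointer_known_ = true;
        }
        link_.writeBlock(data, len);
        pointer_ += len;  // assumption: the pointer post-increments on write
    }

    // Call after anything that may move the pointer (reset, other memory
    // types) so the next writePage() reloads it.
    void invalidatePointer() { pointer_known_ = false; }

private:
    FakeLink& link_;
    uint32_t pointer_ = 0;
    bool pointer_known_ = false;
};
```

With this shape, writing consecutive pages costs one pointer load for the whole run, and a jump to a new address or an invalidation costs one more.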

The WDT was very popular on classic AVR, but the reasons were mostly because... it is the only way to issue a reset from software on classic and to periodically wake from sleep on classic AVR. The use of it as a stereotypical WDT was almost unheard of. In the modern AVR times, things obviously have changed. We now have a real software reset (though WDT enable -> while(1); still has utility because it will reboot the chip without running Optiboot), the RTC got the exclusive contract to supply periodic interrupts to the modern AVRs, leaving WDT for reset only, though the WDT gained its window. I've only interacted with 2 people via issues and none via any other means. But apparently several people on mTC or DxC are actually using it like a watchdog timer (IMO, if your code is hanging, you should address the reason for that rather than using the WDT to recover, but I can see why one might do that, either as a backup or as a stopgap for an emergent bug after a device was in use).

I added a section to Ref_Reset.md. (The Ref_ documents and the included library README files constitute the meat of the documentation. Reset was actually one of the first I wrote because of what a big topic it was and how wide-ranging the consequences can be. It covers the consequences of, and the mitigation measures taken by the core against, resets which are not "clean", as well as a step-by-step description of how each of the three ways that software bugs generate these gets from the original bug to the "dirty reset" state, where PC = 0x0000 but no hardware reset has occurred.) At that point there are only two potentially correct things to do: you could declare "System is in a bad state", throw your arms into the air, and enter a while(1); hanging the part until reset (great if you have a hardware debugger you can stick on live, not so useful otherwise), or you can immediately issue a software reset, which is what we do, to reset cleanly. Prior to about a year ago, the cores took a third option: closing their eyes, plugging their ears, and marching forward as if there had been a hardware reset. Since, like all init routines in Arduino history, my init routines explicitly assume reset config as their starting point, the whole edifice would collapse around one's ears on an untrapped dirty reset. (Hence the actions taken by the core there are consistent with our design principle, from back when this last came up, that "one should not enable or encourage configurations which you know will not work yet will not give compile error".)
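The dirty reset check described above comes down to one test on the reset-cause flags. The sketch below is a host-side illustration, not the core's code: `onStartup` and `StartupAction` are hypothetical names, and the flag value is passed in rather than read from RSTCTRL.RSTFR. It also assumes the flags are cleared after every read, since otherwise a stale flag from an earlier real reset would hide a later dirty one.

```cpp
#include <cstdint>

enum class StartupAction { RunNormally, SoftwareReset };

// reset_flags: the reset-cause register as read at startup
// (RSTCTRL.RSTFR on modern AVRs), passed in so the logic runs off-target.
inline StartupAction onStartup(uint8_t reset_flags) {
    // A hardware reset sets at least one cause flag. Arriving at 0x0000
    // with no flag set means no reset actually happened: a dirty reset.
    return reset_flags == 0 ? StartupAction::SoftwareReset
                            : StartupAction::RunNormally;
}
```

On the target, the SoftwareReset branch would correspond to issuing the software reset mentioned above rather than continuing into init.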

1 reply

technoblogy Aug 11, 2023

@technoblogy - Haven't you tested the latest 1.5.9 bootloader binaries and found them to work?

I tested the 1.5.8 bootloader on an AVR128DB48 and it worked fine. I haven't tried 1.5.9.

SpenceKonde · 2023-08-10T21:46:06Z

SpenceKonde
Aug 10, 2023
Maintainer

1.5.6 did not have a magically good version of the bootloader! It was a bad version of the bootloader. And in fact, I don't think it even compiled on some platforms (bungled toolchain package).

0 replies

dpetican · 2023-08-11T02:13:19Z

dpetican
Aug 11, 2023
Author

Ummm IDK anymore. My head is spinning faster than Regan MacNeil's. I did see the document about reset. I thought I should read it but didn't. Now I will. @technoblogy I believe the issue has to be either the hardware that controls the reset or the ESP32 that does the flashing, because the OTA flashing works on 8 out of 10 boards.

And I did notice that on the failed boards if I flashed a bootloader with an 8 sec timeout, the OTA flash would work once and never again. I always power cycled after each OTA flash just to be sure. How strange is that?

Then I dumped the flash (including bootloader) from a working board and flashed it onto a non-working board. No OTA flash success on first try.

In case it matters, the "reset hardware" consists of a FET to level shift the ESP32 pin controlling the reset and, on the AVR side, ONLY a 10K pullup resistor. I did not think that the 0.1uF cap would be required, as I am not doing a classic autoreset. In fact I hold the AVR in reset for some time while downloading the flash file before releasing. Then I drain the serial line and wait 100ms, plus the cycle time required to format the command string etcetera, before attempting to send the signon command. This total delay time is certainly well less than 1000ms. I have an LED connected to the boot pin. I do see it flash briefly every time the reset is released. So I believe the bootloader must be running even if the OTA flash fails.

My signon routine attempts to signon 3 times then quits. It just occurred to me that I should extend the number of retries. Should there be a small delay between attempts?

Also, it just occurred to me that I am assuming that draining the serial line takes well less than 1 sec. Maybe this is not the case sometimes. Maybe I should get rid of that and, as above, increase the signon attempts in case the first couple return garbage because of serial noise?

1 reply

SpenceKonde Aug 11, 2023
Maintainer

No, if the signon fails three tries it's gonna fail as many times as you try, that's not a worthwhile avenue to work on.
I don't think I've ever seen an upload succeed that failed more than one attempt to initialize the connection. It annoyed the hell out of me that AVRdude banged its head into the wall 10 times.

Flash working once and never again is symptomatic of incorrect bootloader entry conditions on modern AVRs, that's why you need the 1.5.9 bootloader files, which (I think) work as advertised.
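To make "entry conditions" concrete, here is a hedged sketch of the kind of decision a bootloader makes from the reset-cause flags. The bit positions follow the RSTCTRL.RSTFR layout on AVR Dx parts; `decide`, `BootAction` and the example masks are hypothetical and are not taken from the DxCore Optiboot source. Handling of the all-zero (dirty reset) case is left out.

```cpp
#include <cstdint>

// Reset-cause bits as laid out in RSTCTRL.RSTFR on AVR Dx parts.
constexpr uint8_t PORF   = 0x01;  // power-on reset
constexpr uint8_t BORF   = 0x02;  // brownout reset
constexpr uint8_t EXTRF  = 0x04;  // external (reset pin) reset
constexpr uint8_t WDRF   = 0x08;  // watchdog reset
constexpr uint8_t SWRF   = 0x10;  // software reset
constexpr uint8_t UPDIRF = 0x20;  // UPDI reset

enum class BootAction { RunBootloader, JumpToApp };

// entry_mask: the reset causes that should start the bootloader
// (a compile-time option in a real build; a parameter here).
inline BootAction decide(uint8_t flags, uint8_t entry_mask) {
    // Assumption: a watchdog reset always goes straight to the app,
    // because the bootloader uses it as its own exit path.
    if (flags & WDRF) {
        return BootAction::JumpToApp;
    }
    return (flags & entry_mask) ? BootAction::RunBootloader
                                : BootAction::JumpToApp;
}
```

In this model, a mask that never matches means the bootloader is skipped on every reset once an application is present. That is one way to read "works once and never again", but the thread does not establish what the defective binaries actually do.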

WestfW · 2023-08-11T06:53:05Z

WestfW
Aug 11, 2023

It annoyed the hell out of me that AVRdude banged its head into the wall 10 times.

At one point, I think I submitted a bug to the effect that the DTR drop that causes Arduino Auto-reset was OUTSIDE of the avrdude retry loop, so that a retry was unlikely to ever work. That was a long time ago, I don't remember where I reported the problem, there's been so much movement that I can't find it, and I'm pretty sure it was never addressed. :-(
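For a custom uploader like the ESP32 one discussed in this thread, the point about the reset sitting outside the retry loop can be sketched as follows. The sync bytes are the STK500v1 values Optiboot uses (0x30 0x20 answered by 0x14 0x10). The `Target` struct, its callbacks, `signOn` and the 100ms reply timeout are hypothetical, chosen so the logic can run without hardware.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// STK500v1 sign-on bytes.
constexpr uint8_t STK_GET_SYNC = 0x30;
constexpr uint8_t CRC_EOP      = 0x20;
constexpr uint8_t STK_INSYNC   = 0x14;
constexpr uint8_t STK_OK       = 0x10;

// Hypothetical hardware hooks; on an ESP32 these would wrap the reset
// GPIO, delay() and the serial port.
struct Target {
    std::function<void()> pulseReset;
    std::function<void()> drainRx;
    std::function<void(uint32_t)> waitMs;
    std::function<void(const uint8_t*, size_t)> send;
    // Returns true only if n bytes arrived within timeout_ms.
    std::function<bool(uint8_t*, size_t, uint32_t)> receive;
};

inline bool signOn(Target& t, int attempts, uint32_t settle_ms) {
    const uint8_t sync[2] = {STK_GET_SYNC, CRC_EOP};
    for (int i = 0; i < attempts; ++i) {
        // Reset INSIDE the loop: a bootloader that has already timed out
        // and jumped to the app will not answer a retry on its own.
        t.pulseReset();
        t.drainRx();
        t.waitMs(settle_ms);
        t.send(sync, sizeof sync);
        uint8_t reply[2] = {0, 0};
        if (t.receive(reply, 2, 100) &&
            reply[0] == STK_INSYNC && reply[1] == STK_OK) {
            return true;
        }
    }
    return false;
}
```

This does nothing for a bootloader whose entry conditions are wrong; it only keeps each retry from being wasted on a target that is no longer listening.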

2 replies

SpenceKonde Aug 14, 2023
Maintainer

It's easy enough to test, I'll do it tomorrow out of curiosity if nobody else has by then, since I have boards lying around with Dx's with autoreset circuits on them and the serial lines connected to USART0. Just put a sketch that flashes the led once in setup and then does nothing more.... then switch to the optiboot definition and attempt to upload to it, and see whether you get flashes on every attempt.

By "never addressed", you mean by Arduino, right? Because if something that simple wasn't fixed by the new AVRdude people, as they have dealt with a bunch of far less significant issues, I'd be kinda stunned.

SpenceKonde Aug 16, 2023
Maintainer

Lol, yup, in the last Arduino AVRdude version, it tries ten times... but never resets in between, thus ensuring that the subsequent attempts will fail... The only case I can imagine that being useful in is if they had a custom version of optiboot, on a classic AVR, with the WDT fused on, and then your optiboot entry conditions would be PORF | WDRF. But no, that doesn't work, because there isn't a good way to get yourself back to the app then, short of having all app code know about and undo the mess the optiboot bootloader left behind. Optiboot doesn't clean up after itself because it's supposed to do that with a WDT reset, in 4 instruction words, instead of the dozens that would be needed if it had to painstakingly clean up everything it had written (though, interestingly enough, you don't need to care where the hell the stack pointer is pointing, or whether r1 contains 0).
I mean I guess ya could...
With well formatted code, you could regex search the bootloader source for ^ *[A-Z0-9_]* *\+?= or something close to that, find all, copy, new file paste. find replace ^ +, then .* , copy+paste to excel next to a column of incrementing numbers. Alphabetize on second column, second column remove duplicates, both columns, sort by number column to get them back in chronological order, copy second column back to sublime, find replace \n with = 0;\n then go down the list. Finish manually by removing from the list any values that you know optiboot resets to 0 in the course of operation (if any? I can't think of any that consistently are, because if you're doing the WDR to exit - which is obviously the Right Way - you would rarely end up setting things to the default), and otherwise manually adjust the cleanup routine for special registers. If it fit in the same page size, into the bootloader with it, otherwise that system wouldn't be entirely horrific. It would enter the bootloader in response to the application triggering a reset (via WDT), or if the application has hung and it timed out. Presumably the reason the reset button wasn't enabled would be the same as the reason you wouldn't care that it didn't trigger autoreset: This is on the far end of an RS-485 line, or connected through some serial to wireless mechanism and physically inaccessible.
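The editor-and-spreadsheet procedure above (find the assignments, drop duplicates, keep first-seen order, emit NAME = 0;) could also be scripted. The sketch below is one possible reading of it, not a tested cleanup generator: `zeroingStatements` is a hypothetical helper, the regex is a tightened variant of the one quoted above, and it only looks at all-caps names assigned at the start of a line.

```cpp
#include <regex>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Returns one "NAME = 0;" line per distinct all-caps name that is assigned
// at the start of a source line, in order of first appearance.
inline std::vector<std::string> zeroingStatements(const std::string& source) {
    // Optional indent, an all-caps identifier, then =, +=, -=, |=, &= or ^=.
    static const std::regex assign(R"re(^ *([A-Z][A-Z0-9_]*) *[-+|&^]?=)re");
    std::vector<std::string> out;
    std::set<std::string> seen;
    std::istringstream lines(source);
    std::string line;
    std::smatch m;
    while (std::getline(lines, line)) {
        // insert().second is false for a name we have already emitted.
        if (std::regex_search(line, m, assign) && seen.insert(m[1].str()).second) {
            out.push_back(m[1].str() + " = 0;");
        }
    }
    return out;
}
```

The manual pass described above would still be needed afterwards, both to drop registers that do not need zeroing and to hand-adjust any special registers.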

And with that whole 3/4ths-assed mess doing OTA firmware updates, yeah, it had better reset into the bootloader if it relies on application resets to enter the bootloader normally and you don't have easy access to it. But if that's the case, and you don't have control over the power so that you could flip the breaker for that circuit while attempting an upload (though now that I think about it, the 10 tries separated by 10 seconds each would be good there too, unless the system you're controlling it from happens to be located next to the main breaker panel), then if it were powered by batteries or solar, the fact that the WDT being fused on locks WDIE leaves you with no way to spend most of your time in sleep mode. So it has to be continuously powered by something with its own power source, but which works with much larger amounts of power, such that 10mA@5V is a rounding error. A monitor for a large solar/battery system? Which is inaccessible because... it's on the roof, but not inside? You lived on a farm, but decided you don't like farming and planted solar panels instead? They've commercialized those solar panels that are transparent to the wavelengths used by plants and you own a greenhouse? It's RS485 and you've come up with some hideous way of powering it from the RS485 line and a supercap or something? (I am not sure that's possible unless you either rely heavily on sleep, or end up in the ridiculous position of having to have the master spend some percent of its time sending either 0x00 or 0xFF to keep it alive, neither of which seems viable in this absurd niche case.)
