-
Notifications
You must be signed in to change notification settings - Fork 7.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non-void function with no values returned causes crash #5867
Comments
Hello, we did some changes from v2.0.0. Are you able to validate this with version 2.0.3-RC1? Thanks! |
Hi, been using 2.0.1 so far with the proposed workaround (void function instead of non-void function). |
@garageeks any updates? |
I've come across this exact same issue with the released 2.0.2. A project that compiled and ran just fine on 1.0.6...now bootloops the ESP32 on 2.0.2. Interestingly enough, the backtrace is reported corrupted every single time; I've never seen that before. (Worth noting that the normal Arduino IDE compiler warnings are set to "None"; setting them to "Default" showed a lot of warnings.) After several hours of debugging, I traced the issue down to a function declared "int func([args])", but which didn't have a RETURN on every single possible exit path. The corresponding warning provided was: OK, whatever...updated compiler, new rules, it wasn't an error, etc. My code wasn't even looking at the return value in the first place--so it wasn't a code problem. Only this is what was causing the ESP32 crash. Debug PRINT statements show the following behavior in the function:
(where "Func Start" is at the top of the function, and "Func Exit" is at the end, right before the commented-out RETURN statement.) Yes, yes, I know, "what's the function do??" While that's not pertinent here, the function runs a Steinhart-Hart calculation and then prints it to an LCD (if requested, otherwise it returns the converted result without printing to the LCD). Worth noting that the ESP32 doesn't crash when the Serial.Print statements are in place (due to the task scheduler getting a break). If I remove said statements (and leave the RETURN commented out), I get a crash dump like this:
(worth noting, I did not edit the above dump at all; the "|<-CORRUPTED" is what the ESP32 sent) It should be relatively trivial to make a very simple "reproduce case" for this. |
hello, thanks for update. We will investigate this. |
I had to get the "latest" (2.0.3 ++ whatever!) Arduino-ESP32 firmware yesterday due to another bug that was fixed ~4 days ago (#6659 to be exact!), so if necessary, I can test this on the current "master" branch. |
@WebDust21 I don't understand all of those exclamation mark you have used... but we will soon release 2.0.3-stable so it can be retested with it. |
In C++, Omitting the return statement in a function which has a return type that is not void is undefined behavior. It is a coding problem. |
@hreintke Especially considering that the default compiler alerts setting in Arduino IDE is "none." |
My point about saying "it wasn't a coding problem" was that my code was not faulting as a result of the undefined function response. The fault is a result of the compiler's handling of said function. |
I don't want to get in a semantical discussion so this will be my last remark on this.
As not having a return statement executed is undefined behavior, any handling of the compiler of this is OK. |
I think nobody disagrees with the fact it is a programming fault. |
I just ran into this problem. I am not so familiar with C++ but in C a missing return generates a warning but is benign - I believe its because there is always a place for a return value on the stack. |
I can confirm this for ESP32 version 2.0.6, spent the whole afternoon hunting it down, given that core version 1.0.6 resulted in a perfectly running firmware for my device. And the exact same source code gave an obscure core 1 panic illegalInstruction after the elusive missing return. Tried with both with and without inline in front of the function declaration. Including a return made the problem go away. (Note: my snippet gives a slightly different console message) The sketch int theCulprit(void){
Serial.println("After this we crash");
}
void setup() {
// put your setup code here, to run once:
Serial.begin(115200);
Serial.println("hello 1");
theCulprit();
Serial.println("somehow we survived");
}
void loop() {
// put your main code here, to run repeatedly:
} Console
|
@VojtechBartoska or someone close to the design team, I've gone through the thread and came to this conclusion: please consider issuing a compiler error for this missing return thingy, and its consequence, crashing the whole system if a non-void subroutine returns nothing, instead of a muted-in-arduino-by-default warning. This behavior, crashing, is pretty much the very definition of a breaking change, since the 1.0.6 toolchain compiled and ran my exact same code just fine, for whatever reason, and it wasn't until the holiday break that I had time to hack and go core version 2 on my toys. True, from an engineering perspective, the missing return is point blank ugly, and my bad there, typos happen, ignorance happens, sloppiness happens, overlooking one's own diffs and pushing unintentionally altered code happens, these all happen. But there is also the PR perspective here, beyond people's wasted time on debugging: it's arduino we are talking about. Not everyone compiling for esp32 happens to work in an r&d, with all the right tools and know how to get the idea where to start looking. Ham fisting this issue into novice developers' mind the hard way, like in good old x86 times pop ds system-bye-bye, is like the monk cutting off a finger of the disciple to show him where satori is not. If the result is a crash, and the linter/compiler/parser knows it, please don't let it run and crash. |
Yep, the compiler should fail on this. IMHO this is also a bad design for the compiler as it never ever should allow to finish on such code. |
This bit me too. Using "-Werror=return-type" as a build argument promotes this warning to an error in the future. It would be helpful if it was an error by default, but some constructs need it to not be an error, and are allowed by the c++ standard, like some check for an unrecoverable error:
(This is a weird example, but oh well...) I think the default build flags in e.g. platformio should have it on by default, though. |
And this issue bit me too.. Not a big deal to fix it, when you stumble upon, by adding a return accordingly. bool checktheday(int day) So, the question is, should we be strict or practical.. |
The worst part about this is that when it DOES crash as a result...the backtrace and all debugging outputs have always been completely useless in my experience. So you can spend days trying to figure out "what's wrong with routines x,y,z" (and they may even be official ESP-IDF routines!) before finally getting a bit bombastic with the debug methods. And then only to finally figure out a missing "return" in a completely separate code file. worth noting that PlatformIO does not have this as an error either. I was so convinced that the Arduino-ESP32 SPI.WriteBytes function had some critical fault in it (as that's where the crash stack pointed to on an ESP32-S3). Well, after peppering the whole code section with Serial.Print statements, I found a missing "return" on a non-void function in a different file. Fixed that and poof the ESP32-S3 stopped crashing. C'mon @VojtechBartoska...just make this a compiler ERROR (instead of a WARNING), and close the issue. As far as issues go, this one's slam-dunk easy to resolve. |
I really wonder what would be a valid reason not to have this being considered an error and stop compilation? |
Prior to 2.0.0....yes, having it a warning was OK, as it didn't cause mysterious crashing of the end user application. But I completely agree that the issue is a result of poor C coding practice. Even prior to 2.0.0, however, there still was no reason for this to be a "warning" in the first place: there is nothing to be gained. From a machine-language standpoint, a non-void function exiting without a given return value could provide a completely useless/random/unknown/unspecified return value. That is, if the return value is provided in a CPU register, and not put on the stack. |
Still, we programmers are human and thus may benefit from some machine check to help us prevent making such errors. But I still consider myself perfectly capable of making programming errors and would really not mind if a compiler would help me to not speed up turning my hair all grey or nails been bitten off in search for my own stupid errors. So what perfectly fine reason is there to not enable this compiler flag to save my sanity for a little longer? |
Yeah, back in the old day in x86 we had the return value in a register, either ax, or dx:ax for 32 bit values, and the caller took care about pushing popping said register, no "surprises" were left on the stack. This is exactly what must have changed from esp32 ver 1.x to ver 2.x, and I still don't get why the silence. Are we literally like five people on the entire planet running into this issue? |
The five of us that found this thread, for sure. I understand the concept of garbage-in-garbage-out, but it is nasty that older projects that ran fine for years, can now have hidden crashes that may occur totally unexpected when a rare condition is met. If a function exits early with a proper return value, all is well. But there may be a rare exit condition that makes it crash. My projects are mostly hobby stuff, call it toys if you want. But I reckon many will use it for more serious stuff, and a lurking crash which only presents when a rare condition is met when you least expect it... |
@Frank-Bemelman in my case, I have several units running jig tests at my workplace, cause I found these cute boards with a glued on oled display. On the other hand, the same boards run my amateur astronomy rig, a nice testing ground before I put code to a more serious board. This is how it crashed my astronomy rig, leading me to spend the christmas tracing the issue to exactly here, to this thread. Long story short, from a PR perspective, this breaking change just casts a shadow onto the entire playground of esp32's. We are people, writing bad code in general. I am grateful the days of x86 assembly are over, and I can finally focus on writing stuff instead of counting the bits. And here we are, counting the bits, again. |
Probably the reason that more people aren't blowing this thread up, is because it is extremely difficult to pin down the cause/source of this bug. |
yup, it's not an easy bug to track down! Took me most of a day the first time, and more like a few hours the most recent time. It's the weirdest thing, when your subroutine mysteriously loops back on itself infinitely until the processor crashes due to WDT or stack fouling... |
And it is a sort of a time bomb. I bet there are thousands of c functions in github projects just waiting for a compile and crash. People tend to stick to their setup as long as possible, but sooner or later they will upgrade for whatever reason, and then it happens. Like a melting ice cap. |
In my case it was related to button presses and beeps, buttons being multiplexed cause not enough pins to begin with, in the meanwhile I read the tone function, previously a polyfill, now went into the os, into the multithreading, with message queues and stuff, ... I suspected ESDs initially, so went crazy by isolating the front panel with optos, read a ton of docs and forums about the ADC, cause maybe some code there, perhaps mixed in with the wifi, what about the led_c used to generate the pwm for the buzzer... just to be sure, methodically eliminating every possible culprit, installing and removing the 1.x, cross compiling with all versions to see the behaviors, only to find out that I had a missing return somewhere in my code, and that the 2.x codebase handled it differently than 1.x. |
I stumbled about the exact same topicc(multiple times now). |
@VojtechBartoska This issue has been open for 4 calendar years (2021, 2022, 2023, and now 2024). It's stupidly easy to fix (i.e. simply add "-Werror=return-type" to the buildflags). What in the world is the hangup with fixing and closing this issue? |
yet nobody that cared have submitted a PR 🤷♂️ what you are asking for in no way makes this repository's source work any better. Why not enable warnings and errors in Arduino IDE and make sure that your code compiles clean? |
I find this reply outrageous, especially coming from a member. We did contribute with the accurate description of the problem, sample code to reproduce the behavior, and suggestions how to fix it, leaving the exact solution to those more into the project. I, at least, know from experience how unwelcome are the bit-counting solutions coming from uninitiated people. Hence, I will not PR compiler version choices nor compiler settings choices, as both may have implications outside the scope of this ticket. We also provided the big picture issue: this is a problem precisely because of the default settings on computers used by relatively novice folks, folks who didn't eat assembly for breakfast a quarter of a century ago, because they were not even born yet, and precisely because it is a breaking change that alters the behavior from 1.x...working to 2.x...crashing with no default feedback whatsoever. The fact that there are so few of us here in the first place, shows how few have identified the bug and cared enough to signal that there is a problem, along with the accurate description of the problem. |
I care about it (as evidenced from posting on this issue), yet my Github skills are pretty much nonexistent. Any PR I've ever submitted was the result of someone much more experienced literally coaching me through every single step.
I have to agree that it does not improve functionality of the code. However, it WILL reduce the population of bald people in the world--because users won't be pulling their hair out for days on end over a ridiculously simple problem that could have been mitigated four years ago.
Firstly, as I mentioned earlier in this issue, I'm actually using PlatformIO at this point. Whereas if this was switched to an error, the problem would be immediately identified and pinpointed to the user at the exact location of fault. (The crash backtrace is completely useless in pinpointing the trouble spot in this case.) |
I would appreciate it too if this would be solved with that compiler error flag. |
This is then a totally different case. PIO flags come from another place, not platform.txt. They are autogenerated (same goes for most Arduino flags) from the build with ESP-IDF. So I think it's up to you to add the flag to your build configuration, or talk to PIO to add it by default.
What bug is that? So far I understand that you are talking about preventing bugs while writing code, not fix something already here. And it's not like there isn't a way, just you want it more automatic, so you don't have to dig in the menu and enable errors. To be clear: |
PIO is a derivative from Arduino-ESP32, and so it is unsurprising that the issue persists there as well. But to be clear, it's completely present in the Arduino IDE.
Dude. It's been FOUR CALENDAR YEARS with no progress on this issue! Or for that matter, any attempt from the repository maintainers at one of the world's simplest issues to fix and remove from the list.
Sheesh. I dunno what's wrong with people these days. @me-no-dev I haven't submitted any PRs for one VERY SIMPLE REASON: I'm not qualified to do anything other than waste your time with PRs that clearly indicate to everyone how much I don't know about the build systems. (Plus that I am terrible at the Github process.) The specific buildflag suggestion was from @danie1kr, and it looks good to me--but I have no idea whether it is the correct one. Or for that matter, I haven't a clue where to put it--or which dozen-odd files need corrected for an across-the-board fix of all variants. Honestly, as someone who is extremely knowledgeable about the Espressif frameworks, you've spent far more time here throwing up sand than it would have taken for you to fix the issue. I give up. |
awesome, thanks! 👍 |
thank you |
Finally! Now I have one less stupid error I can make to shoot myself in the foot :) |
Woohooo! Progress! |
Make your question, not a Statement, inclusive. Include all pertinent information:
What you are trying to do?
While writing firmware, I stumbled in puzzling crashes. This firmware used to work fine with arduino-esp32 2.0.0.
I reverted back to 1.0.6 and it worked. No compiler errors were thrown with both.
The crashes sometimes produced a Guru Meditation Error, sometimes it was stuck and the watchdog would trigger, sometimes it simply rebooted.
By reverting to the previous commit, and adding bit of new code at a time, I pinpointed the culprit.
An initialization function declared with a bool, had no returned values.
Latest print on Serial was "pin set" and then boom.
Looking closely, the firmware throws a warning:
[...] In function 'bool mcp23008Init()':
[...]:2570:1: warning: no return statement in function returning non-void [-Wreturn-type]
The line corresponds to the end of the mcp23008Init function.
Hardware:
Board: ESP32 Dev Module
Core Installation version: 2.0.0 (crash), 1.0.6 (working)
IDE name: Platform.io
Flash Frequency: 40Mhz
PSRAM enabled: no
Upload Speed: 115200
Computer OS: Windows 10
According to some StackOverflow posts, declaring a non-void function without returning a result is undefined behavior. Probably some compiler settings changed between 1.0.6 and 2.0.0
https://stackoverflow.com/questions/57163022/c-crash-on-a-non-void-function-doesnt-have-return-statement
When changed the function to void, the code works with 2.0.0. as usual.
It is not a arduino-esp32 bug.
I would expect a compiler error, not a warning. However, if this is not possible, the noobs like me should be aware!
The text was updated successfully, but these errors were encountered: