Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

code robustness improvements, minor speedup, fx sliders bugfix #4463

Merged
merged 12 commits into from
Jan 19, 2025

Conversation

softhack007
Copy link
Collaborator

@softhack007 softhack007 commented Jan 7, 2025

I had a chance to run sonarcube on the WLED codebase. This PR solves a few "real" findings reported by the tool.

  • made XY() and _setPixelColorXY_raw() const -> minor speedup
  • segment is a struct not a class: friend class Segment -> friend struct Segment
  • fixed missing braces around two macros
  • improved robustness of transition code - sonarcube actually found an utterly complicated path where _t could be nullptr - so better-safe-than-sorry I've added a simple check.
  • use non-throwing new where possible - the standard new throws an exception when out of memory, however it won't return nullptr -> replaced with new(std::nothrow) for cases where the code has an explicit check for nullptr after the call to new. Only relevant for arduino-esp32 v2.0.xx (S3,S2,C3), where a failed new always led to crash with unhandled exception.

* make  XY() and _setPixelColorXY_raw() const (minor speedup)
* segment is a struct not a class: friend class Segment --> friend struct Segment
* fix missing braces around two macros
* use non-throwing "new" where possible
* improve robustness of transition code
@softhack007 softhack007 requested a review from willmmiles January 7, 2025 14:21
@DedeHai
Copy link
Collaborator

DedeHai commented Jan 7, 2025

these are some great findings!
actually, I know there was/is a bug in segment transition handling, which with most FX results simply in wrong sliders/checks being set but with the particle system, it leads to a crash since it will return a pointer to old memory instead of the new and thus leading to a mess as the particle system class actually resides at that pointer. At least that is what I remember from chasing that bug. IIRC the issue was with a cancelled transition when during a transition, a new transition starts: instead of copying the temporary segment data, it just deletes it and sets the pointer back to the original data (or something like that).
I gave up on hunting that down fruther as the transition handling is so convoluted I got nowhere.

... so we must use `int` instead of `unsigned`
these are reserved names and future compilers may reject them.
* changed some parameters to "pointer to const", so compiler can better optimize code size and performance -  because data behind a const pointer will never be modified by the called function.
* made setPixelColor `const`

* fixed a few potentially uninitialized local vars (the may have random values if not initialized)

* avoid shadowing "state" in handleSerial()
* plus a few very minor improvements
@softhack007 softhack007 requested a review from DedeHai January 7, 2025 19:56
@softhack007
Copy link
Collaborator Author

softhack007 commented Jan 7, 2025

I've gone through more recommendations from sonarcube - fixed a bug in the function that extracts sliders from effect meta-data, added some minor speedups, and corrected a few corner cases.

  • changing pointers/references in function parameters to "const" allows the compiler to optimize better. Also its a good way to make sure that future code changes do not - accidentally - modify their inputs via pointers.
  • fixed handling of String::indexOf() "not found" results in extractModeSlider() - previously "-1" became UINT_MAX so checking for "index > 0" did not work as intended.
  • made setPixelColorXY() const -> speedup
  • Initialized some local vars that might be read uninitialized in some corner cases
  • avoid using C++ keywords: final, module (these might be rejected by new compiler versions)
  • fixed misleading indentation in simulateSound()
  • robustness: if (SEGLEN == 1) --> if (SEGLEN <= 1) - doesn't hurt

@DedeHai @willmmiles sorry the last changes may make it somewhat complicated for you to review - you might want to go commit-by-commit to have some more context. Everything compiles without new warnings, so I assume that the const changes are correct.

Edit: This PR is "final" now, meaning there are no additional tool findings I want to address.

@netmindz
Copy link
Collaborator

netmindz commented Jan 8, 2025

0.15.1 or 0.16?

@softhack007
Copy link
Collaborator Author

0.15.1 or 0.16?

0.15.1 as it's not changing functionality, but fixes two bugs and improves code quality.

@DedeHai
Copy link
Collaborator

DedeHai commented Jan 8, 2025

I compiled this PR for ESP32 and for C3:
ESP32: code size increases by ~150byte
C3: code size increases by ~500byte (!)

Does not look like the compiler does a better optimization. Or is this due to the compiler doing more inlining or is there a good explanation for the extra code size? If its inlining of core functions that actually help to speed up the rendering, then I am ok with it, if it's just for "nicer code" I don't see the point.

@softhack007
Copy link
Collaborator Author

softhack007 commented Jan 8, 2025

@DedeHai actually that's a bit unexpected 🤔
It might be that due to "new(std::nothrow)" some error handling code is not considered "unused" any more by the compiler, so these parts are not removed any more during compilation.... plus "new(nothrow)" is actually a different function in the std library, so we pull in a bit of extra code from stdlibc.

I'll check it commit-by-commit.

@softhack007
Copy link
Collaborator Author

softhack007 commented Jan 8, 2025

@DedeHai thanks again for raising the question

metrics, metrics, metrics ... So i've done some steps-by-step comparison to find out where the increase comes from:

image
PR4463_size_per_commit.xlsx

The results are supporting my first guess - the main increase is from introducing new(std::nothrow)

Now, what to do? new(std::nothrow) is actually improving robustness, as it avoids crashes in low memory situations. On the other side, these crashes were always there, so we could postpone that part to 0.16 - I can change this PR to exclude the std::nothrow change, for merging into 0.15.1 with just a minor size increase - still fixing bugs and improving code quality.

Question for all maintainers: What do you think?

@softhack007 softhack007 requested a review from netmindz January 8, 2025 13:07
Copy link
Collaborator

@blazoncek blazoncek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work.

wled00/FX_fcn.cpp Outdated Show resolved Hide resolved
@DedeHai
Copy link
Collaborator

DedeHai commented Jan 8, 2025

Now, what to do? new(std::nothrow) is actually improving robustness, as it avoids crashes in low memory situations. On the other side, these crashes were always there, so we could postpone that part to 0.16 - I can change this PR to exclude the std::nothrow change, for merging into 0.15.1 with just a minor size increase - still fixing bugs and improving code quality.

so the issue is that new() returns a valid pointer even if it fails? that sounds like a major compiler flaw. that being said: the issue with increased size may be, that there are (for sure) other places where the normal new() is used, if the nothrow could be enforced to be used globally for any new() call then the additional flash use may go away.

edit:
if there is no way around it and it helps to make WLED more stable then by all means, spend that extra flash.

it is a bit of a pain having to weigh every byte of code... I really would love to see that boot-fallback partitioning, freeing up most of the OTA partition, giving us the liberty to add more features without dropping current ones. Looking at features waiting in the line like the pixel-magic tool, effect transitions, updated AR and the particle FX the problem will only get worse, not even speaking of updating to newer IDF. While I managed to free about 15k of flash by optimizing code, the particle system will eat all that plus another 15k on top.

@netmindz
Copy link
Collaborator

netmindz commented Jan 8, 2025

Unless we think the increase in size is going to tip a significant number of users over 100% then I'm ok with adding this to 0.15.1 if that's the consensus, but my personal preference would be for the use of new to be in main only

As @DedeHai mentions, we really need to sort size issues before 0.16 release, but we have a long time period for this

@softhack007
Copy link
Collaborator Author

softhack007 commented Jan 8, 2025

so the issue is that new() returns a valid pointer even if it fails?

@DedeHai it's even more crazy:

  • if memory is available, new always returns a valid object/pointer.
  • if memory is not available, new aborts the whole program and throws an exception. This exception needs to be "caught" explicitly (see c++ throw/catch exception mechanism). It never returns nullptr.
  • new(std::nothrow) however does not create an exception, but returns nullptr when out-of-memory. So it behaves similar to malloc().

As we don't catch exceptions inside WLED, any exception is leading to a crash (=default exception handler).

To prevent this, a try - catch block is needed. It can be on a very hight level (like arduino loop()) but it must exist somewhere in the call chain.

#if __cpp_exceptions
        try{
          myObject = new object;
        } catch(...) {
          myObject = nullptr; // low heap memory -> handle error
         DEBUG_PRINTF("out of memory\n");
       }
#else // the compilation does not support exceptions, i.e. gcc -fno-exceptions
          myObject = new object;
#endif

#if __cpp_exceptions is usually recommended, because you'll get an error otherwise when the compiler does not support exceptions.

PS: i fully support your comment about Tasmota-style boot-fallback partitioning - this would give us enough flash space for years, plus we could have speed optimized builds (gcc -O2) instead of optimizing for size (gcc -Os).

@DedeHai
Copy link
Collaborator

DedeHai commented Jan 8, 2025

@softhack007
I do remember the try...catch from my C++ classes but alway thought of such a concept as flawed for an embedded system, but if things are complex it could make sense.
There is another way to do it, which is what I have done in the particle system: use malloc() to allocate memory, then use new() to instantiate in that preallocated memory. Not sure its a good way, but it avoids the whole "try and catch" thing and it can be tested for null after the malloc()

Edit:
regarding the tasmota approach: anyone who can pull that off? I certainly do not have enough knowledge on how to do it but I would be happy to assist.

@blazoncek
Copy link
Collaborator

regarding the tasmota approach: anyone who can pull that off? I certainly do not have enough knowledge on how to do it but I would be happy to assist.

On purely theoretical level it is rather simple: create 2 app partitions (as we already have, but of unequal size), load OTA loader on one (smaller) and firmware on another (larger). When requesting OTA update, switch active partition and reboot device. If OTA is canceled it needs to switch active partition, otherwise proceed as normal.

wled00/wled_server.cpp Outdated Show resolved Hide resolved
@@ -276,8 +276,8 @@ void Segment::blur2D(uint8_t blur_x, uint8_t blur_y, bool smear) {
if (!isActive()) return; // not active
const unsigned cols = vWidth();
const unsigned rows = vHeight();
uint32_t lastnew;
uint32_t last;
uint32_t lastnew = BLACK;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are actually not used when unintialised. if (x>0) takes care of that. Similar in FX_fcn.cpp

@blazoncek
Copy link
Collaborator

blazoncek commented Jan 8, 2025

I've checked all those new(std::nothrow) and all, except one, could be replaced by malloc() without any side effect.
The exception is in startTransition().

EDIT: BusConfig is exception as well.

@blazoncek
Copy link
Collaborator

On a side note, I've started the same process this morning unaware of this PR. 😄

@softhack007
Copy link
Collaborator Author

softhack007 commented Jan 8, 2025

I've checked all those new(std::nothrow) and all, except one, could be replaced by malloc() without any side effect. The exception is in startTransition().

good finding 👍

Normally there is a "general word of caution" to always use the matching deallocation method.

  • malloc(), calloc(), realloc() --> free()
  • new --> delete
  • new[] --> delete[]

@blazoncek did you check this aspect, too?

@DedeHai
Copy link
Collaborator

DedeHai commented Jan 8, 2025

On purely theoretical level it is rather simple: create 2 app partitions (as we already have, but of unequal size), load OTA loader on one (smaller) and firmware on another (larger). When requesting OTA update, switch active partition and reboot device. If OTA is canceled it needs to switch active partition, otherwise proceed as normal.

Yes, if you break it down like that its simple, but it gets complicated real quick. I wrote bootloaders before, its not rocket science but here it is a bit more involved:

  • the boot partition not only needs OTA but it needs a web-UI and it needs to be as streamlined with regards to flash usage as possible
  • not only if OTA fails it needs to fall back to the other partition but also if it boot-loops (quite easy to solve)
  • we need a "transition" version that fits current partition layout and prepares for the new partition layout (needs to be done carefully or it can brick devices), I don't think the "you can only update by erasing flash" is a good option
  • tasmota is GPL so can't use any of that code and need to write our own (unless authors agree I guess?)

I think this would need someone to take the initiative, create a new branch and do a proof of concept, maybe based on some arduino/IDF example, Not sure how to keep it small though: even my code for the ESPNow remote (Arduino IDE based) already uses over 500k of flash for just basic wifi libraries, so maybe the bootloader should be IDF based?
just some thoughts.

@blazoncek
Copy link
Collaborator

blazoncek commented Jan 8, 2025

@blazoncek did you check this aspect, too?

Yes. AFAIK unless you instantiate a non-POD, new will not zero or initialise POD memory.
Some places would benefit of calloc() or realloc() instead of free/malloc.

EDIT: code reduction is about 100 bytes. Retaining std::nothrow where appropriate.

EDIT2: malloc seems to be working just fine.

Copy link
Collaborator

@willmmiles willmmiles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! All positive steps.

Re allocation: new(std::nothrow) is a positive step, even though yes, it'll make the compiler actually implement the null checks that it could leave out with new. Better still would be moving to managed allocation, eg. std::unique_ptr<> or std::vector<>, but one step at a time. :)

wled00/set.cpp Outdated Show resolved Hide resolved
@blazoncek
Copy link
Collaborator

I have some more tuning made, ready to push if you allow it @softhack007

  • replaced POD new/delete with malloc/free
  • some more SEGLEN <= 1 (although unnecessary as SEGLEN cannot be 0 if segment is active)
  • some gnu::pure
  • more const attributes
  • some static attributes

@softhack007
Copy link
Collaborator Author

I have some more tuning made, ready to push if you allow it @softhack007

  • replaced POD new/delete with malloc/free
  • some more SEGLEN <= 1 (although unnecessary as SEGLEN cannot be 0 if segment is active)
  • some gnu::pure
  • more const attributes
  • some static attributes

@blazoncek thanks, I love too see how we get (somewhat) exited about code cleanup and improving code quality 😃

Maybe let me first add a commit (later today) to react on feedback, and tomorrow you can push your improvements to my PR? I'm just thinking about keeping things clearer for our poor code reviewers.

@blazoncek
Copy link
Collaborator

Maybe let me first add a commit (later today) to react on feedback

I've implemented most of it already.

@softhack007
Copy link
Collaborator Author

softhack007 commented Jan 9, 2025

just for the fun of it 😉 for the ones here who can recall "Marvin the paranoid android"

AI_making_the_code_better_8c81e90a

"commenting and documenting code may feel like shouting into the void" 🤣

@softhack007
Copy link
Collaborator Author

I've implemented most of it already.

Just give me a chance to finish my part - I may be slower but still good at doing it😉
I'll give you a hint (today) when its time to push your improvements.

* simplified transition bugfix
* removed cast same type
* isIp parameter changed to pass-by-reference, to avoid copy constructor
@softhack007
Copy link
Collaborator Author

Maybe let me first add a commit (later today) to react on feedback

I've implemented most of it already.

@blazoncek I've completed my part, ready to see your improvements 😃 please add them to this PR.

@softhack007
Copy link
Collaborator Author

softhack007 commented Jan 9, 2025

@blazoncek something else you might be able to check:
I got this "comparison operant is a garbage value" (fx_fcn.cpp) warning from sonarcube, but I cannot say if the finding is correct. "Garbage" usually means "from an uninitialized origin".

image

Trace (see the red numbers in the code)
image


What do you think?

blazoncek and others added 5 commits January 9, 2025 13:29
- replaced POD new/delete with malloc/free
- some more SEGLEN <= 1
- some gnu::pure
- more const attributes
- some static attributes
"const" was missing in the function implementation
This is purely a "clean code" thing, no impact on function -  it helps to avoid confusion when reading the code.

C++ allows declaration and implementation to use different variable names.
@blazoncek
Copy link
Collaborator

What do you think?

This line compares if current pin is in fact pin from defPins[] array. (not only value but also address of it, so it will not work on itself)

@blazoncek
Copy link
Collaborator

Question: What is better from a robustness POV, void func1(int * par1) or void func2(int & par1)?
Both change par1 content.

@softhack007
Copy link
Collaborator Author

softhack007 commented Jan 9, 2025

Question: What is better from a robustness POV, void func1(int * par1) or void func2(int & par1)?

Well its almost the same, and might be a matter of personal preference (*par1 vs par1) in most cases.
The only difference is that a reference parameter cannot be nullptr, so this means the function will never receive an invalid pointer.

Edit: Pointer parameters actually make it a bit easier to make a mistake like if (par1 < 1) *par1 = 0; // clamp negatives to 0
This should be if ( par1 && ( *par1 < 1)) .... - with a * - to make it work as intended, and avoid accessing nullptr.

@blazoncek
Copy link
Collaborator

That was exactly what I thought. So if you want to modify parameter it is better to use pass by reference than pass by pointer (even though they are technically the same).
When parameter is POD or even a struct, not a BLOB.

I am thinking about functions in util.cpp that use pointers when in fact should be using reference.

@softhack007 softhack007 changed the title code robustness improvements plus minor speedup code robustness improvements, minor speedup, fx sliders bugfix Jan 16, 2025
@softhack007 softhack007 added this to the 0.15.1 candidate milestone Jan 16, 2025
@netmindz
Copy link
Collaborator

I see we have a few approvals, but not yet merged. What is outstanding or are we good to merge (and cherry pick to the release branch) ?

@blazoncek
Copy link
Collaborator

IMO good to merge.
Further tuning can be done later (a few more const)

wled00/FX_2Dfcn.cpp Outdated Show resolved Hide resolved
wled00/FX_fcn.cpp Outdated Show resolved Hide resolved
wled00/FX_fcn.cpp Outdated Show resolved Hide resolved
wled00/FX.h Outdated Show resolved Hide resolved
Copy link
Collaborator

@DedeHai DedeHai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend to fix the speed bumps I mentioned. otherwise good to merge.

* removed unneeded initializations in blur() and blur2D()
* remove check for _t in progress()
* code readability: if (_t) --> if(isInTransition())
* add `isInTransition()` checks to currentBri() and currentMode()
* added missing `_transitionprogress = 0xFFFFU` in stopTransition()
@@ -411,18 +412,18 @@ void Segment::beginDraw() {
_vHeight = virtualHeight();
_vLength = virtualLength();
_segBri = currentBri();
unsigned prog = isInTransition() ? progress() : 0xFFFFU; // transition progress; 0xFFFFU = no transition active
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the check in this line is redundant: handleTransition() is called before beginDraw() which will update progress() to 0xFFFF if isInTransition() is false.
one exception: when streaming RGB data, not sure if progress is possible to be set in that case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is only called once per segment per frame, its ok to leave it like this, it may improve readability.

Copy link
Collaborator Author

@softhack007 softhack007 Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exception: when streaming RGB data, not sure if progress is possible to be set in that case.

yeah, streaming pixels is a completely different path - I'd say better to have a bit of redundancy (once per segment frame) and be safe.

@DedeHai
Copy link
Collaborator

DedeHai commented Jan 17, 2025

ready to merge?

@DedeHai
Copy link
Collaborator

DedeHai commented Jan 18, 2025

I ran some tests in conjunction with the PS: I definitely see some improvement! On the ESP32 I barely get any crashes. However, on the single-core C3 I still get very frequent crashes but they are somewhat random.
I chased it down to this:

  • Crashes only happen if segments are changed or FX are changed fast, i.e. while still in transition of the last change
  • PS uses pointers inside of segment data, these pointers must not change while an FX is running (pointers are updated at the beginning of each PS FX call)
  • I suspect that async web requests do exactly that: changing or deleting pointers while the FX is still accessing them
  • crash-dump is inconclusive, I get errors all over the place BUT they are always related to accessing the temporary segment data

To fix this, segment data should only be manipulated in sync with service() and not when a UI request is received. I do not understand the code enough to trace it any further or make improvements. Please correct me if my reasoning is nonsensical.

@DedeHai DedeHai merged commit 566c505 into main Jan 19, 2025
39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants