

@DedeHai
Collaborator

@DedeHai DedeHai commented Jul 19, 2025

  • realloc was not handled properly: the existing data pointer was discarded when d_realloc() returned null. This fixes the memory leak.
  • Tracking of _usedSegmentData was also wrong: Segment::addUsedSegmentData(-_dataLen); must not be called if d_malloc() fails, only for realloc.
  • Also added an intermediate fix for waitForIt(), see #4779 (1D->2D expansions can lead to crashes).

Summary by CodeRabbit

  • Bug Fixes
    • Improved memory handling to prevent potential memory leaks and ensure accurate tracking of segment data usage.
    • Extended the maximum wait time in certain operations to provide a more reliable timeout margin.
  • New Features
    • Added new error message for low RAM buffer conditions.
    • Enhanced memory allocation with fallback mechanisms to improve stability on various hardware platforms.
  • Chores
    • Increased minimum heap size requirement for web request processing to enhance performance.
    • Updated platform-specific memory allocation logic to better support ESP32 variants.

@coderabbitai
Contributor

coderabbitai bot commented Jul 19, 2025

Walkthrough

The changes update memory management in the Segment class and related WS2812FX code by replacing reallocations with free-plus-malloc sequences to reduce fragmentation and handle failures safely, preventing memory leaks and ensuring accurate segment data tracking. PSRAM allocation wrappers were refined for ESP32-C3 exclusion, with new realloc-with-malloc-fallback functions added. The minimum heap size for web requests was increased, and a new error case was added to the frontend error handling.

Changes

| File(s) | Change Summary |
|---|---|
| wled00/FX_fcn.cpp | Replaced realloc with free-plus-malloc in Segment::setGeometry, Segment::setName, and WS2812FX::finalizeInit; improved failure handling in Segment::allocateData; updated pixel color setting logic in Segment::setPixelColor; extended wait timeout in WS2812FX::waitForIt. |
| wled00/bus_manager.cpp, wled00/fcn_declare.h | Added new realloc-with-malloc-fallback function declarations p_realloc_malloc and d_realloc_malloc for non-ESP8266/ESP32-C3 targets; excluded ESP32-C3 from PSRAM allocation wrappers; added macros mapping to realloc_malloc for ESP8266. |
| wled00/util.cpp | Added implementations of p_realloc_malloc, d_realloc_malloc, and realloc_malloc with fallback malloc on realloc failure; refined PSRAM and DRAM allocation preference logic; excluded ESP32-C3 from PSRAM-specific code. |
| wled00/const.h | Increased MIN_HEAP_SIZE from 2048 to 8192 bytes for web request processing. |
| wled00/data/index.js | Added new error case (code 7) "No RAM for buffer!" in frontend error handling switch. |

Estimated code review effort

3 (120 minutes)

Possibly related PRs

Suggested reviewers

  • blazoncek

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ce0cc7 and 716da32.

📒 Files selected for processing (2)
  • wled00/bus_manager.cpp (1 hunks)
  • wled00/fcn_declare.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • wled00/bus_manager.cpp
  • wled00/fcn_declare.h

@DedeHai DedeHai requested a review from willmmiles July 19, 2025 12:16
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
wled00/FX_fcn.cpp (1)

1690-1690: Acknowledge the intermediate timeout fix.

The 100ms margin addition provides a safety buffer for the wait timeout. However, as noted in the TODO comment, this appears to be a temporary workaround.

Consider tracking the proper timeout fix mentioned in issue #4779 to ensure this temporary solution gets replaced with a more robust implementation. Would you like me to help create a tracking issue or investigate the root cause of the timeout problem?
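For readers unfamiliar with the pattern, a minimal sketch of a bounded wait with a fixed safety margin follows. waitUntilIdle, the isBusy callback and the "one frame time plus margin" deadline are assumptions for illustration only; WLED's waitForIt() computes its own limit, which is not reproduced here.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: poll isBusy() until it reports idle or a deadline expires.
// Assumption: the deadline is one frame time plus a fixed safety margin
// (100 ms by default, matching the margin discussed in this review).
bool waitUntilIdle(const std::function<bool()>& isBusy,
                   std::chrono::milliseconds frameTime,
                   std::chrono::milliseconds margin = std::chrono::milliseconds(100)) {
  const auto deadline = std::chrono::steady_clock::now() + frameTime + margin;
  while (isBusy()) {
    if (std::chrono::steady_clock::now() >= deadline) return false;  // gave up: timed out
    std::this_thread::sleep_for(std::chrono::milliseconds(1));       // yield instead of spinning
  }
  return true;  // became idle before the deadline
}
```

Returning false instead of blocking forever lets the caller decide what to do when the strip never reports idle, which is the failure mode a larger margin only postpones.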

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b7bfd6f and 1960066.

📒 Files selected for processing (1)
  • wled00/FX_fcn.cpp (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: blazoncek
PR: wled/WLED#4667
File: usermods/user_fx/user_fx.cpp:27-30
Timestamp: 2025-04-30T05:41:03.633Z
Learning: In WLED, the Segment::allocateData() method already includes optimization to check if data is allocated and sufficiently sized, handling buffer reuse to reduce memory fragmentation. Adding an external check like `if (SEGENV.data == nullptr && !SEGENV.allocateData(dataSize))` is unnecessary and could be problematic, as it bypasses proper size verification.
Learnt from: willmmiles
PR: wled/WLED#4712
File: wled00/FX_fcn.cpp:992-993
Timestamp: 2025-06-07T15:58:42.579Z
Learning: In WLED segments, the backing pixel buffer is always allocated to the full physical segment size, but effect rendering only uses a subset when mirror or grouping are enabled. For 2D segments, this subset is vWidth() * vHeight(), and for 1D segments it's vLength(). Color processing operations like fade_out should iterate over this rendering subset, not the full allocated buffer.
Learnt from: blazoncek
PR: wled/WLED#4658
File: wled00/led.cpp:90-91
Timestamp: 2025-04-24T09:31:06.879Z
Learning: In the WLED codebase, the `getSegment()` function guards against out-of-bounds segment IDs by falling back to the main segment ID, and `getFirstSelectedSegId()` falls back to `getMainSegmentId()` if no segments are selected, ensuring no crashes when used through the `setValuesFromFirstSelectedSeg()` macro.
Learnt from: blazoncek
PR: wled/WLED#4658
File: wled00/led.cpp:90-91
Timestamp: 2025-04-24T09:31:06.879Z
Learning: In the WLED codebase, the `getSegment()` function guards against out-of-bounds segment IDs, and `getFirstSelectedSegId()` falls back to `getMainSegmentId()` if no segments are selected, ensuring no crashes when used through the `setValuesFromFirstSelectedSeg()` macro.
Learnt from: blazoncek
PR: wled/WLED#4482
File: wled00/udp.cpp:343-347
Timestamp: 2025-02-19T12:41:22.676Z
Learning: The setName() method in WLED's Segment class (FX_fcn.cpp) handles all necessary validation internally, including:
1. Null pointer checks
2. Length validation against WLED_MAX_SEGNAME_LEN
3. Memory allocation/reallocation
No additional validation or name clearing is needed before calling setName().
Learnt from: DedeHai
PR: wled/WLED#4750
File: usermods/audioreactive/audio_reactive.cpp:311-321
Timestamp: 2025-06-27T06:04:44.652Z
Learning: In WLED's AudioReactive usermod, FFT buffer memory (valFFT, windowFFT) is allocated once during initialization and never deallocated, following typical embedded system design where core functionality buffers persist for the application lifetime. Raw buffer pointers for aligned allocations don't need to be tracked in the current implementation since deallocation is never performed.
Learnt from: blazoncek
PR: wled/WLED#4482
File: wled00/udp.cpp:147-149
Timestamp: 2025-02-19T12:43:34.199Z
Learning: ESP8266 and ESP32 platforms have different maximum segment name lengths in WLED, which can cause truncation when syncing segment names between devices. This platform difference affects the user experience when using the segment name sync feature.
Learnt from: blazoncek
PR: wled/WLED#4482
File: wled00/udp.cpp:147-149
Timestamp: 2025-02-19T12:43:34.200Z
Learning: In WLED, maximum segment name length varies by platform:
- ESP8266: 32 characters (WLED_MAX_SEGNAME_LEN = 32)
- ESP32: 64 characters (WLED_MAX_SEGNAME_LEN = 64)
This platform difference can cause truncation when syncing longer names from ESP32 to ESP8266. Additionally, the WLED UI has limitations regarding modified maximum segment name lengths.
Learnt from: DedeHai
PR: wled/WLED#4682
File: wled00/FX.cpp:8997-9005
Timestamp: 2025-05-09T18:43:15.355Z
Learning: In the WLED codebase, SEGMENT.custom3 is always constrained to the range 0-31 and will not exceed this range.
wled00/FX_fcn.cpp (11)
Learnt from: blazoncek
PR: wled/WLED#4667
File: usermods/user_fx/user_fx.cpp:27-30
Timestamp: 2025-04-30T05:41:03.633Z
Learning: In WLED, the Segment::allocateData() method already includes optimization to check if data is allocated and sufficiently sized, handling buffer reuse to reduce memory fragmentation. Adding an external check like `if (SEGENV.data == nullptr && !SEGENV.allocateData(dataSize))` is unnecessary and could be problematic, as it bypasses proper size verification.
Learnt from: willmmiles
PR: wled/WLED#4712
File: wled00/FX_fcn.cpp:992-993
Timestamp: 2025-06-07T15:58:42.579Z
Learning: In WLED segments, the backing pixel buffer is always allocated to the full physical segment size, but effect rendering only uses a subset when mirror or grouping are enabled. For 2D segments, this subset is vWidth() * vHeight(), and for 1D segments it's vLength(). Color processing operations like fade_out should iterate over this rendering subset, not the full allocated buffer.
Learnt from: blazoncek
PR: wled/WLED#4482
File: wled00/udp.cpp:343-347
Timestamp: 2025-02-19T12:41:22.676Z
Learning: The setName() method in WLED's Segment class (FX_fcn.cpp) handles all necessary validation internally, including:
1. Null pointer checks
2. Length validation against WLED_MAX_SEGNAME_LEN
3. Memory allocation/reallocation
No additional validation or name clearing is needed before calling setName().
Learnt from: DedeHai
PR: wled/WLED#4750
File: usermods/audioreactive/audio_reactive.cpp:311-321
Timestamp: 2025-06-27T06:04:44.652Z
Learning: In WLED's AudioReactive usermod, FFT buffer memory (valFFT, windowFFT) is allocated once during initialization and never deallocated, following typical embedded system design where core functionality buffers persist for the application lifetime. Raw buffer pointers for aligned allocations don't need to be tracked in the current implementation since deallocation is never performed.
Learnt from: blazoncek
PR: wled/WLED#4658
File: wled00/led.cpp:90-91
Timestamp: 2025-04-24T09:31:06.879Z
Learning: In the WLED codebase, the `getSegment()` function guards against out-of-bounds segment IDs, and `getFirstSelectedSegId()` falls back to `getMainSegmentId()` if no segments are selected, ensuring no crashes when used through the `setValuesFromFirstSelectedSeg()` macro.
Learnt from: blazoncek
PR: wled/WLED#4658
File: wled00/led.cpp:90-91
Timestamp: 2025-04-24T09:31:06.879Z
Learning: In the WLED codebase, the `getSegment()` function guards against out-of-bounds segment IDs by falling back to the main segment ID, and `getFirstSelectedSegId()` falls back to `getMainSegmentId()` if no segments are selected, ensuring no crashes when used through the `setValuesFromFirstSelectedSeg()` macro.
Learnt from: blazoncek
PR: wled/WLED#4482
File: wled00/udp.cpp:147-149
Timestamp: 2025-02-19T12:43:34.200Z
Learning: In WLED, maximum segment name length varies by platform:
- ESP8266: 32 characters (WLED_MAX_SEGNAME_LEN = 32)
- ESP32: 64 characters (WLED_MAX_SEGNAME_LEN = 64)
This platform difference can cause truncation when syncing longer names from ESP32 to ESP8266. Additionally, the WLED UI has limitations regarding modified maximum segment name lengths.
Learnt from: blazoncek
PR: wled/WLED#4482
File: wled00/udp.cpp:147-149
Timestamp: 2025-02-19T12:43:34.199Z
Learning: ESP8266 and ESP32 platforms have different maximum segment name lengths in WLED, which can cause truncation when syncing segment names between devices. This platform difference affects the user experience when using the segment name sync feature.
Learnt from: netmindz
PR: wled/WLED#4728
File: wled00/FX.h:378-378
Timestamp: 2025-06-15T09:59:52.720Z
Learning: In WLED's FX.h, MODE_COUNT represents the highest FX_MODE_ ID + 1, not the total count of FX_MODE_ definitions. This is because some effects may have been removed but their IDs are not reused, creating gaps in the numbering sequence. MODE_COUNT is used for array bounds and range checking.
🔇 Additional comments (1)
wled00/FX_fcn.cpp (1)

162-174: Excellent memory management fix!

This change properly handles realloc() failure scenarios and prevents memory leaks. The implementation correctly:

  • Uses a temporary pointer to store d_realloc() result
  • Preserves the original pointer if reallocation fails
  • Properly frees old memory and updates tracking variables on failure
  • Maintains accurate _usedSegmentData accounting

This addresses the critical issue described in the PR where the original data pointer was lost if d_realloc() returned null.
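The pattern described above can be sketched in a self-contained way with the standard std::realloc/std::malloc/std::free instead of WLED's d_realloc()/d_malloc(). SegmentDataSketch and the usedSegmentData counter are hypothetical stand-ins for Segment::data, _dataLen and _usedSegmentData; this is an illustration of the technique, not the PR's code.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for Segment::_usedSegmentData (bytes held by all segments).
static std::size_t usedSegmentData = 0;

// Hypothetical stand-in for a segment's data/_dataLen pair; not WLED's class.
struct SegmentDataSketch {
  void* data = nullptr;
  std::size_t dataLen = 0;

  void release() {
    std::free(data);             // free(nullptr) is a no-op
    usedSegmentData -= dataLen;  // subtract only what was actually held
    data = nullptr;
    dataLen = 0;
  }

  // Returns true if a buffer of `len` bytes is available afterwards.
  bool allocate(std::size_t len) {
    if (len == 0) { release(); return false; }
    if (data && len == dataLen) return true;  // already the right size
    if (data) {
      // Keep realloc's result in a temporary: on failure it returns null and
      // leaves the old block allocated, so `data = realloc(data, len)` would leak it.
      void* newData = std::realloc(data, len);
      if (!newData) {
        release();               // old block is still ours: free it and fix the accounting
        return false;
      }
      usedSegmentData = usedSegmentData - dataLen + len;
      data = newData;
      dataLen = len;
      return true;
    }
    data = std::malloc(len);
    if (!data) return false;     // nothing was allocated, so nothing to account for
    dataLen = len;
    usedSegmentData += len;
    return true;
  }
};
```

The key points are that the counter is only adjusted for memory that was really obtained or released, and that a failed resize never loses the pointer to the block it still owns.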

Member

@willmmiles willmmiles left a comment


Looks good to me. No objections to a stopgap for #4779.

@DedeHai
Collaborator Author

DedeHai commented Jul 19, 2025

Thanks for reviewing. I found a few more bugs; I will add them to this PR.

Edit:
I also updated the realloc code: realloc can lead to fragmentation that hardly clears up. Realloc is now only accepted if it resizes in place; if the returned pointer changes, all memory is freed and a new block is allocated, hopefully filling up gaps. I saw some improvement in fragmentation that way, but it may be completely arbitrary.
Also, there are more places that use realloc() incorrectly; I fixed those too.

@DedeHai
Collaborator Author

DedeHai commented Jul 19, 2025

I am testing ESP32 for larger setups with the latest code.
I see one rather big issue:
If I set it up as 4 panels, 64x64 pixels in total, the output works but the web UI does not. Free heap is 58k, but the largest available block is only 8k, and the web UI is stuck loading.
@willmmiles is this something you already looked into? IMHO having working LEDs and no UI is bad: it means users have to reset the device just because the LED buffers hog all memory and starve the UI.
Should the (largest free) block size be monitored (free heap is kind of irrelevant) and action taken (what?) if it's too small?

Found one of the culprits leading to crashes in 1D->2D expansions: Corner MUST be boundary checked, as it blindly writes up to the max dimension, immediately crashing anything non-square.
Also removed realloc() in other places, improving fragmentation.
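Why Corner needs the bounds check can be sketched as follows. Canvas, setPixelXY, drawCorner and virtualLength are hypothetical names, and the L-shape drawn per 1D pixel is an assumption about how a corner expansion works; the point is only that the virtual 1D length is max(width, height), so on a non-square matrix the index runs past the smaller dimension.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 2D canvas; not WLED's Segment API.
struct Canvas {
  int width, height;
  std::vector<uint32_t> px;
  Canvas(int w, int h) : width(w), height(h), px(static_cast<std::size_t>(w) * h, 0u) {}

  // Bounds-checked write: coordinates outside the matrix are silently dropped.
  void setPixelXY(int x, int y, uint32_t c) {
    if (x < 0 || y < 0 || x >= width || y >= height) return;
    px[static_cast<std::size_t>(y) * width + x] = c;
  }
};

// Virtual 1D length of a corner expansion: the larger of the two dimensions.
int virtualLength(const Canvas& c) { return std::max(c.width, c.height); }

// Sketch of a corner expansion: 1D pixel i becomes an L-shape made of
// column x = i (rows 0..i) and row y = i (columns 0..i). For i >= the smaller
// dimension, one leg lies outside the matrix; without the check in
// setPixelXY() these writes would land outside the buffer.
void drawCorner(Canvas& c, int i, uint32_t color) {
  for (int k = 0; k <= i; k++) {
    c.setPixelXY(i, k, color);  // vertical leg
    c.setPixelXY(k, i, color);  // horizontal leg
  }
}
```

On a 4x2 matrix, for example, the virtual length is 4, and pixels 2 and 3 would each try to write rows 2 and 3, which do not exist.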
@willmmiles
Member

IIRC, the web server core doesn't typically do any allocations bigger than a single packet (1500 bytes). AsyncJSONResponse serializes on a packet-by-packet basis; it doesn't need a contiguous block either, other than the globally allocated JSON buffer.

Do you get HTTP 500s from specific requests? The web server core monitors free space before accepting each call.

```cpp
void* newData = d_realloc(data, len); // note: realloc returns null if it fails but does not free the original pointer
if (!newData || newData != data) // realloc failed or used a new memory block causing fragmentation. free and allocate new block
d_free(newData);
d_free(data);
```
Member

You don't need to free the old buffer, realloc() has already free()d it for you.

Member

(unless it returned null)

Collaborator Author

But only if it does not fail, i.e. if newData is valid. Or is that incorrect?

Member

Correct.

But if realloc did allocate a new buffer for you, there's no point in releasing it and trying again; malloc could just give you the new buffer back, since there wasn't enough space to extend the old one.

Collaborator Author

You are right. I had an error in my thought process: I assumed that if realloc() creates a new block and copies the old data, freeing everything and allocating anew would help fragmentation, but that is exactly what realloc() already does.

@DedeHai
Collaborator Author

DedeHai commented Jul 19, 2025

> Do you get HTTP 500s from specific requests? The web server core monitors free space before accepting each call.

I did not investigate what exactly happens; I looked more into why the heap is so fragmented. Sometimes I do get a 503; other times it's just stuck loading without an error in the browser.

@willmmiles
Member

>> Do you get HTTP 500s from specific requests? The web server core monitors free space before accepting each call.
>
> I did not investigate what exactly happens; I looked more into why the heap is so fragmented. Sometimes I do get a 503; other times it's just stuck loading without an error in the browser.

Under extreme heap pressure it will just drop the connection entirely. It needs to be pretty extreme though (biggest block <2048, free heap < 8192).

@DedeHai
Collaborator Author

DedeHai commented Jul 19, 2025

I have the UI failing with a largest free block of ~8k and 70k free heap, so there must be something else at play.

I am actually working on PS memory management. These are just things I picked up along the way; I had a play and added some printouts just to make sure it's not the PS causing it. It may still be at fault, but it's pretty hard to say as I am debugging on several fronts...

@willmmiles
Member

> I have the UI failing with a largest free block of ~8k and 70k free heap, so there must be something else at play.

There are a lot of transient allocations in the TCP stack. The view from the web server might be much tighter than elsewhere in the code. You can try building with -D ASYNCWEBSERVER_DEBUG_TRACE to get it to print out what it thinks is going on.

@DedeHai
Collaborator Author

DedeHai commented Jul 19, 2025

This is what I get with a 64x32 2D matrix on ESP32, selecting some RAM-heavy PS FX:

```
[59876]{71428}MultiMessage 1073677576 - 0 (0/0/3076 - 0/0)
[59922]{74296}BasicMessage 1073701280 - 0 (0/0/16 - 0/0)
[61465]{79756}MultiMessage 1073677452 - 0 (0/0/1931 - 0/0)
[63144]{51236}MultiMessage 1073701280 - 0 (0/0/1926 - 0/0)
[66265]*** Rejecting client 3FFF5700 (52763): 0, 6132/8976
[66267]*** Sent 503 to 3FFF5700 (52763), result 53
[66322]*** Client 3FFF5700 (0) disconnected
[66847]*** Rejecting client 3FFF5F70 (52764): 0, 6132/8856
[66849]*** Sent 503 to 3FFF5F70 (52764), result 53
[66904]*** Client 3FFF5F70 (0) disconnected
[85115]*** Rejecting client 3FFF58CC (52777): 0, 6132/8600
[85117]*** Sent 503 to 3FFF58CC (52777), result 53
[85162]*** Client 3FFF58CC (0) disconnected
[88268]*** Rejecting client 3FFF5F74 (52778): 0, 6132/8376
[88270]*** Sent 503 to 3FFF5F74 (52778), result 53
[88323]*** Client 3FFF5F74 (0) disconnected
[89442]*** Rejecting client 3FFF58CC (52779): 0, 6132/8624
[89444]*** Sent 503 to 3FFF58CC (52779), result 53
[89488]*** Client 3FFF58CC (0) disconnected
[127760]{60340}(3fff58cc) WR created
[127767]{58052}(3fff58cc) WR handler ready /ws
[127767]Queue: 1 entries, 0 running, 1 queued
[127767]{58052}(3fff58cc) WR handler running
[127780]{57356}(3fff58cc) WR added response 3ffff580
[127846]{56268}MultiMessage 1073677576 - 0 (0/0/1924 - 0/0)
[127848]{50764}(3fff58cc) WR destructing
[127849]Removing 3FFF58CC from queue
[127849]Queue: 0 entries, 0 running, 0 queued
[127860]{57884}(3fff58cc) WR destructed
[136632]{58328}MultiMessage 1073677576 - 0 (0/0/1924 - 0/0)
[140525]*** Rejecting client 3FFFF690 (52801): 0, 8180/11608
[140527]*** Sent 503 to 3FFFF690 (52801), result 53
[140528]*** Rejecting client 3FFFF768 (52802): 0, 8180/10704
[140579]*** Client 3FFFF690 (0) disconnected
```

The last FX is running happily; just the UI is dead.

Edit:
After some timeout it starts working again:

```
[152988]*** Client 3FFF58CC (0) disconnected
[201604]*** Sent 503 to 3FFF60C0 (52812), result 53
[201605]*** Rejecting client 3FFF561C (52827): 0, 8180/11132
[201649]*** Client 3FFF60C0 (0) disconnected
[305258]*** Client 3FFF561C (0) disconnected
[351877]{60216}(3fff5e70) WR created
[351882]{58956}(3fff5e70) WR handler ready /
[351882]Queue: 1 entries, 0 running, 1 queued
[351882]{58956}(3fff5e70) WR handler running
[351907]{58240}(3fff5e70) WR added response 3fff59a4
[351908](3fff59a4) ack 0
[351911](3fff59a4)AL1436 0
[351912]*** Rejecting client 3FFF58CC (52909): 1, 5620/8720
[351914](3fff59a4) ack 1713
[351915](3fff59a4)AL1436 0
[351970](3fff59a4) ack 1436
```

After changing FX a few times (even non-PS FX), the UI breaks once more.

@DedeHai
Collaborator Author

DedeHai commented Jul 19, 2025

Got the other error again: it just stops responding, and the largest free block is only 1.2k at this point:

```
[284446]{73184}MultiMessage 1073599392 - 0 (0/0/1939 - 0/0)
[285598]{72920}MultiMessage 1073599988 - 0 (0/0/1938 - 0/0)
[287015]{73152}MultiMessage 1073599424 - 0 (0/0/1939 - 0/0)
Free heap: 75220
Largest free block: 16372 bytes
[288430]{57812}MultiMessage 1073599424 - 0 (0/0/1934 - 0/0)  ---> stopped responding completely here
Free heap: 54264
Largest free block: 1204 bytes
```

After this, the largest free block increases to 4k, but the UI never recovers.
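To put the two snapshots above on one scale, a rough fragmentation indicator can be computed from free heap and largest free block. fragmentationPercent and the simple "1 minus largest/free" ratio are assumptions for illustration, not an official metric from ESP-IDF or the Arduino cores.

```cpp
#include <cstdint>

// Hypothetical helper: rough fragmentation indicator in percent.
// 0   = all free memory is one contiguous block.
// ~100 = free memory is scattered in pieces much smaller than the total.
uint32_t fragmentationPercent(uint32_t freeHeap, uint32_t largestFreeBlock) {
  if (freeHeap == 0 || largestFreeBlock >= freeHeap) return 0;  // nothing free, or one contiguous block
  // 64-bit intermediate so 100 * largestFreeBlock cannot overflow.
  return static_cast<uint32_t>(100 - (100ull * largestFreeBlock) / freeHeap);
}
```

With the logged values, fragmentationPercent(75220, 16372) gives 79 and fragmentationPercent(54264, 1204) gives 98: free heap only dropped by about a quarter, while the largest usable block shrank by more than a factor of ten. On ESP32 the two inputs can be read with heap_caps_get_free_size(MALLOC_CAP_8BIT) and heap_caps_get_largest_free_block(MALLOC_CAP_8BIT).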

@willmmiles
Member

> [351912]*** Rejecting client 3FFF58CC (52909): 1, 5620/8720

Sorry, the output is somewhat inscrutable. The values are: the pointer to the AsyncClient object, the remote port number, the number of queued web requests, the largest free RAM block, and the available heap.

So yeah, it's just out of memory. :(
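The field order described above can be turned into a tiny parser, which is handy when scanning long serial captures like the ones in this thread. RejectInfo and parseReject are hypothetical names, and the sketch assumes the trace format stays exactly as printed in these logs.

```cpp
#include <cstdio>
#include <string>

// Fields of an AsyncWebServer "Rejecting client" trace line, in the order
// described above. Hypothetical struct, named for this example.
struct RejectInfo {
  unsigned long client = 0;   // pointer to the AsyncClient object (hex)
  unsigned port = 0;          // remote port number
  unsigned queued = 0;        // number of queued web requests
  unsigned largestBlock = 0;  // largest free RAM block, bytes
  unsigned freeHeap = 0;      // available heap, bytes
};

// Returns true if `line` contains a rejection record and fills `out`.
bool parseReject(const std::string& line, RejectInfo& out) {
  const auto pos = line.find("*** Rejecting client");
  if (pos == std::string::npos) return false;
  // Skip any "[timestamp]" prefix and match the fixed text around the numbers.
  return std::sscanf(line.c_str() + pos, "*** Rejecting client %lx (%u): %u, %u/%u",
                     &out.client, &out.port, &out.queued,
                     &out.largestBlock, &out.freeHeap) == 5;
}
```

Read this way, the quoted line says: one request already queued, a largest free block of 5620 bytes and 8720 bytes of free heap.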

```cpp
if (length() != oldLength) {
if (pixels) pixels = static_cast<uint32_t*>(d_realloc(pixels, sizeof(uint32_t) * length()));
else pixels = static_cast<uint32_t*>(d_malloc(sizeof(uint32_t) * length()));
if (pixels) d_free(pixels); //pixels = static_cast<uint32_t*>(d_realloc(pixels, sizeof(uint32_t) * length())); // using realloc can cause additional fragmentation instead of reducing it
```
Collaborator

That's a misleading comment.

Collaborator Author

It is what I have seen in tests: when using realloc, fragmentation was worse. When using free/malloc it was somewhat better, but both approaches do cause fragmentation. The idea behind this is that free + malloc allows the allocator to fill gaps left behind.
Got any test scenario that shows which works better in practice?

Collaborator

The sole purpose of realloc() functions is to reduce fragmentation. free + malloc will only fill gaps if the garbage collector did its job. Unfortunately I have no knowledge of memory management on ESP platforms.

IIRC @softhack007 was having a lot of issues with memory fragmentation in the past, but I no longer remember what the outcome was (except that FX memory is no longer released and only grows once allocated).

Collaborator Author

There is zero garbage collection; it's first come, first served from the bottom up.
So if a block of memory divides a larger free region, free -> malloc should move it to the bottom, while realloc will keep it where it is if it fits.
Using realloc can prevent fragmentation, but it can also make it worse if large buffers are at play. At least that is my understanding, and also what I have seen in tests.
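To make the disagreement concrete, here is a toy model in which every allocation takes the lowest-address gap that fits, as described above. ToyHeap and its methods are hypothetical, and ESP-IDF's real heap allocator is more sophisticated, so this only illustrates the mechanism, not measured ESP32 behaviour. The scenario is a buffer that cannot grow in place and has a freed gap directly below it.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <map>
#include <optional>

// Toy first-fit heap: allocations always take the lowest-address gap that fits.
class ToyHeap {
 public:
  explicit ToyHeap(std::size_t capacity) : capacity_(capacity) {}

  std::optional<std::size_t> allocate(std::size_t size) {
    std::size_t cursor = 0;  // end of the previous block
    for (const auto& [start, len] : blocks_) {
      if (start - cursor >= size) return place(cursor, size);  // gap before this block fits
      cursor = start + len;
    }
    if (capacity_ - cursor >= size) return place(cursor, size);  // tail gap fits
    return std::nullopt;
  }

  void release(std::size_t addr) { blocks_.erase(addr); }

  // realloc-like: grow in place if the following gap allows it; otherwise allocate
  // elsewhere while the old block is still held, then release the old block.
  std::optional<std::size_t> reallocate(std::size_t addr, std::size_t size) {
    auto it = blocks_.find(addr);
    if (it == blocks_.end()) return std::nullopt;
    auto next = std::next(it);
    const std::size_t limit = (next == blocks_.end()) ? capacity_ : next->first;
    if (limit - addr >= size) { it->second = size; return addr; }  // resized in place
    auto moved = allocate(size);      // old block still occupies its space here
    if (moved) blocks_.erase(addr);   // on failure the old block is kept, like realloc()
    return moved;
  }

  std::size_t largestFree() const {
    std::size_t cursor = 0, best = 0;
    for (const auto& [start, len] : blocks_) {
      best = std::max(best, start - cursor);
      cursor = start + len;
    }
    return std::max(best, capacity_ - cursor);
  }

 private:
  std::size_t place(std::size_t at, std::size_t size) { blocks_[at] = size; return at; }
  std::size_t capacity_;
  std::map<std::size_t, std::size_t> blocks_;  // start address -> size
};
```

In this model, with 100 units of heap, a 10-unit block at 0 (later freed), a 40-unit buffer at 10 and a 10-unit block at 50, growing the buffer to 45 realloc-style fails because the largest gap is 40 while the old buffer is still held. Freeing it first and then allocating succeeds at address 0 by reusing the old space plus the gap below it. The price is that the contents are lost, which matters for any caller that needs the old data preserved.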

Collaborator

Then go a step further and simplify p_realloc and d_realloc to use p_free/p_malloc and d_free/d_malloc instead.
It is the cleanest solution.
However, ESP8266 will need the same.

@DedeHai
Collaborator Author

DedeHai commented Jul 20, 2025

> So yeah, it's just out of memory. :(

Memory is a bit of an issue currently. If I set up 64x64 using 8 panels of 512 LEDs each, with a boot-up preset = solid, the UI already does not work anymore. There is "plenty" of free heap (60k), but the largest block seems to be too small (around 10k).
The limit for a 2D setup without PSRAM seems to be around 2000 LEDs to work somewhat reliably.
The question is how to handle it; IMHO a broken UI is unacceptable.
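For scale, the per-segment pixel buffer quoted earlier in this thread is allocated as sizeof(uint32_t) * length(). The constexpr helper below is a hypothetical back-of-the-envelope check that counts only that one buffer; other allocations (bus output, FX data, web server) come on top and are not modelled.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes needed for one uint32_t-per-LED pixel buffer,
// following the sizeof(uint32_t) * length() formula from the quoted code.
constexpr std::size_t pixelBufferBytes(std::size_t leds) { return sizeof(uint32_t) * leds; }

static_assert(pixelBufferBytes(64 * 64) == 16384, "64x64 matrix: 16 KiB in one contiguous block");
static_assert(pixelBufferBytes(2000) == 8000, "~2000 LEDs: about 8 kB");
```

A single 64x64 buffer therefore needs a 16 KiB contiguous block, well above the ~10k largest free block reported here, while the ~2000-LED limit corresponds to roughly 8 kB per buffer.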

@blazoncek
Collaborator

@DedeHai this was a great find and a poor performance on my side. I should have thought this through better.

However I would prefer to fix the issue at its core, like so:

```cpp
void *p_realloc(void *ptr, size_t size) {
  int caps1 = MALLOC_CAP_SPIRAM  | MALLOC_CAP_8BIT;
  int caps2 = MALLOC_CAP_DEFAULT | MALLOC_CAP_8BIT;
  void *newbuf = nullptr;
  if (psramSafe) {
    if (heap_caps_get_free_size(caps2) > 3*MIN_HEAP_SIZE && size < 512) std::swap(caps1, caps2);  // use DRAM for small allocations & when heap is plenty
    newbuf = heap_caps_realloc_prefer(ptr, size, 2, caps1, caps2); // otherwise prefer PSRAM if it exists
    if (newbuf) return newbuf; // realloc successful
    else {
      p_free(ptr); // free old buffer if realloc failed (to keep consumer allocation logic simple)
      return p_malloc(size); // fallback to malloc if realloc failed (buffer will not be copied!!!)
    }
  }
  newbuf = heap_caps_realloc(ptr, size, caps2); // fallback to default realloc
  if (newbuf) return newbuf; // realloc successful
  else {
    p_free(ptr); // free old buffer if realloc failed
    return heap_caps_malloc(size, caps2); // fallback to malloc if realloc failed
  }
}
```

I do agree that this hides the issue of failed reallocations downstream but IMO that is an acceptable tradeoff as the code is simpler.
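For comparison, the same fallback idea can be expressed with the standard allocator so it can be exercised on a desktop host. This is a sketch, not the merged code: it borrows the name realloc_malloc from the PR summary, but the body is an assumption. The size == 0 guard is an addition not present in the proposal above; some allocators (glibc, for example) free the block and return null for realloc(p, 0), in which case freeing p again would be a double free.

```cpp
#include <cstddef>
#include <cstdlib>

// Sketch of "realloc with malloc fallback" using the standard allocator.
// Contract: the caller must not use `ptr` after the call. The returned buffer
// keeps the old contents only if the realloc path succeeded; if the fallback
// was taken, the contents are undefined and must be re-initialised.
void* realloc_malloc(void* ptr, std::size_t size) {
  if (size == 0) { std::free(ptr); return nullptr; }  // avoid implementation-defined realloc(p, 0)
  void* newbuf = std::realloc(ptr, size);
  if (newbuf) return newbuf;  // resized (in place or moved), contents kept
  std::free(ptr);             // realloc failed: the old block is still ours, release it
  return std::malloc(size);   // fresh block, contents NOT copied
}
```

This makes the tradeoff visible: the signature gives the caller no way to tell which path was taken, so code that depends on the old contents has to treat every call as potentially returning uninitialised memory.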

@blazoncek
Collaborator

> The question is how to handle it; IMHO a broken UI is unacceptable.

Indeed it is. 😁 Inventing the logic to switch between DRAM and PSRAM is the key, and so is whether PSRAM is fast enough for LED and effect data.
When I last dealt with memory allocations (about a year and a half ago), AWS was eating 80kB on start and needed an additional 50kB at runtime. LED, segment and FX data would eat as much.

blazoncek added a commit to blazoncek/WLED that referenced this pull request Jul 20, 2025
- realloc does not free original buffer on fail
- see also wled#4783
- align DRAM alloc to 32bit read (experimental)
@DedeHai
Collaborator Author

DedeHai commented Jul 20, 2025

@blazoncek can you explain the intent of this function?

```cpp
void *d_malloc(size_t size) {
  int caps1 = MALLOC_CAP_DEFAULT | MALLOC_CAP_8BIT;
  int caps2 = MALLOC_CAP_SPIRAM  | MALLOC_CAP_8BIT;
  if (psramSafe) {
    if (size > MIN_HEAP_SIZE) std::swap(caps1, caps2);  // prefer PSRAM for large allocations
    return heap_caps_malloc_prefer(size, 2, caps1, caps2); // otherwise prefer DRAM
  }
  return heap_caps_malloc(size, caps1);
}
```

Is the idea to put all rendering buffers in PSRAM, even if there is plenty of DRAM available? What am I missing?

@willmmiles
Member

FWIW, deferring the mapping to the last step in the render process - as we now do with blending - should make things more performant when using PSRAM. PSRAM on ESP32 has a cache in SRAM which is paged in and out: so when you hit a PSRAM address, that page gets swapped in, and then "nearby" accesses are just as fast as any other SRAM until the page needs to be swapped out again. So if you're doing a lot of linear accesses (i.e. start to end, in row order) it's pretty fast; if you're doing a lot of random or non-linear accesses (such as when applying ledmaps, inversions, mirroring, 2D expansion, etc.) it can be really slow paging in and out all the time. The more of the mapping logic that can be done later, the faster it will go.
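A minimal host-side sketch of the "map last" idea, assuming a hypothetical `ledmap` lookup table and plain `uint32_t` color buffers: the effect renders into a logical buffer in strictly sequential order, and the single non-linear step (the ledmap lookup) is confined to one final pass. The function names are illustrative, not WLED's render pipeline.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Effect stage: writes the logical buffer strictly start-to-end,
// the access pattern that stays cheap when the buffer lives in cached PSRAM.
void renderEffect(std::vector<uint32_t>& logical, uint32_t frame) {
  for (std::size_t i = 0; i < logical.size(); ++i) {
    logical[i] = static_cast<uint32_t>(i) + frame;  // placeholder "effect"
  }
}

// Output stage: the only place the (possibly scattered) ledmap is applied.
// Reads from `logical` may jump around, but this happens once per frame
// instead of inside every effect's inner loop.
void applyLedmap(const std::vector<uint32_t>& logical,
                 const std::vector<uint16_t>& ledmap,  // physical index -> logical index
                 std::vector<uint32_t>& physical) {
  for (std::size_t p = 0; p < physical.size(); ++p) {
    physical[p] = logical[ledmap[p]];  // sequential writes, mapped reads
  }
}
```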

@blazoncek
Collaborator

d_ should prefer DRAM while p_ should prefer PSRAM.
But: if an allocation is small, p_ will prefer DRAM; if an allocation is huge, d_ will prefer PSRAM.

Assuming PSRAM is available.
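The preference rules can be written down as a small pure function, read off the two snippets quoted in this thread: `d_malloc()` goes PSRAM-first when `size > MIN_HEAP_SIZE`, and the `p_` realloc path goes DRAM-first when `size < 512` and more than `3*MIN_HEAP_SIZE` of DRAM is free. The `Pool` enum, the function names, and passing `minHeap`/`freeDram` as parameters are assumptions for illustration; the real code hands both orders to `heap_caps_*_prefer()`, which falls back to the second pool if the first cannot satisfy the request.

```cpp
#include <cassert>
#include <cstddef>

enum class Pool { DRAM, PSRAM };  // hypothetical names for the two heaps

// d_*: prefer DRAM, but large requests try PSRAM first (assumes PSRAM is present).
Pool dPreferred(std::size_t size, std::size_t minHeap) {
  return size > minHeap ? Pool::PSRAM : Pool::DRAM;
}

// p_*: prefer PSRAM, but small requests try DRAM first while DRAM is plentiful.
Pool pPreferred(std::size_t size, std::size_t freeDram, std::size_t minHeap) {
  return (freeDram > 3 * minHeap && size < 512) ? Pool::DRAM : Pool::PSRAM;
}
```

Note how sensitive `dPreferred()` is to `minHeap`: with a 2 kB threshold, anything above 2 kB already goes PSRAM-first.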

@blazoncek
Collaborator

PSRAM on ESP32 has a cache in SRAM

Is that true for rev.1 and rev.3 ESPs? @softhack007 had very bad results with PSRAM on classic ESP32 in the past. I also have rev.1 and rev.3 test units, so I can compare when time permits.

@willmmiles
Member

PSRAM on ESP32 has a cache in SRAM

Is that true for rev.1 and rev.3 ESPs? @softhack007 had very bad results with PSRAM on classic ESP32 in the past. I also have rev.1 and rev.3 test units, so I can compare when time permits.

I'll review it in more detail, but as far as I could tell, it's fundamental to the memory architecture -- there isn't any way for the CPU to directly address PSRAM, it all goes through the MMU and is backed by some SRAM. MMU bugs in those accesses are entirely possible, though.

@willmmiles
Member

Especially MMU and paging issues with dual-core systems -- there seem to be a lot of SMP coherency-related fixes and workarounds in the ESP-IDF source code...

@DedeHai
Collaborator Author

DedeHai commented Jul 20, 2025

d_ should prefer DRAM while p_ should prefer PSRAM. But: if an allocation is small, p_ will prefer DRAM; if an allocation is huge, d_ will prefer PSRAM.

huge as in larger than MIN_HEAP_SIZE (which is only 2k)?

@blazoncek
Collaborator

huge as in larger than MIN_HEAP_SIZE

Whatever the code says. 😄 BTW at the time of writing MIN_HEAP_SIZE was 8k.

@DedeHai
Collaborator Author

DedeHai commented Jul 20, 2025

8k min heap makes a lot more sense :) at least for ESP32s.
I still think that if you have 100k of DRAM available, it should be used, and PSRAM should only be considered once DRAM starts running low, OR, as Will suggested, for buffers that are mostly accessed sequentially, like the "global" _pixels buffer.

webUI / webserver issue: I ran some tests on ESP32. As you can see in the logs above, at 8k of heap (actually usable heap) it's already very unstable. @willmmiles the reason the webUI does not work right is closely tied to available heap (obviously), but once it breaks, it's broken for good: I can get it to half-load the UI, throwing some "json abruptly ends" kind of errors, and even once the heap recovers to over 20k with a 10k contiguous part, it stays broken until reboot. So keeping a minimum heap of about 8k makes sense, but something still breaks the UI; a 2k min heap is useless. Also: on ESP32, ESP.getFreeHeap() reports all heap, including IRAM, which is why the printout above shows 70k of heap while in reality only about 15k of it was usable.
I will provide more fixes/improvements in a new PR, as those are more "experimental".
It would be good to find out why the UI is becoming unstable and especially why it does not recover from a "low memory" event and/or how to detect that and re-initialize.

@blazoncek I found one more subtle bug I could not pinpoint: when transitions are enabled but allocation of the pixel buffer fails, FX are still executed with SEGLEN equal to zero, causing crashes in FX that divide by it; "Android" is one of them.

@DedeHai
Collaborator Author

DedeHai commented Jul 20, 2025

@willmmiles I see you changed MIN_HEAP_SIZE to 2k in your async webserver update, while WLED_REQUEST_HEAP_USAGE is set to 12k. Does that mean you expect requests not to be processed correctly below 12k of heap? If so, shouldn't MIN_HEAP_SIZE be larger than that?

@blazoncek
Copy link
Collaborator

I found one more subtle bug I could not pinpoint

Not a bug of segment logic per se. _vLength will be 0 if the segment is not active (i.e. stop==0).

If a request for a segment's pixel buffer fails (at the start of a transition), a copy of the "old" segment is made, but its stop will be 0, making it inactive. Since service() doesn't check whether the segment's copy is inactive, it runs the effect anyway (it does check for the base segment).

We can add an if (segO->isActive()) check prior to running the old segment's effect. The consequence is that the transition will not run correctly.
Another option is to cancel the transition in such a case (or at least remove the old segment from the transition, with a similar effect as above). That is a bit more complex.
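A host-side sketch of the first option, assuming a simplified segment whose `isActive()` is `stop > start` (an assumption; the thread only says an inactive copy has `stop == 0`). `Seg`, `length()`, `runEffect()` and the `service()` shape are hypothetical stand-ins, not WLED's code: the guard simply skips the old segment's effect, so nothing divides by a zero length, at the price of the transition not rendering the old effect.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a segment (hypothetical, not WLED's Segment).
struct Seg {
  uint16_t start = 0, stop = 0;
  bool isActive() const { return stop > start; }
  uint16_t length() const { return isActive() ? stop - start : 0; }
};

// Example effect that, like some real effects, divides by the segment length.
// With length() == 0 this is integer division by zero: undefined behavior, typically a crash.
uint16_t runEffect(const Seg& s, uint32_t now) {
  return static_cast<uint16_t>(now % s.length());
}

// Sketch of the proposed guard: only run the old segment's effect if it is active.
// Returns how many effects ran this frame.
int service(const Seg& seg, const Seg* segO, uint32_t now) {
  int ran = 0;
  if (seg.isActive()) { (void)runEffect(seg, now); ++ran; }
  if (segO && segO->isActive()) { (void)runEffect(*segO, now); ++ran; }  // the added check
  return ran;
}
```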

blazoncek added a commit to blazoncek/WLED that referenced this pull request Jul 20, 2025
- stop transition
- add additional check in service()
- see discussion in wled#4783
@willmmiles
Member

I'm aware there are still unresolved issues with surviving heap exhaustion in the web server stack. I haven't gone all the way to the bottom yet - I believe there are some problems even in the wifi drivers themselves where they can hang up connections, leaking packet buffers, if there's an allocation failure on some critical path. I've tried to track down as many places as possible and make sure things are handled gracefully, but I haven't managed to get them all yet. Sorry - still some work to do there. :(

The heap availability checks in AsyncWebServerWLED do use MALLOC_CAPS to check only for memory that malloc() or new can access - I'd forgotten about this, it might be why you see different numbers elsewhere than the web server reports.

There are actually several layers of heap checks in the web server:

  • an absolute hard minimum (8 kB total, 2 kB block, on ESP32) where it will just abort and drop the connection. This particular threshold was arrived at empirically: once it hits that level, there typically isn't enough RAM left for the LwIP stack to serve a 503. (LwIP in ESP32 always allocates a full packet for any outbound write, so it needs a contiguous block of ~1600 bytes.) If this happens, the debug trace will give you a "Dropping client" message.
  • the "minimum heap to queue a request" value: if there's at least that much heap, it will put the request on a queue to service later. If this threshold is hit, you get a "Rejecting client" message. In WLED this is set by WLED_REQUEST_MIN_HEAP.
  • the "minimum heap to serve a request" value; if there's at least that much heap, the server will dequeue another request and attempt to handle it. (And it will always try to serve at least one request at a time, as long as the hard minimum is met, because otherwise heap exhaustion would leave it entirely dead to the world.) The idea is that under memory pressure we can often queue a couple of requests to handle later, even if we can't afford the memory to handle them in parallel. In WLED this threshold is set with WLED_REQUEST_HEAP_USAGE.
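The three layers can be summarized as two small decision functions, one when a request arrives and one when the next queued request is picked up. This is a hedged sketch, not AsyncWebServerWLED's actual code: the names are hypothetical, the hard floor uses the 8 kB total / 2 kB largest-block figures quoted above for ESP32, and `minQueue`/`minServe` stand in for `WLED_REQUEST_MIN_HEAP` and `WLED_REQUEST_HEAP_USAGE`.

```cpp
#include <cassert>
#include <cstddef>

// Hard floor quoted for ESP32 (8 kB total, 2 kB largest contiguous block).
constexpr std::size_t kHardMinTotal = 8 * 1024;
constexpr std::size_t kHardMinBlock = 2 * 1024;

enum class Admit { Drop, Reject, Queue };  // hypothetical names

// Decision when a new request arrives.
Admit admitRequest(std::size_t freeHeap, std::size_t largestBlock, std::size_t minQueue) {
  if (freeHeap < kHardMinTotal || largestBlock < kHardMinBlock)
    return Admit::Drop;    // too little RAM even to answer: "Dropping client"
  if (freeHeap < minQueue)
    return Admit::Reject;  // "Rejecting client"
  return Admit::Queue;     // park it until there is heap to serve it
}

// Decision when picking the next queued request to handle.
bool canServeNext(std::size_t freeHeap, std::size_t largestBlock,
                  std::size_t activeRequests, std::size_t minServe) {
  if (freeHeap < kHardMinTotal || largestBlock < kHardMinBlock) return false;
  // Always allow one request in flight, otherwise the server could stall forever.
  return activeRequests == 0 || freeHeap >= minServe;
}
```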

DedeHai added 2 commits July 20, 2025 22:25
- ESP32 C3 has no PSRAM, it now uses default alloc functions
- also added missing UI info for "Error 7"
@DedeHai
Collaborator Author

DedeHai commented Jul 21, 2025

Update:

  • the added realloc_malloc() functions are wrappers that try a realloc and, on failure, free the original pointer and fall back to malloc, as suggested. Changing the base realloc functions is not a good idea: they would no longer behave like realloc(), which can lead to future bugs if code expects them to.
  • changed the d_malloc/realloc functions to prefer DRAM up to the point where the largest contiguous block gets small
  • changed MIN_HEAP_SIZE back to 8k
  • Also added UI info printout on error 7 (render buffer allocation failed)

from my side this is ready to merge.
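A rough sketch of the "prefer DRAM until the largest contiguous block gets small" rule, written as a pure function so it can be tested off-device. On the device the block size would come from something like ESP-IDF's `heap_caps_get_largest_free_block()`; the function name, the `reserve` margin and the exact comparison are assumptions, not the code merged in this PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical policy: allocate from DRAM as long as doing so still leaves
// roughly `reserve` bytes in DRAM's largest contiguous free block (approximate,
// since the allocator may carve the request out of a different block);
// otherwise try PSRAM first, keeping DRAM for the web server and WiFi.
bool preferDram(std::size_t size, std::size_t largestDramBlock,
                std::size_t reserve, bool psramAvailable) {
  if (!psramAvailable) return true;              // nothing else to fall back to
  if (size > largestDramBlock) return false;     // cannot fit in DRAM anyway
  return largestDramBlock - size >= reserve;     // keep a contiguous safety margin
}
```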

@blazoncek
Collaborator

from my side this is ready to merge

Why not also add a fix for inactive "old segment"? All it needs is add && segO->isActive() in service() (L: 1240).

SEGLEN equal to zero, causing crashes in FX that divide by that

It will solve this.

BTW I also noticed (while poking around in my fork) that there are no safeguards if WS2812FX::_pixels fails to allocate (very unlikely, though). There are also several places that would benefit from a deallocateData() call if a segment becomes inactive.

@DedeHai
Collaborator Author

DedeHai commented Jul 21, 2025

Why not also add a fix for inactive "old segment"? All it needs is add && segO->isActive() in service() (L: 1240).

That should be a separate PR; I already wrote it down in my notes.

There are also several places that would benefit a deallocateData() if segment becomes inactive.

I also have a fix for that: release FX data if RAM runs low. PS can allocate quite a lot of FX memory, which should be released, but not if there is no shortage of RAM.

edit:
Working on some RAM allocation tricks right now; the API is a mess, though...
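A sketch of the release policy discussed in the last few comments: free effect data of inactive segments unconditionally, and free it for active segments only when free heap drops below a threshold. `SegData`, `releaseFxData()` and the `lowHeap` parameter are hypothetical; in WLED the per-segment release would go through `Segment::deallocateData()`, and an effect whose data was released would have to re-initialize it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for a segment and its effect data (hypothetical).
struct SegData {
  bool active = true;
  std::vector<unsigned char> fxData;  // effect scratch memory
  void deallocateData() { std::vector<unsigned char>().swap(fxData); }  // actually releases capacity
};

// Returns the number of bytes released.
std::size_t releaseFxData(std::vector<SegData>& segs, std::size_t freeHeap, std::size_t lowHeap) {
  std::size_t released = 0;
  const bool lowOnRam = freeHeap < lowHeap;
  for (auto& s : segs) {
    if (s.fxData.empty()) continue;
    if (!s.active || lowOnRam) {  // inactive: always; active: only under memory pressure
      released += s.fxData.size();
      s.deallocateData();         // the effect must re-initialize its data on the next call
    }
  }
  return released;
}
```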

@netmindz
Member

Very happy to see some optimisations being done for better memory management for WLED.

Not something for this PR, but at some point I think it would be good to do a bit of a review of PSRAM usage overall, as especially with octal PSRAM we could make greater use of it. In general I'm a big fan of providing stability where we can, i.e. it is better to perhaps drop the frame rate than to have an unresponsive UI or reboots because we avoid PSRAM due to speed concerns.

@DedeHai
Collaborator Author

DedeHai commented Jul 21, 2025

i.e. better to perhaps drop the frame rate than be unresponsive in the UI or reboot because of avoiding PSRAM due to speed concerns

The new alloc functions should take care of that. Parameters could be fine-tuned a bit more for specific use cases, but the general approach of @blazoncek's d_alloc/p_alloc is sound.
Ran a few tests for RAM availability; as usual, Espressif docs are a bit vague on that to say the least, but I found some secret stashes for C3 and ESP32 we can tap into. S2 and S3 do not have those, but they have plenty of SRAM packed.

}

void *d_calloc(size_t count, size_t size) {
int caps1 = MALLOC_CAP_DEFAULT | MALLOC_CAP_8BIT;
Member

This isn't new in this commit, but one thing to be aware of is that MALLOC_CAP_DEFAULT by itself doesn't mean "use internal memory" -- it means "use the memory pool that malloc() draws from". This can matter if PSRAM is available to malloc(), either because the IDF was built with CONFIG_SPIRAM_USE_MALLOC, or if heap_caps_malloc_extmem_enable() was called; both allow this to return a PSRAM buffer. This isn't necessarily wrong per se -- certainly p_*alloc() above doesn't care -- but it might be worth thinking about here in the d_*alloc() functions.

If we want to insist on using internal memory, use MALLOC_CAP_INTERNAL. If we want to limit it to the pools that malloc() can also reach, use MALLOC_CAP_INTERNAL | MALLOC_CAP_DEFAULT. As you've noticed, there are some internal SRAM pools that malloc() won't touch, such as the block of shared data/instruction RAM intended for dynamically loadable or self-modifying code; and when SPI RAM is enabled, the IDF configuration can reserve some internal memory in a separate non-malloc()-accessible pool for DMA buffers. Using only MALLOC_CAP_INTERNAL will permit usage of those pools, if we want to.
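The two capability combinations above, sketched as allocation helpers. `heap_caps_malloc()` and the `MALLOC_CAP_*` flags are the ESP-IDF API; the helper names are hypothetical, and the `#else` branch is only a host-side stand-in using `std::malloc` so the snippet also builds off-device. `MALLOC_CAP_8BIT` is kept because the buffers need byte access; this thread also reports a bootloop when `MALLOC_CAP_INTERNAL` was combined with `MALLOC_CAP_32BIT` for the ledmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#if defined(ESP_PLATFORM)
#include "esp_heap_caps.h"

// Internal SRAM only; may also use byte-addressable internal pools
// that malloc() itself will not hand out. Never returns PSRAM.
void* alloc_internal_any(std::size_t size) {
  return heap_caps_malloc(size, MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT);
}

// Internal SRAM, restricted to the pools malloc() can also reach.
void* alloc_internal_malloc_pool(std::size_t size) {
  return heap_caps_malloc(size, MALLOC_CAP_INTERNAL | MALLOC_CAP_DEFAULT | MALLOC_CAP_8BIT);
}
#else
// Host-side stand-ins so the sketch builds and runs off-device.
void* alloc_internal_any(std::size_t size) { return std::malloc(size); }
void* alloc_internal_malloc_pool(std::size_t size) { return std::malloc(size); }
#endif
```

On ESP-IDF, memory from `heap_caps_malloc()` can be released with plain `free()`.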

Collaborator

By all means! I set up the framework, more skilled people should tune it. 😉

Collaborator Author

Using only MALLOC_CAP_INTERNAL will permit usage of those pools, if we want to.

You beat me by a few hours; I ran tests on exactly this and already prepared a few new function calls that work as intended. Conclusion of my endeavours in a nutshell:
S3, S2 and C3 all have IRAM and DRAM as well as RTCRAM enabled to be used as heap. I initially thought they do not but turns out that was a false test.
The ESP32, however, does not and cannot, BUT: as all our colors are 32-bit now, we can tap into IRAM, which is only 32-bit accessible. I tested that and it works; I just need to run some more tests and make sure the buffers NEVER get accessed in any other way. It will give us about 50k of extra internal RAM (until more functions are put in IRAM, that is).
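On the classic ESP32, an 8-bit or 16-bit load/store to IRAM raises a LoadStoreError exception, so a color buffer placed there must only ever see full 32-bit word accesses. A minimal sketch, assuming colors packed as `0xWWRRGGBB` in `uint32_t` (the packing and helper names are assumptions): channels are read and written with shifts and masks on whole words, never through a `uint8_t*` view, and `volatile` keeps the compiler from narrowing these into byte accesses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for a color buffer that must only see 32-bit accesses.
// Shift: 0 = blue, 8 = green, 16 = red, 24 = white (assumed 0xWWRRGGBB packing).
inline uint8_t getChannel(const volatile uint32_t* buf, std::size_t i, unsigned shift) {
  uint32_t word = buf[i];  // one 32-bit load
  return static_cast<uint8_t>(word >> shift);
}

inline void setChannel(volatile uint32_t* buf, std::size_t i, unsigned shift, uint8_t value) {
  uint32_t word = buf[i];  // 32-bit load
  word = (word & ~(0xFFu << shift)) | (uint32_t(value) << shift);
  buf[i] = word;           // 32-bit store
}

// Word-wise clear instead of memset(), which may issue byte stores.
inline void clearBuffer(volatile uint32_t* buf, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) buf[i] = 0;
}
```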

Member

By all means! I set up the framework, more skilled people should tune it. 😉

Now if you really want to get fancy, you can play fun and games with the MMU to coalesce fragmented blocks to a region of contiguous virtual address space so they look and feel like a flat buffer ... ;)

Member

Now if you really want to get fancy, you can play fun and games with the MMU to coalesce fragmented blocks to a region of contiguous virtual address space so they look and feel like a flat buffer ... ;)

Eh, scrub that. The MMU implementation isn't general enough to be useful for this purpose. It looks like it's meant to be used for task stacks and flash/SPIRAM caching only, pretty much.

Collaborator

@blazoncek blazoncek Jul 23, 2025

If we want to insist on using internal memory, use MALLOC_CAP_INTERNAL

Causes bootloop when allocating (and initialising) ledmap on ESP32.
EDIT: ... when paired with MALLOC_CAP_32BIT.

All is well.

@DedeHai
Collaborator Author

DedeHai commented Jul 22, 2025

Any objections to merging this as-is and then building upon it, instead of polishing what was supposed to be a quick bugfix?

we can continue the discussion in a follow-up PR with improvements to allocation handling.

@blazoncek
Collaborator

Why not also add a fix for inactive "old segment"? All it needs is add && segO->isActive() in service() (L: 1240).

that should be a separate PR, I already wrote it down in my notes.

Why? It is a FX allocation bugfix.

@DedeHai
Collaborator Author

DedeHai commented Jul 22, 2025

Why not also add a fix for inactive "old segment"? All it needs is add && segO->isActive() in service() (L: 1240).

that should be a separate PR, I already wrote it down in my notes.

Why? It is a FX allocation bugfix.

to me it is a different issue, related to segment buffer and has different implications I need to look into and test.

@willmmiles
Member

+1 for merge now. I've been looking at locking/task safety for #4779 -- I think one of the sub-tasks on the way there will be to make Segment work with default move and copy constructors. That basically translates to wrapping Segment::name, Segment::data, and Segment::pixels in managed-memory structures. So the sooner we merge these changes, the less merge conflict resolution I'll have to deal with ... :)
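A minimal sketch of what wrapping the owning members in managed-memory structures could look like, assuming `std::unique_ptr` for owned buffers. `MiniSegment` and its members are hypothetical, not WLED's `Segment`. Once every raw owning pointer is replaced, the compiler-generated move constructor and move assignment are correct, and no hand-written destructor can forget a `free()`; copying is implicitly deleted and would need either a deep-copying wrapper or an explicit decision to forbid it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical segment-like type whose owning members are all RAII wrappers.
struct MiniSegment {
  std::unique_ptr<char[]>     name;    // was: raw char* name
  std::unique_ptr<uint8_t[]>  data;    // was: raw effect data pointer
  std::unique_ptr<uint32_t[]> pixels;  // was: raw pixel buffer pointer
  std::size_t dataLen = 0;

  // On allocation failure the old buffer is kept and false is returned; nothing leaks either way.
  bool allocateData(std::size_t len) {
    std::unique_ptr<uint8_t[]> buf(new (std::nothrow) uint8_t[len]());
    if (!buf) return false;
    data = std::move(buf);  // the previous buffer is freed here
    dataLen = len;
    return true;
  }
};
```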

@DedeHai DedeHai merged commit e2f5bec into wled:main Jul 23, 2025
21 checks passed
@blazoncek
Collaborator

  • also added intermediate fix for waitForIt()

So why this, then? By that logic it does not belong in this PR either.

@DedeHai
Collaborator Author

DedeHai commented Jul 23, 2025

  • also added intermediate fix for waitForIt()

So why this, then? By that logic it does not belong in this PR either.

You are correct; however, I already tested that along the way. My comment was less about mixing up PRs and more about getting this one finished.
As mentioned, I have some follow-up improvements that need this one merged, and it's easier to work in increments than to maintain changes in different branches.
