Stack usage during serialization is significantly higher than v6 #2046

Closed
willmmiles opened this issue Feb 1, 2024 · 21 comments
Labels
bug v7 ArduinoJson 7

Comments

@willmmiles

willmmiles commented Feb 1, 2024

Describe the bug

Serialization using ArduinoJson v7 on ESP8266 seems to use substantially more stack than v6 for common serialization tasks. With a large configuration structure, this can result in a stack overflow, causing difficult-to-diagnose crashes.

Troubleshooter report
Here is the report generated by the ArduinoJson Troubleshooter:

  1. The program uses ArduinoJson 7
  2. The issue happens at run time
  3. The issue concerns serialization
  4. Program crashes
  5. Program uses PROGMEM
  6. Casting the pointer doesn't solve the issue

Environment
Here is the environment that I used:

  • Microcontroller: ESP8266
  • Core/runtime: ESP8266 core for Arduino v3.1.2 (PlatformIO espressif8266@4.2.1)
  • IDE: VSCode/PlatformIO

Reproduction

[TODO] Working on it - I should be able to provide a better (complete and buildable) example in a few days. Serialization code is of the form:

void serializeConfig() {
  JsonDocument doc;
  JsonObject root = doc.to<JsonObject>();

  JsonArray rev = root[F("rev")].to<JsonArray>();
  rev.add(1); //major settings revision
  rev.add(0); //minor settings revision

  root[F("vid")] = VERSION;

  JsonObject id = root[F("id")].to<JsonObject>();
  id[F("mdns")] = cmDNS;
  id[F("name")] = serverDescription;
  id[F("sui")] = simplifiedUI;
  // and so on ...

  File f = LittleFS.open("/cfg-v7.json", "w");
  if (f) serializeJson(root, f);
  f.close();
}

void measureStackUsage() {
    cont_check(g_pcont);   // Validate we haven't already overflowed
    cont_repaint_stack(g_pcont);  // Fill unused stack with guard value

    auto unused_stack_before = cont_get_free_stack(g_pcont);    // Counts stack filled with guard value
    serializeConfig();

    cont_check(g_pcont);   // Validate we haven't overflowed
    auto unused_stack_after = cont_get_free_stack(g_pcont);

    // Free stack shrinks as the stack is used, so usage is "before" minus "after"
    Serial.printf("Stack used by serializeConfig: %d\n", unused_stack_before - unused_stack_after);
}
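
The measurement above relies on "stack painting": fill the unused stack with a guard value, run the workload, then count how many guard words survived. A minimal host-side sketch of that idea follows. `PaintedStack`, `kGuard`, and `stackUsedBy` are hypothetical names for illustration only and are not part of the ESP8266 core; the guard value is arbitrary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Arbitrary guard pattern (assumption; the real core defines its own value).
constexpr std::uint32_t kGuard = 0xA5A5A5A5u;

// Hypothetical stand-in for a task stack that grows from high to low addresses.
class PaintedStack {
 public:
  explicit PaintedStack(std::size_t words) : words_(words, kGuard) {}

  // Refill the whole buffer with the guard value (like repainting the stack).
  void repaint() { std::fill(words_.begin(), words_.end(), kGuard); }

  // Simulate a workload that reaches `depthBytes` deep into the stack.
  void touch(std::size_t depthBytes) {
    assert(depthBytes % 4 == 0 && depthBytes / 4 <= words_.size());
    for (std::size_t i = words_.size() - depthBytes / 4; i < words_.size(); ++i)
      words_[i] = 0;  // any value other than the guard
  }

  // Count untouched guard words starting from the low end of the buffer.
  std::size_t freeBytes() const {
    std::size_t count = 0;
    while (count < words_.size() && words_[count] == kGuard) ++count;
    return count * 4;
  }

 private:
  std::vector<std::uint32_t> words_;
};

// Usage is "free before" minus "free after", because free space shrinks.
template <typename Workload>
std::size_t stackUsedBy(PaintedStack& stack, Workload&& workload) {
  stack.repaint();
  const std::size_t before = stack.freeBytes();
  workload();
  const std::size_t after = stack.freeBytes();
  return before - after;
}
```

Note that this technique reports a high-water mark: it cannot see stack that was reserved but never written, and a workload that happens to store the guard value itself would be undercounted.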
@willmmiles willmmiles added the bug label Feb 1, 2024
@willmmiles
Author

I tried using Ghidra to inspect the stack frames. Compared to v6, the difference seems to come from a combination of larger JsonObject/JsonArray sizes and a plethora of FlashString objects left on the stack. It seems gcc isn't able to recognize that the FlashStrings are temporary and that their stack space can be recovered. Passing -fstack-reuse=all did not help.

@willmmiles
Author

As a workaround on ESP8266, increasing the stack size with -D CONT_STACKSIZE=8192 stopped the crashes in my project, though this reduces the stack available to the system context, and it's unclear whether that will cause trouble with system functionality later. RTOS-style runtimes with small task stacks (such as ESP32s) might also be impacted.

bblanchon added a commit that referenced this issue Feb 1, 2024
This reduces stack consumption and code size.
See  #2046
@bblanchon
Owner

Hi @willmmiles,

Thank you very much for reporting this issue.

I turned your repro code into the following program:

#include <Arduino.h>
#include <ArduinoJson.h>
#include <cont.h>

void serializeConfig() {
  DynamicJsonDocument doc(1024);

  JsonArray rev = doc[F("rev")].to<JsonArray>();
  rev.add(1);
  rev.add(0);

  doc[F("vid")] = ARDUINOJSON_VERSION;

  JsonObject id = doc[F("id")].to<JsonObject>();
  id[F("mdns")] = "cmDNS";
  id[F("name")] = "serverDescription";
  id[F("sui")] = "simplifiedUI";

  serializeJson(doc, Serial);
}

void setup() {
  Serial.begin(9600);
  while (!Serial)
    continue;

  cont_check(g_pcont);         // Validate we haven't already overflowed
  cont_repaint_stack(g_pcont); // Fill unused stack with guard value

  auto unused_stack_before =
      cont_get_free_stack(g_pcont); // Counts stack filled with guard value
  serializeConfig();

  cont_check(g_pcont); // Validate we haven't overflowed
  auto unused_stack_after = cont_get_free_stack(g_pcont);

  Serial.printf("Stack used by serializeConfig: %d\n",
                unused_stack_before - unused_stack_after);
}

void loop() {}

With it, I got the following results:

ArduinoJson   Arduino IDE   PlatformIO Release   PlatformIO Debug
v6.21.5       532           500                  960
v7.0.2        580           548                  1184
c98b05e       580           548                  1120

We can see an increase, but it's not that big.

I also did some tests by modifying the library to print the stack pointer address and got the following results for six levels of nesting:

ArduinoJson   Arduino IDE   PlatformIO Release   PlatformIO Debug
v6.21.5       256           240                  688
v7.0.2        368           256                  1312
c98b05e       256           192                  1008

We can see a significant increase, but only in debug mode.
Are you having problems in debug or release?

c98b05e corresponds to the current head of branch 7.x.
Can you give it a try?

Best regards,
Benoit

@bblanchon bblanchon changed the title v7: Stack usage during serialization is significantly higher than v6 Stack usage during serialization is significantly higher than v6 Feb 1, 2024
@bblanchon bblanchon added the v7 ArduinoJson 7 label Feb 1, 2024
@willmmiles
Author

Thanks for taking such a detailed look, I really appreciate it.

I'm using release builds, and unfortunately c98b05e made no significant difference for the whole config. I finally instrumented my project properly using the logic from above, with the full config, and got the following results:

ArduinoJson   Stack usage (bytes)
v6.21.5       2308
v7.0.2        4948
c98b05e       4804

Possibly there's some kind of phase change in the compiler output if the function gets too big (eg. the optimizer times out and gives up partway, or something). Here is a copy of the whole cfg, for reference: cfg-tmp.json, and a link to the generating code, in case you can spot some obvious usage error: https://github.com/willmmiles/WLED/blob/c2d2000f9b91fcb02fd37b7f255911ff4fe5b5ac/wled00/cfg.cpp#L640

@bblanchon
Owner

Here are the variables scoped to serializeConfig():

JsonObject root;
JsonArray rev;
JsonObject id;
JsonObject nw;
JsonArray nw_ins;
JsonObject nw_ins_0;
JsonArray nw_ins_0_ip;
JsonArray nw_ins_0_gw;
JsonArray nw_ins_0_sn;
JsonObject ap;
JsonArray ap_ip;
JsonObject wifi;
JsonObject ethernet;
JsonObject hw;
JsonObject hw_led;
JsonArray hw_led_ins;
JsonArray hw_com;
const ColorOrderMap &com;
JsonObject hw_btn;
JsonArray hw_btn_ins;
JsonObject hw_ir;
JsonObject hw_relay;
JsonObject hw_if;
JsonArray hw_if_i2c;
JsonArray hw_if_spi;
JsonObject light;
JsonObject light_gc;
JsonObject light_tr;
JsonObject light_nl;
JsonObject def;
JsonObject interfaces;
JsonObject if_sync;
JsonObject if_sync_recv;
JsonObject if_sync_send;
JsonObject if_nodes;
JsonObject if_live;
JsonObject if_live_dmx;
JsonObject if_va;
JsonArray if_va_macros;
JsonObject if_mqtt;
JsonObject if_mqtt_topics;
JsonObject if_hue;
JsonObject if_hue_recv;
JsonArray if_hue_ip;
JsonObject if_ntp;
JsonObject ol;
JsonObject timers;
JsonObject cntdwn;
JsonArray goal;
JsonArray timers_ins;
JsonObject ota;
JsonObject dmx;
JsonArray dmx_fixmap;
JsonObject usermods_settings;
File f;

16*sizeof(JsonArray)+37*sizeof(JsonObject)+sizeof(File) is 468 in both ArduinoJson 6 and 7.

Unfortunately, this doesn't explain the increase between v6 and v7, but you could easily save these 468 by splitting this function or calling serializeJson() out of it.
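
The suggestion to split the function can be sketched as follows. This is only an illustration of the pattern, not ArduinoJson code: `DemoDoc`, `writeRev`, `writeId`, `serializeConfigSplit`, and the `DEMO_NOINLINE` macro are all hypothetical names, and the string building stands in for the real JSON handles. The idea is that each helper's locals exist only while that helper runs, provided the compiler does not inline the helper back into the caller.

```cpp
#include <cassert>
#include <string>

// Ask GCC/Clang not to inline the helpers (assumption: a GNU-style compiler).
#if defined(__GNUC__)
#define DEMO_NOINLINE __attribute__((noinline))
#else
#define DEMO_NOINLINE
#endif

// Hypothetical stand-in for a JSON document.
struct DemoDoc {
  std::string out;
};

// Each helper owns its own locals; they are released when the helper returns.
DEMO_NOINLINE void writeRev(DemoDoc& doc) {
  const std::string rev = "\"rev\":[1,0]";
  doc.out += rev;
}

DEMO_NOINLINE void writeId(DemoDoc& doc) {
  const std::string id = "\"id\":{\"name\":\"demo\"}";
  doc.out += id;
}

// The top-level function only holds the document and calls the helpers in turn.
std::string serializeConfigSplit() {
  DemoDoc doc;
  doc.out = "{";
  writeRev(doc);
  doc.out += ",";
  writeId(doc);
  doc.out += "}";
  return doc.out;
}
```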

@willmmiles
Copy link
Author

The disassembly analysis suggested that the compiler is also storing all the temporaries at unique stack addresses, too -- every object looked up with [], and (with v7) a FlashString for every F(). I don't know why it's refusing to re-use the same stack for each one. As a quick test, maybe you could try adding more data members in your test program to see if it increases the stack usage for each one? If you don't see any increase in utilization, that narrows it down to something unique to my build configuration, so I know where to start looking next.

I can definitely work around the issue by breaking up the large configuration into smaller functions (assuming the compiler doesn't undo it all by inlining them!). I'm not quite ready to give up yet, though - I'd hate to be the guy debugging some project that suddenly got unstable when I added one more entry to the configuration file, or to have to document "you must break up large serializations into small functions because otherwise it might crash on some systems".

@willmmiles
Copy link
Author

willmmiles commented Feb 2, 2024

You were on the right track with looking at inlining. It looks like gcc (at least the version in this toolchain) isn't able to share space in the stack frame for inlined functions -- every inlined copy has to be allocated unique stack space for their own arguments. This is the cause of the additional stack usage.

As an experiment, I tried globally disabling FORCE_INLINE in ArduinoJson, in favor of letting the compiler decide when to inline. The result was a large reduction in stack usage (down to 1572 bytes!!), a 14528 byte reduction in program size, and no noticeable performance change for my use case. If it's helpful, the compiled serializeConfig function became basically a list of detail::VariantRefBase<>::to<>, JsonArray::add<>, and detail::VariantRefBase<>::set<> calls -- these might be the right places to not force inlining.
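
A toy model of the effect described in this comment, under loudly stated assumptions: if every inlined copy gets its own stack slots, the caller's frame grows with the sum of the temporaries, whereas out-of-line calls reuse the same region one after another, so only the largest callee matters. The function names and the numbers in the usage note are hypothetical; call overhead, saved registers, and alignment are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Inlined copies with no slot sharing: every temporary gets unique space.
std::size_t frameIfInlined(const std::vector<std::size_t>& tempBytes) {
  return std::accumulate(tempBytes.begin(), tempBytes.end(), std::size_t{0});
}

// Out-of-line calls: each callee frame is popped before the next is pushed,
// so the peak is the largest single callee (overhead ignored in this model).
std::size_t frameIfOutlined(const std::vector<std::size_t>& tempBytes) {
  return tempBytes.empty()
             ? 0
             : *std::max_element(tempBytes.begin(), tempBytes.end());
}
```

With, say, 100 calls that each need 16 bytes of temporaries, the model gives 1600 bytes when inlined without sharing and 16 bytes when called out of line. Real compilers land somewhere in between, which is why measuring is still necessary.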

@bblanchon
Copy link
Owner

That's the opposite of my experience with FORCE_INLINE: it often helps me reduce the code size and stack usage.
Can you pinpoint the ones that cause this large increase?

@willmmiles
Copy link
Author

willmmiles commented Feb 2, 2024

Sure. I won't get to it today, but tomorrow I'll see about scripting up a systematic search. I'm pretty sure we can use -Wframe-larger-than to get gcc to print out the stack frame size without even running the code.
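
As a rough illustration of the compile-time approach mentioned here: GCC's -Wframe-larger-than=N warns about functions whose frame exceeds N bytes, and -fstack-usage writes per-function stack figures to a .su file. The function below is a hypothetical example with a deliberately large local buffer; the exact frame size reported depends on the target and optimization level, so no specific number is claimed.

```cpp
// Build with, for example (file name is hypothetical):
//   g++ -O2 -c -Wframe-larger-than=128 -fstack-usage frame_demo.cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// A 512-byte local buffer makes this frame a likely candidate for the warning.
std::size_t bigFrame(const char* text) {
  assert(text != nullptr);
  char scratch[512];
  std::strncpy(scratch, text, sizeof(scratch) - 1);
  scratch[sizeof(scratch) - 1] = '\0';  // strncpy does not always terminate
  return std::strlen(scratch);
}
```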

@bblanchon
Copy link
Owner

I did some scripting to find which FORCE_INLINEs were actually useful.

I compiled in the following table the impact of each on the size of ArduinoJson's examples: JsonParserExample, JsonGeneratorExample, MsgPackExample, StringExample, and ProgmemExample.
I used AVR as the target CPU because it's where code size is usually an issue, and because it compiles quickly.

File:line Parser Generator MsgPack String Progmem
Array/ElementProxy.hpp:31 0 0 0 0 0
Array/ElementProxy.hpp:37 0 0 0 0 0
Array/ElementProxy.hpp:43 0 0 0 0 0
Array/ElementProxy.hpp:47 -40 0 -66 0 0
Array/ElementProxy.hpp:53 0 0 0 0 0
Array/JsonArray.hpp:23 0 0 0 0 0
Array/JsonArray.hpp:26 0 0 0 0 0
Array/JsonArray.hpp:66 0 -72 0 0 0
Array/JsonArray.hpp:73 0 0 0 0 0
Array/JsonArray.hpp:79 0 0 0 0 0
Array/JsonArray.hpp:87 0 0 0 0 0
Array/JsonArray.hpp:93 0 0 0 0 0
Array/JsonArray.hpp:108 0 0 0 0 0
Array/JsonArray.hpp:114 0 0 0 0 0
Array/JsonArray.hpp:126 0 0 0 0 0
Array/JsonArray.hpp:136 0 0 0 0 0
Array/JsonArray.hpp:142 0 0 0 0 0
Array/JsonArray.hpp:148 0 0 0 0 0
Array/JsonArray.hpp:154 0 0 0 0 0
Array/JsonArrayConst.hpp:26 0 0 0 0 0
Array/JsonArrayConst.hpp:34 0 0 0 0 0
Array/JsonArrayConst.hpp:39 0 0 0 0 0
Array/JsonArrayConst.hpp:42 0 0 0 0 0
Array/JsonArrayConst.hpp:48 0 0 0 0 0
Array/JsonArrayConst.hpp:59 0 0 0 0 0
Array/JsonArrayConst.hpp:65 0 0 0 0 0
Array/JsonArrayConst.hpp:71 0 0 0 0 0
Array/JsonArrayConst.hpp:77 0 0 0 0 0
Document/JsonDocument.hpp:173 0 0 0 0 0
Document/JsonDocument.hpp:183 0 0 0 0 0
Document/JsonDocument.hpp:193 0 0 0 0 0
Document/JsonDocument.hpp:203 0 0 0 0 0
Document/JsonDocument.hpp:212 0 0 0 0 0
Document/JsonDocument.hpp:218 0 0 0 0 0
Document/JsonDocument.hpp:243 0 0 0 0 0
Document/JsonDocument.hpp:250 0 0 0 0 0
Document/JsonDocument.hpp:256 0 0 0 0 0
Document/JsonDocument.hpp:263 0 0 0 0 0
Document/JsonDocument.hpp:272 0 0 0 0 0
Document/JsonDocument.hpp:279 0 0 0 0 0
Document/JsonDocument.hpp:283 0 0 0 0 0
Json/Latch.hpp:28 -64 0 0 -70 -66
Object/JsonObject.hpp:23 0 0 0 0 0
Object/JsonObject.hpp:26 0 0 0 0 0
Object/JsonObject.hpp:46 0 0 0 0 0
Object/JsonObject.hpp:52 0 0 0 0 0
Object/JsonObject.hpp:58 0 0 0 0 0
Object/JsonObject.hpp:64 0 0 0 0 0
Object/JsonObject.hpp:70 0 0 0 0 0
Object/JsonObject.hpp:78 0 0 0 0 0
Object/JsonObject.hpp:90 0 0 0 0 0
Object/JsonObject.hpp:106 0 0 0 0 0
Object/JsonObject.hpp:116 0 0 0 0 0
Object/JsonObject.hpp:125 0 0 0 0 0
Object/JsonObject.hpp:132 0 0 0 0 0
Object/JsonObject.hpp:140 0 0 0 0 0
Object/JsonObject.hpp:148 0 0 0 0 0
Object/JsonObject.hpp:158 0 0 0 0 0
Object/JsonObjectConst.hpp:35 0 0 0 0 0
Object/JsonObjectConst.hpp:41 0 0 0 0 0
Object/JsonObjectConst.hpp:47 0 0 0 0 0
Object/JsonObjectConst.hpp:53 0 0 0 0 0
Object/JsonObjectConst.hpp:59 0 0 0 0 0
Object/JsonObjectConst.hpp:67 0 0 0 0 0
Object/JsonObjectConst.hpp:74 0 0 0 0 0
Object/JsonObjectConst.hpp:82 0 0 0 0 0
Object/JsonObjectConst.hpp:90 0 0 0 0 0
Object/JsonObjectConst.hpp:101 0 0 0 0 0
Object/MemberProxy.hpp:20 0 0 0 0 0
Object/MemberProxy.hpp:26 0 0 0 0 0
Object/MemberProxy.hpp:32 0 0 0 -16 -96
Object/MemberProxy.hpp:38 0 0 0 0 -16
Object/MemberProxy.hpp:44 0 0 0 0 0
Object/MemberProxy.hpp:48 34 0 -128 90 0
Object/MemberProxy.hpp:54 0 -48 0 92 -40
Strings/Adapters/RamString.:30 -2 -4 -12 0 -2
Variant/JsonVariant.hpp:26 0 0 0 0 0
Variant/JsonVariant.hpp:30 0 0 0 0 0
Variant/JsonVariant.hpp:34 0 0 0 0 0
Variant/JsonVariantConst.hpp:41 0 0 0 0 0
Variant/JsonVariantConst.hpp:46 0 0 0 0 0
Variant/JsonVariantConst.hpp:52 0 0 0 0 0
Variant/JsonVariantConst.hpp:65 0 0 0 0 0
Variant/JsonVariantConst.hpp:75 0 0 0 0 0
Variant/JsonVariantConst.hpp:83 0 0 0 0 0
Variant/JsonVariantConst.hpp:89 0 0 0 0 0
Variant/JsonVariantConst.hpp:97 0 0 0 0 0
Variant/JsonVariantConst.hpp:108 0 0 0 0 0
Variant/JsonVariantConst.hpp:119 0 0 0 0 0
Variant/JsonVariantConst.hpp:129 0 0 0 0 0
Variant/VariantAttorney.hpp:19 0 0 0 0 0
Variant/VariantAttorney.hpp:25 34 0 -120 90 0
Variant/VariantAttorney.hpp:31 0 -48 0 92 -40
Variant/VariantRefBase.hpp:31 0 0 0 0 0
Variant/VariantRefBase.hpp:37 0 0 0 0 0
Variant/VariantRefBase.hpp:42 0 0 0 0 0
Variant/VariantRefBase.hpp:49 -26 0 -44 104 0
Variant/VariantRefBase.hpp:58 0 0 0 0 0
Variant/VariantRefBase.hpp:63 -26 0 -44 0 0
Variant/VariantRefBase.hpp:87 0 0 0 0 0
Variant/VariantRefBase.hpp:94 0 0 0 0 0
Variant/VariantRefBase.hpp:103 0 -72 0 -16 -112
Variant/VariantRefBase.hpp:108 0 0 0 0 -16
Variant/VariantRefBase.hpp:112 0 0 0 0 0
Variant/VariantRefBase.hpp:118 0 0 0 0 0
Variant/VariantRefBase.hpp:139 0 0 0 0 0
Variant/VariantRefBase.hpp:146 0 0 0 0 0
Variant/VariantRefBase.hpp:152 0 0 0 0 0
Variant/VariantRefBase.hpp:159 0 0 0 0 0
Variant/VariantRefBase.hpp:168 0 0 0 0 0
Variant/VariantRefBase.hpp:176 0 0 0 0 0
Variant/VariantRefBase.hpp:181 0 0 0 0 0
Variant/VariantRefBase.hpp:187 0 0 0 0 0
Variant/VariantRefBase.hpp:193 0 0 0 0 0
Variant/VariantRefBase.hpp:200 0 0 0 0 0
Variant/VariantRefBase.hpp:257 0 0 0 0 0
Variant/VariantRefBase.hpp:261 18 0 -160 90 0
Variant/VariantRefBase.hpp:265 0 -48 0 92 -40
Variant/VariantRefBase.hpp:269 0 0 0 0 0
Variant/VariantRefBase.hpp:271 -24 0 -202 74 0
Variant/VariantRefBase.hpp:275 0 0 0 86 -36

In this table, a negative number means the size decreases (good) and a positive number means that the size increases (bad).

I have already removed the 103 FORCE_INLINEs that didn't affect the code size, so make sure you pull the latest version from the v7 branch.

@bblanchon
Owner

bblanchon commented Feb 4, 2024

I asked the ArduinoJson Assistant to write a program that generates the same content as the cfg-tmp.json you uploaded earlier.

Then, I ran my script again to see the effect of each FORCE_INLINE on the size of the code.
This time, I compiled for ESP8266, not AVR.

File:line WLED Parser Generator MsgPack String Progmem
Array/ElementProxy.hpp:25 0 0 0 0 0 0
Array/ElementProxy.hpp:47 0 0 0 -16 0 0
Array/JsonArray.hpp:65 448 0 0 0 0 0
Array/JsonArray.hpp:72 0 0 0 0 0 0
Array/JsonArray.hpp:78 0 0 0 0 0 0
Array/JsonArray.hpp:86 0 0 0 0 0 0
Array/JsonArray.hpp:92 0 0 0 0 0 0
Array/JsonArray.hpp:107 0 0 0 0 0 0
Array/JsonArray.hpp:113 0 0 0 0 0 0
Array/JsonArray.hpp:125 0 0 0 0 0 0
Array/JsonArray.hpp:135 0 0 0 0 0 0
Array/JsonArray.hpp:141 0 0 0 0 0 0
Array/JsonArray.hpp:147 0 0 0 0 0 0
Array/JsonArray.hpp:153 0 0 0 0 0 0
Document/JsonDocument.hpp:182 0 0 0 0 0 0
Document/JsonDocument.hpp:192 0 0 0 0 0 0
Document/JsonDocument.hpp:202 0 0 0 0 0 0
Document/JsonDocument.hpp:211 0 0 0 0 0 0
Document/JsonDocument.hpp:217 0 0 0 0 0 0
Document/JsonDocument.hpp:242 0 0 0 0 0 0
Document/JsonDocument.hpp:249 0 0 0 0 0 0
Document/JsonDocument.hpp:255 0 0 0 0 0 0
Document/JsonDocument.hpp:262 0 0 0 0 0 0
Document/JsonDocument.hpp:271 0 0 0 0 0 0
Document/JsonDocument.hpp:278 0 0 0 0 0 0
Document/JsonDocument.hpp:282 0 0 0 0 0 0
Json/Latch.hpp:28 0 -32 0 0 -32 -32
Object/JsonObject.hpp:45 0 0 0 0 0 0
Object/JsonObject.hpp:51 0 0 0 0 0 0
Object/JsonObject.hpp:57 0 0 0 0 0 0
Object/JsonObject.hpp:63 0 0 0 0 0 0
Object/JsonObject.hpp:69 0 0 0 0 0 0
Object/JsonObject.hpp:77 0 0 0 0 0 0
Object/JsonObject.hpp:89 0 0 0 0 0 0
Object/JsonObject.hpp:105 0 0 0 0 0 0
Object/JsonObject.hpp:115 3520 0 0 0 0 0
Object/JsonObject.hpp:124 0 0 0 0 0 0
Object/JsonObject.hpp:131 0 0 0 0 0 0
Object/JsonObject.hpp:139 0 0 0 0 0 0
Object/JsonObject.hpp:147 0 0 0 0 0 0
Object/JsonObject.hpp:157 0 0 0 0 0 0
Object/MemberProxy.hpp:32 5952 0 0 0 0 -16
Object/MemberProxy.hpp:38 1824 0 0 0 0 0
Object/MemberProxy.hpp:48 0 -32 0 -16 -16 0
Object/MemberProxy.hpp:54 -384 0 -16 0 32 -16
Strings/Adapters/RamString.hpp:30 -64 0 -48 -64 0 0
Variant/VariantAttorney.hpp:25 0 -16 0 0 -16 0
Variant/VariantAttorney.hpp:31 -384 0 -16 0 32 -16
Variant/VariantRefBase.hpp:49 0 -32 0 -48 0 0
Variant/VariantRefBase.hpp:62 0 -32 0 -48 -48 0
Variant/VariantRefBase.hpp:86 0 0 0 0 0 0
Variant/VariantRefBase.hpp:93 0 0 0 0 0 0
Variant/VariantRefBase.hpp:102 6032 0 -32 0 0 -16
Variant/VariantRefBase.hpp:107 1824 0 0 0 0 0
Variant/VariantRefBase.hpp:111 0 0 0 0 0 0
Variant/VariantRefBase.hpp:117 0 0 0 0 0 0
Variant/VariantRefBase.hpp:138 0 0 0 0 0 0
Variant/VariantRefBase.hpp:145 0 0 0 0 0 0
Variant/VariantRefBase.hpp:151 0 0 0 0 0 0
Variant/VariantRefBase.hpp:158 0 0 0 0 0 0
Variant/VariantRefBase.hpp:167 0 0 0 0 0 0
Variant/VariantRefBase.hpp:175 0 0 0 0 0 0
Variant/VariantRefBase.hpp:180 0 0 0 0 0 0
Variant/VariantRefBase.hpp:186 0 0 0 0 0 0
Variant/VariantRefBase.hpp:192 0 0 0 0 0 0
Variant/VariantRefBase.hpp:199 0 0 0 0 0 0
Variant/VariantRefBase.hpp:256 0 0 0 0 0 0
Variant/VariantRefBase.hpp:260 0 -16 0 -16 -16 0
Variant/VariantRefBase.hpp:264 -384 0 -16 0 32 -16
Variant/VariantRefBase.hpp:268 0 0 0 0 0 0
Variant/VariantRefBase.hpp:270 0 -16 0 -16 -32 0
Variant/VariantRefBase.hpp:274 3344 0 0 0 0 -80

We can see that some of them have a very negative impact on code size.

@willmmiles
Copy link
Author

> We can see that some of them have a very negative impact on code size.

Yes, I'm thinking there's a matter of scale here -- if we inline a function a few times, it doesn't matter much, but if we have to inline it hundreds of times, the extra code can really add up. To make things even more complex, large projects like WLED use ArduinoJson in several different contexts (and more relevantly, translation units). Being able to share common code can be a huge savings.

Is your script knocking out FORCE_INLINE or adding it only on the one line? I'm running a data collection now over the whole WLED project, selectively removing FORCE_INLINE in each location and collecting program size and stack usage metrics. I might try going the other way (selectively adding it) for comparison.

@bblanchon
Copy link
Owner

My script removes one FORCE_INLINE at a time, so it compiles with n-1 FORCE_INLINEs.
You might indeed get a different result if you compile with one FORCE_INLINE at a time.

Before making your tests, do you want me to push a new version without the seven FORCE_INLINEs that increase the code size?
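
The knock-out approach described in this comment (build n variants, each missing exactly one FORCE_INLINE) could be sketched like this. It is not the script used in the thread; `countToken` and `knockOut` are hypothetical helpers that operate on source text held in a string, and compiling and measuring each variant is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count non-overlapping occurrences of `token` in `source`.
std::size_t countToken(const std::string& source, const std::string& token) {
  assert(!token.empty());
  std::size_t count = 0;
  for (std::size_t pos = source.find(token); pos != std::string::npos;
       pos = source.find(token, pos + token.size()))
    ++count;
  return count;
}

// Return a copy of `source` with the n-th (0-based) occurrence of `token`
// removed, along with one trailing space if present. If there is no such
// occurrence, the source is returned unchanged.
std::string knockOut(const std::string& source, const std::string& token,
                     std::size_t n) {
  assert(!token.empty());
  std::size_t pos = source.find(token);
  for (std::size_t i = 0; i < n && pos != std::string::npos; ++i)
    pos = source.find(token, pos + token.size());
  if (pos == std::string::npos) return source;

  std::string variant = source;
  std::size_t length = token.size();
  if (pos + length < variant.size() && variant[pos + length] == ' ') ++length;
  variant.erase(pos, length);
  return variant;
}
```

A driver would loop `n` from 0 to `countToken(...) - 1`, write each variant out, build it, and record the size difference against the unmodified build.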

@willmmiles
Copy link
Author

Data collection is done, I'm processing the results now. I'll post them up shortly. We can re-run with the changes after I've done the initial analysis.

@willmmiles
Copy link
Author

Apologies for the monster table. This is the initial result set:

  • The "flash" column shows the total ROM size difference with respect to the baseline (a3454e3)
  • The other columns show changes to the stack size for each of the named functions. Positive numbers increase stack size, negative numbers decrease it. N/A is shown if the function has a stack size less than 128 bytes.
  • The "no_inline" row globally disables FORCE_INLINE and lets the compiler decide.
File:Line Flash wled00/hue.cpp:onHueData(void*, AsyncClient*, void*, size_t) wled00/FX_fcn.cpp:WS2812FX::loadCustomPalettes() wled00/json.cpp:deserializeSegment(ArduinoJson::V702PA2::JsonObject, byte, byte) wled00/cfg.cpp:deserializeConfig(ArduinoJson::V702PA2::JsonObject, bool) wled00/json.cpp:deserializeState(ArduinoJson::V702PA2::JsonObject, byte, byte) wled00/json.cpp:serializePalettes(ArduinoJson::V702PA2::JsonObject, int) wled00/cfg.cpp:_Z15serializeConfigv$part$0() wled00/json.cpp:serializeModeNames(ArduinoJson::V702PA2::JsonArray) wled00/json.cpp:serializeModeData(ArduinoJson::V702PA2::JsonArray) wled00/cfg.cpp:_Z18serializeConfigSecv$part$0() wled00/json.cpp:serializeInfo(ArduinoJson::V702PA2::JsonObject) wled00/json.cpp:serveJson(AsyncWebServerRequest*) wled00/set.cpp:handleSettingsSet(AsyncWebServerRequest*, byte) wled00/ws.cpp:wsEvent(AsyncWebSocket*, AsyncWebSocketClient*, AwsEventType, void*, uint8_t*, size_t) wled00/ir.cpp:decodeIRJson(uint32_t) wled00/json.cpp:serializeSegment(ArduinoJson::V702PA2::JsonObject&, Segment&, byte, bool, bool) wled00/json.cpp:serializeState(ArduinoJson::V702PA2::JsonObject, bool, bool, bool, bool) wled00/remote.cpp:remoteJson(int)
Array/ElementProxy.hpp:25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Array/ElementProxy.hpp:47 272 16 48 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Array/JsonArray.hpp:107 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Array/JsonArray.hpp:113 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Array/JsonArray.hpp:125 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Array/JsonArray.hpp:135 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Array/JsonArray.hpp:141 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Array/JsonArray.hpp:147 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Array/JsonArray.hpp:153 -32 0 16 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Array/JsonArray.hpp:65 -1680 0 0 0 0 0 0 -304 0 0 0 -48 0 0 0 0 0 0 0
Array/JsonArray.hpp:72 -608 0 0 0 0 0 -224 0 -16 -32 0 0 0 0 0 0 0 0 0
Array/JsonArray.hpp:78 -32 0 0 0 -32 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Array/JsonArray.hpp:86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Array/JsonArray.hpp:92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Document/JsonDocument.hpp:182 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Document/JsonDocument.hpp:192 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Document/JsonDocument.hpp:202 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Document/JsonDocument.hpp:211 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Document/JsonDocument.hpp:217 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Document/JsonDocument.hpp:242 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Document/JsonDocument.hpp:249 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Document/JsonDocument.hpp:255 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Document/JsonDocument.hpp:262 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Document/JsonDocument.hpp:271 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Document/JsonDocument.hpp:278 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Document/JsonDocument.hpp:282 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Json/Latch.hpp:28 176 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Object/JsonObject.hpp:105 -112 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Object/JsonObject.hpp:115 -2672 0 0 0 0 16 16 -1312 0 0 0 0 0 0 0 0 0 16 0
Object/JsonObject.hpp:124 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Object/JsonObject.hpp:131 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Object/JsonObject.hpp:139 64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Object/JsonObject.hpp:147 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Object/JsonObject.hpp:157 -16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Object/JsonObject.hpp:45 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Object/JsonObject.hpp:51 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Object/JsonObject.hpp:57 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Object/JsonObject.hpp:63 -64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Object/JsonObject.hpp:69 -16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Object/JsonObject.hpp:77 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Object/JsonObject.hpp:89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Object/MemberProxy.hpp:32 -13072 0 0 0 0 0 -16 -3600 0 0 -80 -720 0 -112 0 0 -336 N/A 0
Object/MemberProxy.hpp:38 -1136 0 0 0 0 0 0 -208 0 0 -32 -176 0 0 0 0 -16 0 0
Object/MemberProxy.hpp:48 -896 0 0 -32 -304 -80 0 0 0 0 0 0 0 -64 0 16 0 0 16
Object/MemberProxy.hpp:54 -4592 0 0 16 0 0 -16 -1344 0 0 -80 -560 -32 -128 0 0 -80 -144 0
Strings/Adapters/RamString.hpp:30 -32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Variant/VariantAttorney.hpp:25 -576 0 48 -32 -304 -80 0 0 0 0 0 0 0 -64 0 16 0 0 16
Variant/VariantAttorney.hpp:31 -4640 0 0 16 0 0 -16 -1344 0 0 -80 -560 -32 -128 0 0 -80 -144 0
Variant/VariantRefBase.hpp:102 -13328 0 0 0 0 0 -16 -3824 0 0 -80 -768 0 -144 0 0 -336 N/A 0
Variant/VariantRefBase.hpp:107 -1472 0 0 0 0 0 -224 -208 0 -16 -32 -176 0 0 0 0 -16 0 0
Variant/VariantRefBase.hpp:111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Variant/VariantRefBase.hpp:117 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Variant/VariantRefBase.hpp:138 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Variant/VariantRefBase.hpp:145 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Variant/VariantRefBase.hpp:151 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Variant/VariantRefBase.hpp:158 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Variant/VariantRefBase.hpp:167 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Variant/VariantRefBase.hpp:175 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Variant/VariantRefBase.hpp:180 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Variant/VariantRefBase.hpp:186 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Variant/VariantRefBase.hpp:192 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Variant/VariantRefBase.hpp:199 432 0 0 0 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Variant/VariantRefBase.hpp:256 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Variant/VariantRefBase.hpp:260 -576 16 48 -32 -304 -80 0 0 0 0 0 0 0 -64 0 16 0 0 16
Variant/VariantRefBase.hpp:264 -4624 0 0 16 0 0 -16 -1344 0 0 -80 -560 -32 -128 0 0 -80 -144 0
Variant/VariantRefBase.hpp:268 64 0 0 -16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Variant/VariantRefBase.hpp:270 352 80 128 160 32 48 0 0 0 0 0 0 0 -16 16 32 0 0 16
Variant/VariantRefBase.hpp:274 -9424 0 0 16 0 0 -16 -2560 0 0 -32 -432 0 -80 0 0 -80 -144 0
Variant/VariantRefBase.hpp:49 -1456 16 32 0 -288 -48 0 0 0 0 0 0 0 -192 0 N/A 0 0 N/A
Variant/VariantRefBase.hpp:62 -432 16 0 16 -256 -48 0 0 0 0 0 0 0 0 0 0 0 0 0
Variant/VariantRefBase.hpp:86 64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Variant/VariantRefBase.hpp:93 -128 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
no_inline -16992 0 64 0 -224 -16 -240 -3952 -16 -32 -112 -928 -32 -400 0 N/A N/A N/A N/A

@willmmiles
Copy link
Author

@bblanchon
Copy link
Owner

Thank you for this fantastic work! ❤️

I thought you wanted to compile with only one FORCE_INLINE at a time, but after looking at the source code, it seems you used the same technique as me: remove one FORCE_INLINE at a time.
Note that I subtracted the size from the baseline, so my results are reversed compared to yours.

I initially thought we could beat the compiler by adding judiciously placed FORCE_INLINEs, but I'm starting to believe this is a risky business.
On the one hand, VariantRefBase.hpp:270 reduces the code by 352 bytes and saves 160 bytes for deserializeSegment().
But on the other hand, VariantRefBase.hpp:102 increases the code size by 13KB and loses 3824 bytes of stack in serializeConfig().
In this game, there seems to be much more to lose than gain.

After seeing your results, I think I'd better remove all the FORCE_INLINE, even if it means a 2% increase in the size of the examples.
Sure, I'll be unable to optimize the size of my benchmarks, but I'll avoid making unpredictable mistakes in everyone else's projects.

Let me know what you think.
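
Because the two result sets in this thread use opposite sign conventions, it may help to spell them out. In the sketch below, the function names and the absolute build sizes used in the usage note are hypothetical; only the 13328-byte difference for VariantRefBase.hpp:102 comes from the tables above.

```cpp
#include <cassert>

// Convention of the earlier tables: impact of KEEPING one FORCE_INLINE,
// i.e. size with all of them minus size with that one removed.
// Positive means the FORCE_INLINE makes the build bigger.
constexpr long impactOfKeeping(long sizeWithAll, long sizeWithOneRemoved) {
  return sizeWithAll - sizeWithOneRemoved;
}

// Convention of the WLED table: change caused by REMOVING one FORCE_INLINE,
// i.e. size with that one removed minus the baseline with all of them.
// Negative means removing it makes the build smaller.
constexpr long changeFromRemoving(long baseline, long sizeWithOneRemoved) {
  return sizeWithOneRemoved - baseline;
}

// The two conventions always differ only by sign.
static_assert(impactOfKeeping(1000, 900) == -changeFromRemoving(1000, 900),
              "conventions are mirror images");
```

For example, with a hypothetical baseline of 100000 bytes and 86672 bytes after removal, the first convention reports +13328 and the second reports -13328.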

@willmmiles
Copy link
Author

> I thought you wanted to compile with only one FORCE_INLINE at a time, but after looking at the source code, it seems you used the same technique as me: remove one FORCE_INLINE at a time.

Ah, sorry for the confusion - I wanted to try the knock-out approach, like you did, first; I was thinking about trying "start with none, add one at a time" as a follow up. Given the test build framework, I had a half-baked idea to run a simulated annealing type search. Although I fear a deeper search might lend itself to overfitting for the test sample.

It's definitely interesting how removing even one FORCE_INLINE on the "critical path" can make such a difference to the output on a large project.

I think there may not be a one-size-fits-all solution. It could be the right answer is some level of configurability, as much as I hate the complexity of tunables. For example, in WLED, we have many translation units doing JSON, so there's a big size advantage to sharing code. A smaller project might have just one translation unit, where sharable functions just waste space. Alas, I don't think link-time optimization is available for many of these embedded platforms. :(

Bottom line: I think letting the compiler make the call is probably the best for unknown arbitrary use cases. Given that you've already got the markup in the code, though, I think having an ARDUINOJSON_SMALL_PROJECT macro option or something similar to force micro-optimization might still be handy for many use cases.
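
One possible shape for such an opt-in switch is sketched below. This is purely illustrative and not how ArduinoJson is configured: `DEMO_SMALL_PROJECT`, `DEMO_FORCE_INLINE`, `DemoVariant`, `setValue`, and `getValue` are hypothetical names, and the `always_inline` attribute assumes a GNU-style compiler.

```cpp
#include <cassert>

// Opt in with -D DEMO_SMALL_PROJECT to force inlining; otherwise the
// compiler decides on its own.
#if defined(DEMO_SMALL_PROJECT) && defined(__GNUC__)
#define DEMO_FORCE_INLINE inline __attribute__((always_inline))
#else
#define DEMO_FORCE_INLINE inline
#endif

// Hypothetical stand-in for a small value wrapper.
struct DemoVariant {
  int value;
};

DEMO_FORCE_INLINE void setValue(DemoVariant& variant, int newValue) {
  variant.value = newValue;
}

DEMO_FORCE_INLINE int getValue(const DemoVariant& variant) {
  return variant.value;
}
```

The behavior is identical either way; only the inlining decision, and therefore code size and stack layout, can change between the two configurations.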

@bblanchon
Copy link
Owner

I finally decided to keep twelve FORCE_INLINEs that have a positive impact on code size and no negative impact on stack consumption.

I force-pushed this commit because I made a mistake in the previous one, so you may need to do a hard reset.

Thank you again for your indispensable contribution!

@bblanchon
Copy link
Owner

The fix is available in ArduinoJson 7.0.3.

@willmmiles
Copy link
Author

Thank you! It was a pleasure to work with you. :)

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 7, 2024