Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible bug on WiFi code, related to power usage #5825

Closed
4 of 6 tasks
JonasGMorsch opened this issue Feb 26, 2019 · 23 comments
Closed
4 of 6 tasks

Possible bug on WiFi code, related to power usage #5825

JonasGMorsch opened this issue Feb 26, 2019 · 23 comments

Comments

@JonasGMorsch
Copy link

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: [ESP-12E|ESP-12F]
  • Core Version: [2/26/19]
  • Development Env: [Arduino IDE]
  • Operating System: [Windows 10]

Settings in IDE

  • Module: [Nodemcu v2]
  • Flash Size: [4MB]
  • lwip Variant: [v2 Higher Bandwidth]
  • Flash Frequency: [40Mhz]
  • CPU Frequency: [80Mhz]
  • Upload Using: [SERIAL]
  • Upload Speed: [2000000]

Problem Description

Hi Everyone,
I guess that i found a bug on the wifi related to power usage.
After few hours of testing multiple NodeMCU's i noticed that every code that not uses wifi has around 50mA/h power usage.
But to my surprise, one of my MQTT codes with WIFI_NONE_SLEEP hovers between 20 to 40mA/h, while the others had the same 50mA/h behavior.
The only difference was a delay(1); in the main loop, and yes, it was it.
First i thought it had something to do with the delay lowering the loop count, but then i came up with this test code:

MCVE Sketch

#include <ESP8266WiFi.h>

void setup()
{
  WiFi.setSleepMode(WIFI_NONE_SLEEP);
  WiFi.begin("network-name", "pass-to-network");
  Serial.begin(2000000, SERIAL_8N1, SERIAL_TX_ONLY); //change if needed
  while (!Serial);
}

void loop()
{
  static uint32_t t;
  t++;
  static uint32_t timer_1 = millis();
  if ((millis() - timer_1) > 1000)
  {
    Serial.print("LOOPS PER SECOND:");
    Serial.println(t);
    t = 0;
    timer_1 = millis();
  }

  for (uint64_t i = 0; i < 5e4; i++); //simulate a big task (~62 ms)
  delay(1);
}

With or without delay(1); the loop count is around 16/s, which is pretty low, and yet the same problem appears, 20-40mA/h with delay, 50mA/h without it.
delayMicroseconds and yield make no difference in this case.

If someone has a good ampere meter (mine isn't) and some free time i would appreciate some help.
And also, if i'm doing something wrong let me know.

Debug Messages

SDK:2.2.1(cfd48f3)/Core:unix-2.6.0-dev=-194000/lwIP:STABLE-2_1_2_RELEASE/glue:1.1-2-ga501b57/BearSSL:6778687
wifi evt: 2
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 6
cnt 

connected with *******, channel 1
dhcp client start...
wifi evt: 0
ip:192.168.1.142,mask:255.255.255.0,gw:192.168.1.10
wifi evt: 3
LOOPS PER SECOND:17
LOOPS PER SECOND:16
LOOPS PER SECOND:16
LOOPS PER SECOND:16
...
@ascillato
Copy link
Contributor

Hi,

I'm also an user of this core and I also see this difference in power, using or not the delay(1).

I don't think this is a problem because when you use delay the cpu sleeps (not the wifi) so the amount of power decreases a lot.

If you use the sleep in wifi you will see a decrease on power usage too, so, I use both approaches to lower a lot the power usage. Wifi modem sleep and delay(x).

If you want, you can check the code that we use in Tasmota software (https://github.com/arendst/Sonoff-Tasmota) with an approach of dynamic sleep for cpu (Andre Thomas' idea - @andrethomas)

@JonasGMorsch
Copy link
Author

Hi, ascillato

Thats the part i cant get my head around, if the CPU sleeps for 1 ms and works for 62 ms every cycle (on the example sketch) we should see a proportional decrease of 1/63 in power usage every power cycle, right?

I let it measuring through the last night and got 28 mA/h, basically half the power for 16 ms of sleep and 984 ms of work every second, this is not adding up.

I will definitely check the Tasmota codes, and also test CPU power usage with WiFi completely off, using #2111 recommendations, just to be sure.

@andrethomas
Copy link

Just don't confuse CPU load with the indicated loadavg which is a representation of how much time the main loop spends in callbacks measured against the target loop setting (dynamic sleep value)

@andrethomas
Copy link

Here's a crude example of how a dynamic target loop delay may be used to emulate timed function callbacks

const uint32_t target_loop_delay = 50;

uint32_t call_next_500ms = 0;
uint32_t call_next_1000ms = 0;

void setup() {
  Serial.begin(115200);
  call_next_500ms = millis()+500;
  call_next_1000ms = millis()+1000;
}

void loop() {
  uint32_t my_sleep = millis();
  uint32_t os_timer = millis();

  // Do all the main loop stuff in here
  if (os_timer > call_next_500ms) {
    call_next_500ms = os_timer+500;
    Serial.println("500ms functions go here");
  }

  if (os_timer > call_next_1000ms) {
    call_next_1000ms = os_timer+1000;
    Serial.println("1000ms functions go here");
  }

  // End of everything normally done in main loop

  uint32_t my_activity = millis() - my_sleep;
  if (my_activity < target_loop_delay) {
    delay(target_loop_delay - my_activity);
  } else {
    delay(my_activity /2); // Force a yield period of at least 1/3 of the total loop cycle time if it was more than target_loop_delay
  }
}

@devyte
Copy link
Collaborator

devyte commented Feb 27, 2019

@andrethomas please don't use that approach, it doesn't work correctly on millis() rollover.
For polled timing such as what you're trying to do please use the PolledTimeout class, provided in polledTimeout.h. It correctly implements what you're trying to do even on rollover, hides the calculation details, and has good usability for both single-shot and periodic cases. There are examples that showcase usage as well.

@devyte
Copy link
Collaborator

devyte commented Feb 27, 2019

@JonasGMorsch what happens if instead of delay(1) you

  • yield()
  • delay(0)
  • delay(2)
  • delay(3)
    ?

@andrethomas
Copy link

@devyte I did say "crude" ;)

@JonasGMorsch
Copy link
Author

@devyte

  • Any delay() bigger than 1 : 28 mA/h
  • delay(0)/yield() : 50 mA/h
  • delayMicroseconds(1000) : 50 mA/h
  • loop completely empty : 50 mA/h
  • loop with anything that doesn't have a delay() bigger than 1 : 50 mA/h

All results have virtually the same CPU work.
WiFi.setSleepMode(WIFI_NONE_SLEEP); in all tests

Looking inside: /cores/esp8266/core_esp8266_wiring.cpp

void delay(unsigned long ms) {
    if(ms) {
        os_timer_setfn(&delay_timer, (os_timer_func_t*) &delay_end, 0);
        os_timer_arm(&delay_timer, ms, ONCE);
    } else {
        esp_schedule();
    }
    esp_yield();
    if(ms) {
        os_timer_disarm(&delay_timer);
    }
}

Looks like it has something to do with this os_timer functions, i will try to comment out one each time and see what happens

@JonasGMorsch
Copy link
Author

@ascillato, @devyte
Interesting...
So:

#include <ESP8266WiFi.h>

void setup()
{
  WiFi.forceSleepBegin();
  delay(1);
  //delayMicroseconds(2000);  //doesn't work
  //delay(0);  //nope
  //yield();     //nothing
}

void loop() 
{ 
  for (uint64_t i = 0; i < 5e4; i++);  //simulate a big task (~62 ms)
}

And the current drops to 10mA/h or lower!, but if i comment out the delay(1), you guessed it : 50 mA/h
https://www.hackster.io/rayburne/esp8266-turn-off-wifi-reduce-current-big-time-1df8ae
and
https://www.esp8266.com/viewtopic.php?t=7519
noticed this way before me, but the problem has noting to do with waiting for the RF shutdown, or delayMicroseconds should do the trick.

I think the WiFi library and os_timer are related some way, some WiFi functions only complete after some os_timer interactions.

If is something inside de SDK itself then i really gonna need help, but even if we cant do anything there, just to put the os_timer on the needed places will save a lot of power in a lot of projects.

@andrethomas
Copy link

The other interesting thing to note is that without adding delay() somewhere not only does the current demand increase but also the heat generated by the ESP82xx

You do not easily notice this on a shielded nodemcu that has good thermal dissipation essentially using the pc board as a heat sink but when you try it on a wemos d1 mini v1 which does not have enough thermal conductivity to get rid of the heat generated by the ESP82xx it usually pretty much gets too hot and goes into some or the other mode where it starts dropping wifi connections which is eventually followed by a reset.

@JonasGMorsch
Copy link
Author

@andrethomas, you are right!

I didn't notice it before, the diference in temperature is so great that after 5 minutes running i can actually compare just with my finger, especially from 10 to 50 mA

I'm now trying to figure how much delay(1)'s per millisecond i need to keep this under control, and hope that will give-me some hint of where is the problem, since mess with the original delay basically breaks everything.

@andrethomas
Copy link

@JonasGMorsch I've been called worse ;)

I don't know the origin of the problem - What I can tell you run fairly complex firmware such as Sonoff-Tasmota you need to give at least a few milliseconds per main loop to keep the wifi etc alive on boards which have bad thermal dissipation. As a default, we're currently using 50ms as a main loop target delay but this is largely informed by the fastest callback we have being 50ms apart from each other rather than a scientific or mathematical equation.

We don't use the PolledTimeout class suggested by @devyte because we still actively support the 2.3.0 core as users get mixed results with different wifi configurations across all three cores we support (2.3.0, 2.4.2 and 2.5.0) and the method we use seems to work reliably enough but also that the mentioned class is only introduced after 2.3.0 so its practical and easier to support in its current form - Some cores seem to work better for some wifi configurations such as mesh etc so that adds a bit of flexibility for the user base.

It is somewhat alarming when you initially stumble upon the heat dissipation requirements of the ESP82xx but its clear that there is a relation between the amount of actual processing compared to the amount of time given to yield (delay(x)) which is necessary to keep the Arduino Core and perhaps the Tensilica processor happy from a wifi connectivity perspective. The source code for the underlying SDK is not publically available so a lot of guesswork is done and we cannot change the fundamental behaviour of the SDK so who knows what it's doing while we yield (or delay)

I don't expect this to be a linear relationship because that would imply that the Tensilica processor would require 0mA to not do anything at all which we know is only possible to a degree with deep sleep using a timer or external interrupt to wake the processor up from sleep.

My best guess is that if you do not use main loop delays the main loop will simply run as fast as it can and effectively cause load on the Tensilica processor resulting in an increased demand for current and resulting in an increased amount of heat which needs to be dissipated.

This is most likely by design which is why the ESP82xx chips have a nice thermal pad underneath it which should be ground planed in such a way to ensure good heat dissipation.

Further observation suggests that the above-mentioned symptoms exist in all the cores I have tested (2.3.0, 2.4.2 and 2.5.0) but it does appear to be a little more pronounced on the 2.5.0 core release. My best guess is that its not because of a bug in the Arduino Core but rather just the additional code added (the 2.5.0 core based binaries are larger) that is causing additional CPU cycles taking away some of the time you'd normally spend in yield (or yield for a delay()'d period of time) so probably there needs to be some compensation for this but without definitive insight into the underlying SDK there's no way to know for sure.

I would say it is just as likely that it is not the Arduino Core or the SDK but rather just the way the Tensilica was designed to operate.

I hope my observations can help you albeit not very scientific due to a lack of access to information on the inner workings of the SDK itself... so for now we just work around it as much as possible.

@TD-er
Copy link
Contributor

TD-er commented Mar 1, 2019

@andrethomas Thanks for the explanation.
I made a test setup for ESPeasy in which I call a delay in our scheduler when the scheduler should return "nothing to do".
This is a bit similar to what Tasmota is doing in their (much simpler) scheduling.
The first initial tests I did are very promising.
About 50 mA drop in power consumption of a NodeMCU (maybe more, lots of sensors connected to that one) and on some test nodes I flashed, I could see less occurrences of extreme timings in our timingstats page.
Also the node mounted in my car already showed a drop in temperature of over a degreeC. It is in a confined enclosure with a number of sensors and as such it is about 10C warmer than the ambient temperature when the car is not driving. (exact temp drop is hard to estimate, since I have to compare it to a sensor in the garden)

So I can't hardly wait to do some more tests tomorrow :)

@TD-er
Copy link
Contributor

TD-er commented Mar 2, 2019

OK, so I've done a lot of tests and I still don't know what's happening exactly, but at least I have some better idea of the symptoms I am seeing.

For ESPeasy I added a number of settings to toggle and see what's happening (related PR):

  • Force WiFi B/G
  • Restart WiFi on lost conn.
  • Force WiFi no sleep
  • Periodical send Gratuitous ARP
  • CPU Eco mode

The last option, what I've labelled "Eco mode" is perhaps the most on-topic one of this issue.
In ESPeasy I have created a scheduler which does set tasks to run at some moment. These are stored in a list and every loop() run, it will check this list.
With this "Eco mode" turned on, it will call delay() until the next task is due. (with some upper limit, currently 5 msec, but also tested with 0, 1 and 2 msec)
This does give a significant drop in power consumption of the ESP. My test setup is on a power supply which outputs also the power consumption in Watt, but the actual power used by the ESP should be compensated for the power loss in diodes, DC/DC converter and the linear voltage regulator on the Wemos D1 mini.
It reports 0.45 Watt when running with Eco mode off and 0.08 Watt with Eco mode on (short peaks of 0.25 Watt, but more likely to be 0.45 Watt but too short for my power meter)
It is very clear when it is running in this 'Eco mode' and when not, so let's not look at the true numbers anymore ;)

So this is great and imagine the power savings when 100'000'000 of these devices are running in Eco mode...
But there's a great "but" in here.

As soon as the node is running in this so called "Eco mode", then it is missing packets.
Not sure if this is a hardware issue or a software one and if software, where it should be.
In ESPeasy we have this so called "ESPeasy p2p" protocol, which helps locating other nodes and allows to share recorded measurements among nodes and send commands to nodes.
The most prominent way to quickly see if packets are missing is by watching the number of reported nodes in the network and their "Age" (when it was seen last in minutes)

A node which is running in low-power mode according to the power meter:
image

The same node while using 0.45 Watt:
image

This is regardless of switching the WiFi radio to 'always on' (calling WiFi.setSleepMode(WIFI_NONE_SLEEP); right before the begin of WiFi) or sending Gratuitous ARP packets.
This last option (Gratuitous ARP) does seem to help improving stability, so I will leave it in there. Checking the option in ESPeasy will cause it to be sent at an interval (tried 5 sec and 60 sec), but it will also be sent when an attempt to make a connection to any host fails, regardless of that check mark.

Also good to notice is that switching on the 'Eco mode' does not yield a lower power consumption immediately. It may sometimes take even 10 minutes or more before it will be noticeable and it may be temporarily seem to be turned off if you look at the power consumption.

When running a ping from an external host to the ESP, the power consumption will remain at 0.45 Watt and thus the reported nodes in the ESPeasy network all have an "Age" of 0 or 1 minutes (discovery packets are sent every minute)

N.B. The delay spent in the scheduler for 'idle loops' is still like normal, regardless of the power used. So it is something in the core libraries (or SDK) which is changing something to have this effect.

To summarize:

  • In all the tests, the power consumption correlates perfectly with missing receiving packets.
  • Sending data from the ESP seems to work just fine and also sending the Gratuitous ARP packets does seem to help in getting answers from other hosts.
  • Also setting the WIFI_NONE_SLEEP does not prevent these issues and it does allow the power consumption to decrease. (running core 2.6.0, SDK2.2.1)
  • Sending ARP packets, even at an interval of 5 sec, does not prevent the ESP to go in 'eco mode'
  • Sending ping to an ESP does force it to remain in "high power" mode.

I hope this may help in finding the root cause of all these WiFi issues which lead to hardware watchdog resets and maybe also some resets due to "Exception".
And it would be really nice when the "eco mode" could be used in some way to save some power and still be able to react on packets sent unexpectedly (e.g. receiving a command)

@andrethomas
Copy link

@TD-er

Interesting... question - which LwIP variant are you using?

@TD-er
Copy link
Contributor

TD-er commented Mar 4, 2019

@andrethomas
This is what I am using now:

ESP82xx Core 2.6.0-dev, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.1.2 PUYA support

PIO_FRAMEWORK_ARDUINO_LWIP2_LOW_MEMORY_LOW_FLASH

It is on a node without Puya chip, so that shouldn't make the difference :)

@andrethomas
Copy link

I'm not sure if that's low memory or high memory - what MTU is used?

@d-a-v
Copy link
Collaborator

d-a-v commented Mar 5, 2019

It's low memory "no feature" (mss 536) (mtu is always 1460)

@andrethomas
Copy link

Ah, I meant MSS - Could this be the cause of some of the packets being dropped as observed by @TD-er ?

@d-a-v
Copy link
Collaborator

d-a-v commented Mar 5, 2019

I don't think so. Packets are not dropped with small MSS. Data are sliced into smaller packet/payload (less memory but more packets ->more "ack" latencies -> less bandwidth).

@TD-er
Copy link
Contributor

TD-er commented Mar 5, 2019

The data packets which are missed by my nodes are quite small. (about 41 bytes of payload)
They are nowhere near the size of MSS.

@Tech-TX
Copy link
Contributor

Tech-TX commented Feb 1, 2020

Nearly a year later and I can't exacftly duplicate Jonas' original numbers unless I comment out the WIFI_NONE_SLEEP line. When I do that, it goes into Automatic Modem Sleep, and the observed currents look like what he's describing (about 20mA average). It's possible that WIFI_NONE_SLEEP wasn't properly written in older versions of the SDK. The current git doesn't sleep with that WIFI_NONE_SLEEP option set. Here's what his code looks like to me, 50mA per major vertical division with WIFI_NONE_SLEEP commented out:
Automatic Modem Sleep
The little low spikes on the top trace are the delay(1), so you can see that the effect from delay() is not proportional to power, as the modem sleeps between TIM beacons once it sees the delay() at the right time. 1/62 is enough to drop the power by about half. Higher percentage of delay() will cut it to around 1/4th or so.

With the WIFI_NONE_SLEEP left in (his original sketch), I get 72mA continuous with spikes during transmissions.

@devyte
Copy link
Collaborator

devyte commented Feb 2, 2020

Closing based on previous comment. If you think the problem is still valid in latest git, please open a new issue and fill out the required info.

@devyte devyte closed this as completed Feb 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants