Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add wifi.suspend() and node.sleep() #1231

Merged
merged 6 commits into from
Apr 4, 2017

Conversation

dnc40085
Copy link
Contributor

@dnc40085 dnc40085 commented Apr 10, 2016

This PR exposes the WiFi forced sleep API, which allows the WiFi radio to be temporarily disabled to reduce power consumption or to allow the application developer to run functions that would normally cause the WiFi stack to crash.

Here is the documentation for the new functions: wifi.suspend() and wifi.resume()

I was going to mention this in the issue (Please document wifi.sleep function in official API docs(#1115)) but it seems to have disappeared.

@dnc40085 dnc40085 changed the title Added wifi.suspend() function to wifi module Add wifi.suspend() function to wifi module Apr 10, 2016
@jmattsson
Copy link
Member

👍 for merge as soon as we get the next master drop out of the way imo.

@dnc40085
Copy link
Contributor Author

Would it be useful to add functions to allow this API to be used in source rather than just in Lua?

@dnc40085
Copy link
Contributor Author

dnc40085 commented May 28, 2016

As requested by mjmcginty I will be adding the function node.sleep() to this PR.

@marcelstoer
Copy link
Member

@dnc40085 once you do pls resolve the merge conflicts, too.

@dnc40085
Copy link
Contributor Author

dnc40085 commented Jun 3, 2016

@marcelstoer will do. Any idea when dev will be upgraded to SDK 1.5.4? there is a new function wifi_fpm_auto_sleep_set_in_null_mode that would be nice to add.

@marcelstoer
Copy link
Member

No idea, there's no issue for that and I'm not aware of anyone working on that secretly. You're probably aware that NodeMCU is moving to RTOS-SDK but I'm not involved because I lack the required know-how.

@dnc40085
Copy link
Contributor Author

dnc40085 commented Jun 4, 2016

Oh I see, never mind then, I'll save that feature for a future PR.
I'll try to get this PR ready in the next day or two.

@dnc40085
Copy link
Contributor Author

@jmattsson I've almost got this PR ready but I'm having a problem getting the sleep function to play nicely with the UART.

In the course of development of this feature, I have been using the WiFi event monitor to print event messages and occasionally it goes to sleep while the uart is still transmitting, leaving the TX pin low and the TX led on, then when I wake it up it finishes emptying the buffer.

I was thinking maybe I could add a TX active flag to uart_tx_one_char(app/driver/uart.c) so I could wait for the TX buffer to empty before going to sleep.

Any thoughts?

@mjmcginty
Copy link

@dnc40085 It seems reasonable to me to leave it to the user, won't we need to stop timers and such anyway? Not only that, anything capable of drawing current through or controlled by an i/o will have to be managed, otherwise the point of sleeping is negated.

The obvious use case to me is to wait until any external activity has stopped before sleeping... in fact I can't think of a desirable or even acceptable use case where an app would want to sleep while anything was going on. (That doesn't mean there isn't one, just that it defies my imagination.)

In any case, I'm down with having to curtail serial output prior to making this call, just part of necessary prep. And the worst consequence for failing to do so does not seem dire.

My 2 cents... well that might be a nickel's worth -- keep the change. :-) And thanks for your efforts.

@jmattsson
Copy link
Member

The wifi suspend is impacting the UART?! That sounds like the WiFi suspend is shutting down the auxillary/peripheral clock altogether then >.< I imagine that would impact some other things as well then (SPI, etc)

Like @mjmcginty says, this might be something best handled with lots of user documentation. Maybe I should poke @devsaurus for his opinion here as well?

@dnc40085
Copy link
Contributor Author

won't we need to stop timers and such anyway?

As far as I know, the timers pick up where they left off when ESP comes out of light sleep.

The obvious use case to me is to wait until any external activity has stopped before sleeping

That is what I'm trying to do, but I can't find any functions in the uart driver that tell me whether or not the uart is busy.

@mjmcginty
Copy link

That is what I'm trying to do, but I can't find any functions in the uart driver that tell me whether or not the uart is busy.

It's impossible to know what async external input sources are about to send, the best that can be done is to infer an idle state from a period of inactivity.

I suspect you ran into this because status change events caused while setting up to sleep, generated uart output as you were calling it. With all due respect and appreciation, you might be over-thinking it just a bit.

IMHO, in this scenario an amount of time elapsed with no activity is a definitive measure of 'not busy', and activity is in the eye of application code.

@dnc40085
Copy link
Contributor Author

wifi suspend is impacting the UART?!

Light sleep not modem sleep. wifi sleep does not affect uart, it just disables wifi.
I'm having problems with light sleep (cpu suspend) not allowing uart to empty it's buffer before going to sleep.

@mjmcginty
Copy link

My experiences in pickup bars tell me that displays of urgency and desperation have the opposite effect. However, the feature has not been present in previous versions, so zero worry of impact to existing code, and I would be happy to have it even if it dumped the uart buffer and filled it with garbage on wake up! :-)

@mjmcginty
Copy link

@dnc40085 any progress?

@marcelstoer marcelstoer mentioned this pull request Jul 1, 2016
@dnc40085
Copy link
Contributor Author

dnc40085 commented Jul 2, 2016

@mjmcginty My apologies for the delay, I had to go out of town and haven't had access to an ESP for testing.
I'll try to get the code cleaned up and update this PR as soon as possible.

@mjmcginty
Copy link

mjmcginty commented Jul 3, 2016

@dnc40085 awesome, thank you! I came up with a little thing that lets you test an ESP-12 without soldering it at all, just got the boards from being printed yesterday. (I think I have all the components on hand, might have to order some headers from Digikey.) How can I get one to you?

@marcelstoer
Copy link
Member

I'll try to get the code cleaned up and update this PR as soon as possible.

I created a new milestone for the next master drop. Let me know whether this one should be added to it or not.

@dnc40085
Copy link
Contributor Author

dnc40085 commented Jul 17, 2016

Let me know whether this one should be added to it or not.

I'd say go ahead, but I keep randomly getting an error that simply says fpm 737 followed by a watchdog reset(extended boot reason: 3).

my init.lua can be found here

Normal:

NodeMCU 1.5.4 build unspecified powered by Lua 5.1.4 on SDK 1.5.4(baaeaebb)
> 
    STA - CONNECTED
    SSID: myssid
    BSSID: xx:xx:xx:xx:xx:xx
    Channel: 11


    STA - GOT IP
    IP: 10.0.0.5
    MASK: 255.255.255.0
    GW: 10.0.0.1

node.sleep({wake_gpio=4})
> 
    STA - DISCONNECTED
    SSID: myssid
    BSSID: 5���```3:xx:xx:xx:xx  << see NOTE at bottom of post
    reason: 8


    STA - CONNECTED
    SSID: myssid
    BSSID: xx:xx:xx:xx:xx:xx
    Channel: 11


    STA - GOT IP
    IP: 10.0.0.5
    MASK: 255.255.255.0
    GW: 10.0.0.1

abnormal:

NodeMCU 1.5.4 build unspecified powered by Lua 5.1.4 on SDK 1.5.4(baaeaebb)
> 
    STA - CONNECTED
    SSID: myssid
    BSSID: xx:xx:xx:xx:xx:xx
    Channel: 11


    STA - GOT IP
    IP: 10.0.0.5
    MASK: 255.255.255.0
     GW: 10.0.0.1

node.sleep({wake_gpio=4})
fpm 737

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x40100000, len 25576, room 16 
tail 8
chksum 0x84
load 0x3ffe8000, len 2204, room 0 
tail 12
chksum 0x75
ho 0 tail 12 room 4
load 0x3ffe889c, len 8, room 12 
tail 8
chksum 0x91
csum 0x91
��rll���l`���r�l�l��l`���rl��
NodeMCU 1.5.4 build unspecified powered by Lua 5.1.4 on SDK 1.5.4(baaeaebb)
> 
    STA - CONNECTED
    SSID: myssid
    BSSID: xx:xx:xx:xx:xx:xx
    Channel: 11


    STA - GOT IP
    IP: 10.0.0.5
    MASK: 255.255.255.0
    GW: 10.0.0.1

I'm not sure what is causing this crash~~,but I think it might be caused by the uart, because I never get this problem when the event monitor is not running.~~

NOTE: This is the uart glitch i was referring to in my previous post. in this instance the ESP8266 enters LIGHT_SLEEP after the 5 is printed, the remaining chars are printed upon cpu resume.

@dnc40085 dnc40085 changed the title Add wifi.suspend() function to wifi module Add wifi.suspend() and node.sleep() Jul 17, 2016
@dnc40085
Copy link
Contributor Author

dnc40085 commented Jul 20, 2016

Please disregard the parts of my previous post concerning the error fpm 737. the bug was introduced with SDK 1.5.4 and should be fixed with the merge of PR #1390.

As for the above mentioned UART glitch, I did some testing with an ammeter and the glitch is causing an increase in current draw when the esp enters light sleep, causing the sleep current to be different each time the ESP8266 enters light sleep, making it difficult to estimate how long it would last on battery.

I was thinking maybe a function or flag could be added to allow suspension of the UART output before executing wifi_fpm_do_sleep or maybe something else that would result in the UART being in a predictable state when entering light sleep.

Any suggestions would be much appreciated.

@marcelstoer
Copy link
Member

This was almost done but has been sitting idle for too long. Is there consensus whether it should be merged (once rebased of course) even with the UART glitch?

@dnc40085
Copy link
Contributor Author

dnc40085 commented Nov 18, 2016

I thought I was almost done with it, but then I ran into a problem with some timers preventing timed light sleep.

To remedy the timer problem, I've implemented the ability to suspend timers.
This is accomplished by keeping a registry of currently armed timers that is maintained by a couple of macros in sdk-overrides/osapi.h that override os_timer_arm() and os_timer_disarm().
Using this registry, all non SDK timers can be kept track of to be later suspended and then resumed as needed.

I just got the timer suspend/resume functionality working properly a few days ago, and I'm currently cleaning up my mess to get it ready for merging.

As for the timer suspend add-on, since it is required by node.sleep(), should I just add it to this PR or should it get a PR of it's own?

My apologies for taking so long with this PR, it's been a real head scratcher.

@dnc40085
Copy link
Contributor Author

@marcelstoer thank you for letting me know

@dnc40085 dnc40085 force-pushed the dev_wifi_suspend branch 2 times, most recently from 0d0d3ce to 210ae98 Compare January 25, 2017 12:17
@dnc40085
Copy link
Contributor Author

I believe I've fixed the UART glitch, I just needed to check the TX buffer to ensure it was empty before issuing the sleep command.

This PR now passes all of my personal tests.

Added timer suspend functionality

Added functions:
* wifi.suspend
* wifi.resume
* node.sleep 
* tmr.suspend
* tmr.suspend_all
* tmr.resume
* tmr.resume_all
@marcelstoer marcelstoer added this to the 2.0.0-follow-up milestone Apr 4, 2017
@marcelstoer marcelstoer merged commit 41a5736 into nodemcu:dev Apr 4, 2017
@dnc40085
Copy link
Contributor Author

dnc40085 commented Apr 5, 2017

@marcelstoer I found some errors in the documentation that need to be fixed before merge to the master branch, I will be submitting a PR shortly.

@TerryE
Copy link
Collaborator

TerryE commented Apr 5, 2017

@dnc40085, It might be worth a comment in the documentation for wifi.suspend() that if you do need to do a long running task -- one that is going to run over the 15 mSec guideline (for example an OW search enumeration) that you suspend the wifi stack during this operation.

One other point which "advanced" programmers understand but will confuse most beginners: your requesting routine has to return / exit after executing the suspend request, because the suspend isn't actioned by the SDK until you've returned control to it. It will then invoke the suspend callback once the suspend activities are completed. Ditto resume.

@dnc40085
Copy link
Contributor Author

dnc40085 commented Apr 6, 2017

@TerryE I did mention the 500 ms guideline in wifi.suspend(), do you think it would be better change it to 15 ms and move the notice to the beginning of wifi.md for more exposure?

example:

  • The wifi subsystem is maintained by background tasks that must run periodically, any function or task that takes longer than 15 ms (miliseconds) may cause the wifi subsystem to crash.
  • To avoid these potential crashes, it is advised that the wifi subsystem be suspended with wifi.suspend() prior to the execution of any tasks or functions that exceed this 15 ms guideline.

One other point which "advanced" programmers understand but will confuse most beginners: your requesting routine has to return / exit after executing the suspend request, because the suspend isn't actioned by the SDK until you've returned control to it. It will then invoke the suspend callback once the suspend activities are completed. Ditto resume.

I'm not sure of the best way to document this...
Do you think this would work?

wifi.suspend()

Suspend Wifi to reduce current consumption.

Please note:

  • Wifi will not enter a suspended state until cpu is idle.
  • Suspend callback will only execute after wifi has successfully entered a suspended state.

syntax

...

@dnc40085 dnc40085 deleted the dev_wifi_suspend branch April 7, 2017 01:06
@dnc40085 dnc40085 mentioned this pull request Apr 15, 2017
2 tasks
@TerryE
Copy link
Collaborator

TerryE commented Jul 9, 2017

@dnc40085, I am working through some issues on my Lua 5.3 port and this has bumped my against your mods here. My concern isn't with the functionality, but with the implementation strategy. We are building and using a Lua RTS which has its own embedded memory management systems and sets of structures and functions for array / list management. Yet, I reviewing this implementation in depth, I realise that:

  • You are not using the Lua memory management system , but instead going to the underlying malloc / free system. One of the main differences between these is that memory exhaustion in the Lua system will trigger a GC sweep followed by a retry and failure will then throw an out of memory error. This means that calling code doesn't need to have error paths for out-of-memory as this is a thrown condition. Your code falls into the classic pattern of the malloc approach of rolling up out of memory errors up the return stack assuming that all higher level code will test functions for out of memory. So you timer code has lots of tests for out-of-memory returns and throwing errors.

  • You've added a new and separate module for handling dynamic arrays, which is different to the standard Lua one and only used by this application.

This all amounts to a future maintenance burden for quite a narrow albeit very useful functionality. I just feel that the whole implementation would have been simpler and cleaner if you'd stuck to the Lua way for implementing new functionality for a Lua module.

@dnc40085
Copy link
Contributor Author

dnc40085 commented Jul 9, 2017

@TerryE Funny you mention this, lately I've been thinking of updating the implementation to use the Lua memory management system and since implementation I've wondered if there was a simpler way I could implement the dynamic array.

I've been looking at the functions in app/lua/lmem.h and I assume luaM_malloc() and luaM_free() are pretty much drop-in replacements for c_malloc() and c_free() and I think I can use luaM_newvector(), luaM_growvector() and luaM_reallocvector() to replace the dynamic array functions but lmem.h isn't very clear on what goes in each functions parameters and so far I haven't been able to find any documentation on these functions.

I'll get to work on a remedy for this issue and try and get it done in a timely manner.

@TerryE
Copy link
Collaborator

TerryE commented Jul 10, 2017

@dnc40085, a slight aside: before I got involved with this Lua project, I was interested in and contributed to some of the core components of the PHP system, and it is interesting to compare and contrast the two. The reason that we can use Lua and not PHP as an embedded RTS on an IoT device is that the Lua architecture is tight and normalised. In essence there is only one way of doing anything in Lua and the authors use that way wherever they need it. By resisting the urge to implement a special approach to optimise a specific case, the overall consequence is that the whole remains lean and mean.

So to your Q about how to implement dynamic arrays, my answer is simple: use the Lua way. Don't reinvent your own. So in this case use its API for array handling.

But also why not follow the general Lua approach for handling exceptional error conditions which is to use the lua error system to throw the error where it occurs and avoid rolling this back up the call stack. If errors are thrown then code can be written assuming that any called function has done the intended purpose so the implementation is simpler, faster and easier to understand and maintain.

And another principle, that I would also suggest for changes to an embedded RTS for an IoT: stick to the minimal implementation of any aspect of any new functional requirement unless there are compelling advantages for generalising or extending the scope.

If I go back to the i original scope of this patch it was to implement a light-sleep and wake function, so that a class of Lua implementations (ones which spent a lot of time dormant and only occasionally needed to do processing) could save power, but to do this using a simple API and without crashing the application.

This last point was a problem, and the specific issue was that the SDK, services and Lua applications use the os_timer services to schedule future task callbacks. This service (as at 1.5.4 and to date, I believe) is not light-sleep aware, and because light sleep turns off the system clock and suspends the CPU, the future booking system slips, so we need to offset any booked times by the period that the system clock was stopped. Surely that was the core issue that we needed to address?

I personally think that the introduction of 3 new subdirectories (pm, swTimer and misc) with modules in them together with the extra API functions exposed in tmr, was just overkill here. In terms of the tmr functions, how could a Lua developer sensibly use these? So why expose them?


On another topic relating to this PR, the Expressif SDK internals are not well documented and the source isn't available , but it has some fundamental characteristics:

  • Processing is either (H/W) interrupt (ISRs) or background.
  • ISRs are short and sharp, the guideline is 50µS. If you need to do anything, you must use system_os_post to post a task to do it.
  • All background processing (anything other than an ISR) is divided into tasks. Tasks once started run to completion and are not preempted by another task.
  • All tasks return to a core SDK scheduler which maintains 31 priority task lists, (three of which are exposed to the user apps through the system_os_task() API call) and the at each scheduling point the highest priority os_task_t with a non-empty task queue is executed.

I am note sure how the os_timer_* interface play into this, but it does follow the same constraints so you don't need to mask interrupts, etc whilst working with the timer Qs since these are only manipulated in a task.

Looking at the libmain.a binary, the timer system uses the FRC2 HW timers plus its own ISR to schedule its own events. The H/W timer is armed for the first coming s/w timer due; when it fires it walks down the head of the timer event list to post and due S/W timer its own priority queue (not one of the three user ones). The timer queue task handler then delivers the timer callback, thus s/w timer alarm are posted at this timer priority rather than one of the three user ones.

To be honest, I get lost in swTimer.c. I can see that you walk the timer_registry to do the suspends, but I can't see how you mine the timer_list to initialise the registry in the first place.

My Q / suggestion is that if you treated your current implementation as a prototype which included your process of discovering how this all worked, but we then wanted to strip this back to the essential functions of "checkpoint all timers" and "restore all timers offset by time that the system clock was stopped", then how would you do this? Can we just detach the ETStimer linked list from timer_list, or do we walk this list and only disarm (user?) callbacks caching the previous expiry time? If we cache this data, then we don't need any dynamic array handling since it is trivial to do a two pass scan of the timer_list to count the number of contexts that we need to save and do a single luaM_malloc of this structure.

I suggest that if you removed all one the non-core stuff, then we would up with a patch less than a third of the current one.

//Terry

@TerryE
Copy link
Collaborator

TerryE commented Jul 11, 2017

OK, I've discovered the overrides of os_timer_arm() and os_timer_disarm() in sdk-overrides/include/osapi.h. I agree that this simplifies your implementation and makes it more transparent to the modules, but you've also added side-effects and silent failure modes in doing this. For example these hidden calls to swtmr register / unregister do malloc / free calls and these can result in errors returned in a status (rather than thrown) and your wrapper ignores these error conditions.

Why not just walk the timer_list chain to work out what needs to be unregistered? If you don't want to start unregistering SDK timers then

  • why not just establish a convention that for modules to be light sleep compliant then any active cb's need to registered in the Lua registry as a lightweight C function? Or
  • have a (user_config.h defined) statically allocated max number of timer callbacks that you can track (this is the ETSTimerFunc that gets registered with the setfn (tmr.c has two: alarm_timer_common and rtc_callback) so a 10 × (void *) allocation would probably cover most applications. And throw an error in the unlikely event that the app uses more than the allocated number so the developer knows that the number needs upped for this application.

Either way you can now scan the timer_list chain during pmSleep to work out which tiimers to disarm and save the context.

eiselekd pushed a commit to eiselekd/nodemcu-firmware that referenced this pull request Jan 7, 2018
* Exposed forced sleep API and more
Added timer suspend functionality
* wifi.suspend
* wifi.resume
* node.sleep 
* tmr.suspend
* tmr.suspend_all
* tmr.resume
* tmr.resume_all
* Implement timer suspend functionality
* Fix for uart TX glitch
* Made some modifications to the error reporting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants