Breaking Changes in v1.3: Breaking changes were made in v1.3 to reduce the flash memory consumption of Coroutine instances by 800-1000 bytes. See the CHANGELOG.md for a complete list.
A low-memory, fast-switching, cooperative multitasking library using stackless coroutines on Arduino platforms.
This library is an implementation of the ProtoThreads library for the Arduino platform. It emulates a stackless coroutine that can suspend execution using a yield() or delay() functionality to allow other coroutines to execute. When the scheduler makes its way back to the original coroutine, the execution continues right after the yield() or delay().
There are only 3 classes in this library:
- the Coroutine class provides the context variables for all coroutines,
- the CoroutineScheduler class optionally handles the scheduling,
- the Channel class allows coroutines to send messages to each other. This is an experimental feature whose API and features may change considerably in the future.
The library provides a number of macros to help create coroutines and manage their life cycle:
- COROUTINE(): defines an instance of the Coroutine class or an instance of a user-defined subclass of Coroutine
- COROUTINE_BEGIN(): must occur at the start of a coroutine body
- COROUTINE_END(): must occur at the end of the coroutine body
- COROUTINE_YIELD(): yields execution back to the caller, often CoroutineScheduler but not necessarily
- COROUTINE_AWAIT(condition): yields until condition becomes true
- COROUTINE_DELAY(millis): yields back execution for millis. The millis parameter is defined as a uint16_t.
- COROUTINE_DELAY_MICROS(micros): yields back execution for micros. The micros parameter is defined as a uint16_t.
- COROUTINE_DELAY_SECONDS(seconds): yields back execution for seconds. The seconds parameter is defined as a uint16_t.
- COROUTINE_LOOP(): convenience macro that loops forever
- COROUTINE_CHANNEL_WRITE(channel, value): writes a value to a Channel
- COROUTINE_CHANNEL_READ(channel, value): reads a value from a Channel
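A uint16_t holds values from 0 to 65535, so a larger delay value does not fit in the parameter of the three delay macros. The host-side sketch below is not library code. The helpers fitsInDelay() and asDelayParam() are hypothetical names, and the sketch assumes only that an oversized argument goes through the ordinary C++ conversion to uint16_t. The library may impose further limits that are not covered here.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if `value` fits in a uint16_t delay parameter.
constexpr bool fitsInDelay(uint32_t value) {
  return value <= std::numeric_limits<uint16_t>::max();
}

// Hypothetical helper: models the standard conversion to uint16_t, which
// keeps only the low 16 bits (the value modulo 65536).
constexpr uint16_t asDelayParam(uint32_t value) {
  return static_cast<uint16_t>(value);
}

static_assert(fitsInDelay(65535), "65535 is the largest uint16_t value");
static_assert(!fitsInDelay(70000), "70000 does not fit in a uint16_t");
static_assert(asDelayParam(70000) == 4464, "70000 - 65536 == 4464");
```

Under that assumption, a delay longer than 65535 units is better expressed with the next larger unit, or with several shorter delays in a loop.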
Here are some of the compelling features of this library compared to others (in my opinion of course):
- low memory usage
  - 8-bit (e.g. AVR) processors:
    - the first Coroutine consumes about 230 bytes of flash
    - each additional Coroutine consumes 170 bytes of flash
    - each Coroutine consumes 11 bytes of static RAM
    - CoroutineScheduler consumes only about 40 bytes of flash and 2 bytes of RAM independent of the number of coroutines
  - 32-bit (e.g. STM32, ESP8266, ESP32) processors:
    - the first Coroutine consumes between 120-450 bytes of flash
    - each additional Coroutine consumes about 130-160 bytes of flash
    - each Coroutine consumes 20 bytes of static RAM
    - CoroutineScheduler consumes only about 40-60 bytes of flash and 4 bytes of static RAM independent of the number of coroutines
- extremely fast context switching
  - Direct Scheduling (call Coroutine::runCoroutine() directly):
    - ~1.2 microseconds on a 16 MHz ATmega328P
    - ~0.4 microseconds on a 48 MHz SAMD21
    - ~0.3 microseconds on a 72 MHz STM32
    - ~0.3 microseconds on a 80 MHz ESP8266
    - ~0.1 microseconds on a 240 MHz ESP32
    - ~0.17 microseconds on 96 MHz Teensy 3.2 (depending on compiler settings)
  - Coroutine Scheduling (use CoroutineScheduler::loop()):
    - ~5.5 microseconds on a 16 MHz ATmega328P
    - ~1.3 microseconds on a 48 MHz SAMD21
    - ~0.9 microseconds on a 72 MHz STM32
    - ~0.6 microseconds on a 80 MHz ESP8266
    - ~0.2 microseconds on a 240 MHz ESP32
    - ~0.5 microseconds on 96 MHz Teensy 3.2 (depending on compiler settings)
- uses the computed goto feature of the GCC compiler (also supported by Clang) to avoid the Duff's Device hack
  - allows switch statements in the coroutines
- C/C++ macros eliminate boilerplate code and make the code easy to read
- the base Coroutine class is easy to subclass to add additional variables and functions
- fully unit tested using AUnit
Some limitations are:
- A Coroutine cannot return any values.
- A Coroutine is stackless and therefore cannot preserve local stack variables across multiple calls. Often the class member variables or function static variables are reasonable substitutes.
- Coroutines are designed to be statically allocated, not dynamically created and destroyed on the heap. Dynamic memory allocation on an 8-bit microcontroller with 2kB of RAM would cause too much heap fragmentation. And the virtual destructor pulls in malloc() and free() which increases flash memory by 600 bytes on AVR processors.
- A Channel is an experimental feature and has limited features. It is currently an unbuffered, synchronized channel. It can be used by only one reader and one writer.
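To make the second limitation concrete, here is a small host-side sketch. It does not use the library. Worker, runOnce(), memberCount() and nextSequence() are hypothetical names, and each call to runOnce() simply stands in for one re-entry into a coroutine body. The local variable is re-created on every call, while the member variable and the function-static variable keep their values.

```cpp
#include <cassert>

// Hypothetical stand-in for a coroutine body that is re-entered on each call.
class Worker {
 public:
  // Models one re-entry and returns the value of the local counter.
  int runOnce() {
    int localCount = 0;  // re-created on every call, so its value is lost
    localCount++;
    mCount++;            // member variable: keeps its value between calls
    return localCount;
  }

  int memberCount() const { return mCount; }

 private:
  int mCount = 0;
};

// A function-static variable also keeps its value between calls, but it is
// shared by every caller of the function.
int nextSequence() {
  static int sequence = 0;
  return ++sequence;
}
```

After three calls to runOnce(), the returned local counter is still 1 while memberCount() reports 3. A member variable is usually the better choice when several instances of the same coroutine class exist, because a function-static variable would be shared among them.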
After I had completed most of this library, I discovered that I had essentially reimplemented the <ProtoThread.h> library in the Cosa framework. The difference is that AceRoutine is a self-contained library that works on any platform supporting the Arduino API (AVR, Teensy, ESP8266, ESP32, etc), and it provides a handful of additional macros that can reduce boilerplate code.
Version: 1.4.2 (2022-02-04)
Changelog: CHANGELOG.md
- Hello Coroutines
- Installation
- Documentation
- Comparisons
- Resource Consumption
- System Requirements
- License
- Feedback and Support
- Authors
This is the HelloCoroutine.ino sample sketch which uses the COROUTINE() macro to automatically handle much of the boilerplate code and some internal bookkeeping operations. Using the COROUTINE() macro works well for relatively small and simple coroutines.
#include <AceRoutine.h>
using namespace ace_routine;

const int LED = LED_BUILTIN;
const int LED_ON = HIGH;
const int LED_OFF = LOW;

COROUTINE(blinkLed) {
  COROUTINE_LOOP() {
    digitalWrite(LED, LED_ON);
    COROUTINE_DELAY(100);
    digitalWrite(LED, LED_OFF);
    COROUTINE_DELAY(500);
  }
}

COROUTINE(printHelloWorld) {
  COROUTINE_LOOP() {
    Serial.print(F("Hello, "));
    Serial.flush();
    COROUTINE_DELAY(1000);
    Serial.println(F("World"));
    COROUTINE_DELAY(4000);
  }
}

void setup() {
  delay(1000);
  Serial.begin(115200);
  while (!Serial); // Leonardo/Micro
  pinMode(LED, OUTPUT);
}

void loop() {
  blinkLed.runCoroutine();
  printHelloWorld.runCoroutine();
}
The printHelloWorld coroutine prints "Hello, ", waits 1 second, then prints "World", then waits 4 more seconds, then repeats from the start. At the same time, the blinkLed coroutine blinks the builtin LED on and off, on for 100 ms and off for 500 ms.
The HelloScheduler.ino sketch implements the same thing using the CoroutineScheduler:
#include <AceRoutine.h>
using namespace ace_routine;

... // same as above

void setup() {
  delay(1000);
  Serial.begin(115200);
  while (!Serial); // Leonardo/Micro
  pinMode(LED, OUTPUT);
  CoroutineScheduler::setup();
}

void loop() {
  CoroutineScheduler::loop();
}
The CoroutineScheduler can automatically manage all coroutines defined by the COROUTINE() macro, which eliminates the need to itemize your coroutines in the loop() method manually. Unfortunately, this convenience is not free (see MemoryBenchmark):
- The CoroutineScheduler singleton instance increases the flash memory by about 110 bytes.
- The CoroutineScheduler::loop() method calls the Coroutine::runCoroutine() method through the virtual dispatch instead of directly, which is slower and takes more flash memory.
- Each Coroutine instance consumes an additional ~70 bytes of flash when using the CoroutineScheduler.
On 8-bit processors with limited memory, the additional resource consumption can be important. On 32-bit processors with far more memory, these additional resources are often inconsequential. Therefore the CoroutineScheduler is recommended mostly on 32-bit processors.
The HelloManualCoroutine.ino program shows what the code looks like without the convenience of the COROUTINE() macro. For more complex programs, with more than a few coroutines, especially if the coroutines need to communicate with each other, this coding structure can be more powerful.
#include <Arduino.h>
#include <AceRoutine.h>
using namespace ace_routine;

const int LED = LED_BUILTIN;
const int LED_ON = HIGH;
const int LED_OFF = LOW;

class BlinkLedCoroutine: public Coroutine {
  public:
    int runCoroutine() override {
      COROUTINE_LOOP() {
        digitalWrite(LED, LED_ON);
        COROUTINE_DELAY(100);
        digitalWrite(LED, LED_OFF);
        COROUTINE_DELAY(500);
      }
    }
};

class PrintHelloWorldCoroutine: public Coroutine {
  public:
    int runCoroutine() override {
      COROUTINE_LOOP() {
        Serial.print(F("Hello, "));
        Serial.flush();
        COROUTINE_DELAY(1000);
        Serial.println(F("World"));
        COROUTINE_DELAY(4000);
      }
    }
};

BlinkLedCoroutine blinkLed;
PrintHelloWorldCoroutine printHelloWorld;

void setup() {
  delay(1000);
  Serial.begin(115200);
  while (!Serial); // Leonardo/Micro
  pinMode(LED, OUTPUT);
}

void loop() {
  blinkLed.runCoroutine();
  printHelloWorld.runCoroutine();
}
The latest stable release is available in the Arduino IDE Library Manager. Only a single library needs to be installed since v1.1:
- Search for "AceRoutine". Click Install.
The direct dependency on the AceCommon library was removed in v1.4.2, but some of the programs under tests/ and examples/ may still require the AceCommon library to be installed.
The development version can be installed by cloning the following git repo:
- AceRoutine (https://github.com/bxparks/AceRoutine)
You can copy this directory to the ./libraries directory used by the Arduino IDE. (The result is a directory named ./libraries/AceRoutine). Or you can create symlinks from ./libraries to this directory.
The develop branch contains the latest working version. The master branch contains the stable release.
The source files are organized as follows:
- src/AceRoutine.h - main header file
- src/ace_routine/ - implementation files
- src/ace_routine/testing/ - internal testing files
- tests/ - unit tests which depend on AUnit
- examples/ - example programs
- README.md - this file
- Doxygen docs published on GitHub Pages
- USER_GUIDE.md
The following programs are provided under the examples directory:
- Beginner Examples
  - HelloCoroutine.ino
  - HelloScheduler.ino: same as HelloCoroutine except using the CoroutineScheduler instead of manually running the coroutines
  - HelloManualCoroutine.ino: same as HelloCoroutine except the Coroutine subclasses and instances are created and registered manually
- Intermediate Examples
  - BlinkSlowFastRoutine.ino: use coroutines to read a button and control how the LED blinks
  - BlinkSlowFastManualRoutine.ino: same as BlinkSlowFastRoutine but using manual Coroutine subclasses
  - CountAndBlink.ino: count and blink at the same time
  - Delay.ino: validate the COROUTINE_DELAY() macro
- Advanced Examples
  - SoundManager: Use a sound manager coroutine to control the sounds made by a sound generator coroutine, using the reset() function to interrupt the sound generator.
- Channels (experimental)
  - Pipe.ino: uses a Channel to allow a Writer to send messages to a Reader through a "pipe" (unfinished)
  - Task.ino: uses a Channel to allow a Writer to send messages to a Reader (unfinished)
  - a working example of Channels can be found in the CommandLineInterface package in the AceUtils library (https://github.com/bxparks/AceUtils).
- Benchmarks
  - Internal programs to extract various CPU and memory benchmarks.
  - AutoBenchmark.ino: performs CPU benchmarking
  - MemoryBenchmark.ino: determines the flash and static memory consumptions of certain AceRoutine features
  - ChannelBenchmark.ino: determines the amount of CPU overhead of a Channel by using 2 coroutines to ping-pong an integer across 2 channels
There are several interesting and useful multithreading libraries for Arduino. I'll divide the libraries into 2 camps:
- tasks
- threads or coroutines
Task managers run a set of tasks. They do not provide a way to resume execution after yield() or delay().
In order of increasing complexity, here are some libraries that provide broader abstraction of threads or coroutines:
- Littlebits coroutines
  - Implemented using Duff's Device which means that nested switch statements don't work.
  - The scheduler has a fixed queue size.
  - The context structure is exposed.
- Arduino-Scheduler
  - Overrides the system's yield() for a seamless experience.
  - Uses setjmp() and longjmp().
  - Provides an independent stack to each coroutine whose size is configurable at runtime (defaults to 128 for AVR, 1024 for 32-bit processors).
  - ESP8266 or ESP32 not supported (or at least I did not see it).
- Cosa framework
  - A full-featured, alternative development environment using the Arduino IDE, but not compatible with the Arduino API or libraries.
  - Installs as a separate "core" using the Board Manager.
  - Includes various ways of multi-tasking (Events, ProtoThreads, Threads, Coroutines).
  - The <ProtoThread.h> library in the Cosa framework uses basically the same technique as this AceRoutine library.
The AceRoutine library falls in the "Threads or Coroutines" camp. The inspiration for this library came from ProtoThreads and Coroutines in C where an incredibly brilliant and ugly technique called Duff's Device is used to perform labeled goto statements inside the "coroutines" to resume execution from the point of the last yield() or delay(). It occurred to me that I could make the code a lot cleaner and easier to use in a number of ways:
- Instead of using Duff's Device, I could use the GCC language extension called the computed goto. I would lose ANSI C compatibility, but all of the Arduino platforms (AVR, Teensy, ESP8266, ESP32) use the GCC compiler and the Arduino software already relies on GCC-specific features (e.g. flash strings using the PROGMEM attribute). In return, switch statements would work inside the coroutines, which wasn't possible using Duff's Device.
- Each "coroutine" needs to keep some small number of context variables. In the C language, these need to be passed around using a struct. It occurred to me that in C++, we could make the context variables almost disappear by making "coroutine" an instance of a class and moving the context variables into the member variables.
- I could use C preprocessor macros similar to the ones used in AUnit to hide much of the boilerplate code and complexity from the user.
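The first two ideas in the list above can be sketched in a few lines of host-side C++. This is only an illustration of the general technique, not the AceRoutine implementation: CountToThree, run(), mResume and mI are hypothetical names, and the code relies on the GCC/Clang "labels as values" extension (&&label and goto *ptr), so it will not build with compilers that lack it. The resume address and the loop counter live in member variables, which is what lets the function pick up where it left off.

```cpp
#include <cassert>

// Hypothetical minimal stackless coroutine using computed goto.
// Yields 0, 1, 2 on successive calls, then -1 forever.
class CountToThree {
 public:
  int run() {
    // Resume at the saved label if there is one (GCC/Clang extension).
    if (mResume != nullptr) goto *mResume;

    for (mI = 0; mI < 3; mI++) {
      mResume = &&resume_point;  // remember where to continue next time
      return mI;                 // "yield" the current value to the caller
      resume_point:;             // the next call to run() lands here
    }

    mResume = &&done;  // finished: every later call jumps straight to done
    done:
    return -1;
  }

 private:
  void* mResume = nullptr;  // saved label address (the context variable)
  int mI = 0;               // loop counter kept as a member, not a local
};
```

Note that the loop counter has to be a member variable: a local declared inside run() would not survive the return, which is the stackless limitation mentioned earlier in this README.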
I looked around to see if there already was a library that implemented these ideas and I couldn't find one. However, after writing most of this library, I discovered that my implementation was very close to the <ProtoThread.h> module in the Cosa framework. It was eerie to see how similar the 2 implementations had turned out at the lower level. I think the AceRoutine library has a couple of advantages:
- it provides additional macros (i.e. COROUTINE() and EXTERN_COROUTINE()) to eliminate boilerplate code, and
- it is a standalone Arduino library that does not depend on a larger framework.
All objects are statically allocated (i.e. not heap or stack).
On 8-bit processors (AVR Nano, Uno, etc):
sizeof(Coroutine): 11
sizeof(CoroutineScheduler): 2
sizeof(Channel<int>): 5
On 32-bit processors (e.g. Teensy ARM, ESP8266, ESP32):
sizeof(Coroutine): 20
sizeof(CoroutineScheduler): 4
sizeof(Channel<int>): 12
The CoroutineScheduler consumes only 2 bytes (8-bit processors) or 4 bytes (32-bit processors) of static memory no matter how many coroutines are created. That's because it depends on a singly-linked list whose pointers live on the Coroutine object, not in the CoroutineScheduler. But using the CoroutineScheduler::loop() instead of calling Coroutine::runCoroutine() directly increases flash memory usage by 70-100 bytes.
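The following host-side sketch shows the general shape of such an intrusive singly-linked list, under the assumption that each object stores its own "next" pointer and the scheduler keeps only a pointer to the current entry. Task, Scheduler and CountingTask are hypothetical names chosen for this illustration, and the code is not taken from the library.

```cpp
#include <cassert>

// Hypothetical task type. The "next" link is stored in the task itself.
class Task {
 public:
  Task() : mNext(sHead) { sHead = this; }  // push onto the shared list
  virtual int run() = 0;
  Task* next() const { return mNext; }
  static Task* head() { return sHead; }

 protected:
  ~Task() = default;  // non-virtual: tasks are never deleted via a Task*

 private:
  Task* mNext;
  static Task* sHead;  // root of the list, shared by all tasks
};

Task* Task::sHead = nullptr;

// Hypothetical round-robin scheduler. It runs one task per loop() call.
class Scheduler {
 public:
  void loop() {
    if (mCurrent == nullptr) mCurrent = Task::head();  // wrap around
    if (mCurrent == nullptr) return;                   // no tasks at all
    mCurrent->run();  // virtual dispatch
    mCurrent = mCurrent->next();
  }

 private:
  Task* mCurrent = nullptr;  // the only state held by the scheduler
};

// The scheduler is one pointer wide regardless of how many tasks exist.
static_assert(sizeof(Scheduler) == sizeof(Task*), "one pointer of state");

// Example task that counts how many times it has been run.
class CountingTask : public Task {
 public:
  int run() override { return ++count; }
  int count = 0;
};
```

With two CountingTask objects, four calls to loop() run each task twice. The per-task cost is one pointer for the link, which is why the scheduler's own footprint stays constant.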
The Channel object requires 2 copies of the parameterized <T> type so its size is equal to 1 + 2 * sizeof(T), rounded up to the next memory alignment boundary (i.e. a total of 12 bytes for Channel<int> on a 32-bit processor).
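The arithmetic can be checked on a desktop compiler with a stand-in struct. ChannelLayout, its member names, roundUp() and expectedSize() are hypothetical and only mirror the shape described above (one status byte plus two T values); the real Channel class may order or name its members differently. The sketch assumes a typical host where int32_t is 4-byte aligned and int16_t is 2-byte aligned.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in: one status byte plus two copies of T.
template <typename T>
struct ChannelLayout {
  uint8_t status;
  T valueToWrite;
  T valueRead;
};

// Rounds n up to the next multiple of align.
constexpr size_t roundUp(size_t n, size_t align) {
  return (n + align - 1) / align * align;
}

// 1 + 2 * sizeof(T), rounded up to the alignment of T.
template <typename T>
constexpr size_t expectedSize() {
  return roundUp(1 + 2 * sizeof(T), alignof(T));
}

// 32-bit int: 1 + 2 * 4 = 9, rounded up to a multiple of 4 gives 12.
static_assert(expectedSize<int32_t>() == 12, "9 rounds up to 12");
static_assert(sizeof(ChannelLayout<int32_t>) == 12, "matches the formula");

// 16-bit int with 1-byte alignment (my understanding of AVR): 1 + 2 * 2 = 5.
static_assert(roundUp(1 + 2 * sizeof(int16_t), 1) == 5, "no padding");
```

The last assertion matches the 5 bytes reported for sizeof(Channel<int>) on 8-bit processors, where int is 16 bits wide and, as far as I know, no alignment padding is inserted.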
The examples/MemoryBenchmark program gathers flash and memory consumption numbers for various boards (AVR, ESP8266, ESP32, etc) for a handful of AceRoutine features. Here are some highlights:
Arduino Nano (8-bits)
+--------------------------------------------------------------------+
| functionality | flash/ ram | delta |
|---------------------------------------+--------------+-------------|
| Baseline | 606/ 11 | 0/ 0 |
|---------------------------------------+--------------+-------------|
| One Delay Function | 654/ 13 | 48/ 2 |
| Two Delay Functions | 714/ 15 | 108/ 4 |
|---------------------------------------+--------------+-------------|
| One Coroutine | 844/ 32 | 238/ 21 |
| Two Coroutines | 1016/ 51 | 410/ 40 |
|---------------------------------------+--------------+-------------|
| One Coroutine (micros) | 816/ 32 | 210/ 21 |
| Two Coroutines (micros) | 960/ 51 | 354/ 40 |
|---------------------------------------+--------------+-------------|
| One Coroutine (seconds) | 944/ 32 | 338/ 21 |
| Two Coroutines (seconds) | 1148/ 51 | 542/ 40 |
|---------------------------------------+--------------+-------------|
| Scheduler, One Coroutine | 968/ 34 | 362/ 23 |
| Scheduler, Two Coroutines | 1132/ 53 | 526/ 42 |
|---------------------------------------+--------------+-------------|
| Scheduler, One Coroutine (micros) | 940/ 34 | 334/ 23 |
| Scheduler, Two Coroutines (micros) | 1076/ 53 | 470/ 42 |
|---------------------------------------+--------------+-------------|
| Scheduler, One Coroutine (seconds) | 1068/ 34 | 462/ 23 |
| Scheduler, Two Coroutines (seconds) | 1264/ 53 | 658/ 42 |
|---------------------------------------+--------------+-------------|
| Scheduler, One Coroutine (setup) | 1018/ 34 | 412/ 23 |
| Scheduler, Two Coroutines (setup) | 1282/ 53 | 676/ 42 |
|---------------------------------------+--------------+-------------|
| Scheduler, One Coroutine (man setup) | 996/ 34 | 390/ 23 |
| Scheduler, Two Coroutines (man setup) | 1268/ 53 | 662/ 42 |
|---------------------------------------+--------------+-------------|
| Blink Function | 938/ 14 | 332/ 3 |
| Blink Coroutine | 1158/ 32 | 552/ 21 |
+--------------------------------------------------------------------+
ESP8266 (32-bits)
+--------------------------------------------------------------------+
| functionality | flash/ ram | delta |
|---------------------------------------+--------------+-------------|
| Baseline | 260329/27916 | 0/ 0 |
|---------------------------------------+--------------+-------------|
| One Delay Function | 260377/27916 | 48/ 0 |
| Two Delay Functions | 260441/27916 | 112/ 0 |
|---------------------------------------+--------------+-------------|
| One Coroutine | 260525/27944 | 196/ 28 |
| Two Coroutines | 260669/27960 | 340/ 44 |
|---------------------------------------+--------------+-------------|
| One Coroutine (micros) | 260541/27944 | 212/ 28 |
| Two Coroutines (micros) | 260701/27960 | 372/ 44 |
|---------------------------------------+--------------+-------------|
| One Coroutine (seconds) | 260541/27944 | 212/ 28 |
| Two Coroutines (seconds) | 260717/27960 | 388/ 44 |
|---------------------------------------+--------------+-------------|
| Scheduler, One Coroutine | 260573/27944 | 244/ 28 |
| Scheduler, Two Coroutines | 260701/27968 | 372/ 52 |
|---------------------------------------+--------------+-------------|
| Scheduler, One Coroutine (micros) | 260589/27944 | 260/ 28 |
| Scheduler, Two Coroutines (micros) | 260733/27968 | 404/ 52 |
|---------------------------------------+--------------+-------------|
| Scheduler, One Coroutine (seconds) | 260589/27944 | 260/ 28 |
| Scheduler, Two Coroutines (seconds) | 260749/27968 | 420/ 52 |
|---------------------------------------+--------------+-------------|
| Scheduler, One Coroutine (setup) | 260605/27944 | 276/ 28 |
| Scheduler, Two Coroutines (setup) | 260765/27968 | 436/ 52 |
|---------------------------------------+--------------+-------------|
| Scheduler, One Coroutine (man setup) | 260589/27944 | 260/ 28 |
| Scheduler, Two Coroutines (man setup) | 260749/27968 | 420/ 52 |
|---------------------------------------+--------------+-------------|
| Blink Function | 261001/27988 | 672/ 72 |
| Blink Coroutine | 261133/28008 | 804/ 92 |
+--------------------------------------------------------------------+
Comparing Blink Function and Blink Coroutine is probably the most fair comparison, because they implement the exact same functionality. The code is given in Comparison To NonBlocking Function. The Blink Function implements the asymmetric blink (HIGH and LOW having different durations) functionality using a simple, non-blocking function with an internal prevMillis static variable. The Blink Coroutine implements the same logic using a Coroutine. The Coroutine version is far more readable and maintainable, with only about 220 additional bytes of flash on AVR, and 130 bytes on an ESP8266. In many situations, the increase in flash memory size may be worth paying to get easier code maintenance.
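For readers who have not seen the pattern, here is a rough host-side sketch of the kind of non-blocking blink function described above. It is not the code from MemoryBenchmark. BlinkState, blinkStep(), kOnMillis and kOffMillis are hypothetical names, the 100 ms and 500 ms durations are borrowed from the HelloCoroutine example, the current time is passed in as a parameter instead of calling millis(), and a small struct stands in for the static prevMillis variable so that the logic can be exercised off-device.

```cpp
#include <cassert>
#include <cstdint>

// Assumed durations, borrowed from the HelloCoroutine example.
const uint32_t kOnMillis = 100;
const uint32_t kOffMillis = 500;

// Hypothetical state that a static prevMillis variable (plus the current
// LED state) would otherwise hold.
struct BlinkState {
  bool isOn;
  uint32_t prevMillis;
};

// Advances the blink state machine without blocking.
// Returns true if the LED should be on at time `now`.
bool blinkStep(BlinkState& state, uint32_t now) {
  // Unsigned subtraction stays correct when the millisecond counter wraps.
  uint32_t elapsed = now - state.prevMillis;
  if (state.isOn) {
    if (elapsed >= kOnMillis) {
      state.isOn = false;
      state.prevMillis = now;
    }
  } else {
    if (elapsed >= kOffMillis) {
      state.isOn = true;
      state.prevMillis = now;
    }
  }
  return state.isOn;
}
```

On a real board the return value would be passed to digitalWrite() and `now` would come from millis(). Even in this small case the two durations and the state flag have to be tracked by hand, which is the bookkeeping that the two COROUTINE_DELAY() calls in the coroutine version hide.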
See examples/AutoBenchmark. Here are 2 samples:
Arduino Nano:
+---------------------+--------+-------------+--------+
| Functionality | iters | micros/iter | diff |
|---------------------+--------+-------------+--------|
| EmptyLoop | 10000 | 1.700 | 0.000 |
| DirectScheduling | 10000 | 2.900 | 1.200 |
| CoroutineScheduling | 10000 | 7.200 | 5.500 |
+---------------------+--------+-------------+--------+
ESP8266:
+---------------------+--------+-------------+--------+
| Functionality | iters | micros/iter | diff |
|---------------------+--------+-------------+--------|
| EmptyLoop | 10000 | 0.100 | 0.000 |
| DirectScheduling | 10000 | 0.500 | 0.400 |
| CoroutineScheduling | 10000 | 0.900 | 0.800 |
+---------------------+--------+-------------+--------+
Tier 1: Fully Supported
These boards are tested on each release:
- Arduino Nano (16 MHz ATmega328P)
- SparkFun Pro Micro (16 MHz ATmega32U4)
- STM32 Blue Pill (STM32F103C8, 72 MHz ARM Cortex-M3)
- NodeMCU 1.0 (ESP-12E module, 80 MHz ESP8266)
- WeMos D1 Mini (ESP-12E module, 80 MHz ESP8266)
- ESP32 dev board (ESP-WROOM-32 module, 240 MHz dual core Tensilica LX6)
- Teensy 3.2 (96 MHz ARM Cortex-M4)
Tier 2: Should work
These boards should work but I don't test them as often:
- ATtiny85 (8 MHz ATtiny85)
- Arduino Pro Mini (16 MHz ATmega328P)
- Mini Mega 2560 (Arduino Mega 2560 compatible, 16 MHz ATmega2560)
- Teensy LC (48 MHz ARM Cortex-M0+)
Tier 3: May work, but not supported
- SAMD21 M0 Mini (48 MHz ARM Cortex-M0+)
- Arduino-branded SAMD21 boards use the ArduinoCore-API, so are explicitly blacklisted. See below.
- Other 3rd party SAMD21 boards may work using the SparkFun SAMD core.
- However, as of SparkFun SAMD Core v1.8.6 and Arduino IDE 1.8.19, I can no longer upload binaries to these 3rd party boards due to errors.
- Therefore, third party SAMD21 boards are now in this new Tier 3 category.
- This library may work on these boards, but I can no longer support them.
Tier Blacklisted
The following boards are not supported and are explicitly blacklisted to allow the compiler to print useful error messages instead of hundreds of lines of compiler errors:
- Any platform using the ArduinoCore-API
(https://github.com/arduino/ArduinoCore-api). For example:
- Nano Every
- MKRZero
- Raspberry Pi Pico RP2040
This library was developed and tested using:
- Arduino IDE 1.8.19
- Arduino CLI 0.19.2
- Arduino AVR Boards 1.8.4
- Arduino SAMD Boards 1.8.9
- SparkFun AVR Boards 1.1.13
- SparkFun SAMD Boards 1.8.6
- STM32duino 2.2.0
- ESP8266 Arduino 3.0.2
- ESP32 Arduino 2.0.2
- Teensyduino 1.56
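This library is not compatible with:
- Any platform using the ArduinoCore-API (https://github.com/arduino/ArduinoCore-api), such as the blacklisted boards listed above (Nano Every, MKRZero, Raspberry Pi Pico RP2040).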
It should work with PlatformIO but I have not tested it.
The library works on Linux or MacOS (using both g++ and clang++ compilers) using the EpoxyDuino emulation layer.
I use Ubuntu 20.04 for the vast majority of my development. I expect that the library will work fine under MacOS and Windows, but I have not explicitly tested them.
If you have any questions, comments, or feature requests for this library, please use the GitHub Discussions for this project. If you have bug reports, please file a ticket in GitHub Issues. Feature requests should go into Discussions first because they often have alternative solutions which are useful to remain visible, instead of disappearing from the default view of the Issue tracker after the ticket is closed.
Please refrain from emailing me directly unless the content is sensitive. The problem with email is that I cannot reference the email conversation when other people ask similar questions later.
Created by Brian T. Park (brian@xparks.net).