Improved EOMtheEMSrunner #496

marcoaccame · 2024-05-24T12:00:15Z

This PR improves the activation algorithm of the RX-DO-TX loop that executes the services.
In particular, in nasty cases where the phases have bursts of execution that last much longer than the allocated time budget, the regular RX-DO-TX loop may recover without keeping the intended separation time between two consecutive phases.

Since the object EOMtheEMSrunner does this job, what follows is a description of its behavior, of the phenomenon and of its remedy.

The object `EOMtheEMSrunner`

The object EOMtheEMSrunner is responsible for executing the three phases of a service offered by the ETH board: RX, DO and TX, which act in the following way:

- RX collects the received data from YRI or from CAN service boards;
- DO uses the data, for example to execute an outer control loop for the MC service;
- TX transmits results to YRI and to CAN service boards.

These three phases must be executed at a given frequency, and each one must start regularly at its period and complete within a given time budget. The timing is configurable from an xml file; we use a frequency of 1 kHz and typically assign a budget of 400 us for RX, 300 us for DO and 300 us for TX.

Description of how it works now

The EOMtheEMSrunner achieves this goal using three HW timers and three dedicated threads, one for each phase. The HW timers are started with the same required frequency and each one is offset by the specified time budget. At its expiry, each HW timer sends an RTOS start signal to the thread that runs the phase.
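The timer offsets described above can be sketched as follows. This is only a minimal sketch, assuming that the offset of each timer is the sum of the budgets of the phases that precede it in the RX, DO, TX order. The names `PhaseTiming`, `timerOffsets` and `startTime` are hypothetical and are not identifiers of the firmware.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical type: timing of the cycle as read from the configuration.
struct PhaseTiming {
    std::uint32_t period_us;                 // 1000 us for a 1 kHz cycle
    std::array<std::uint32_t, 3> budget_us;  // budgets for RX, DO, TX in that order
};

// Assumption: each HW timer fires once per period and its first expiry is
// delayed by the sum of the budgets of the phases before it in the cycle.
inline std::array<std::uint32_t, 3> timerOffsets(const PhaseTiming& t) {
    // The budgets must fit inside one period, otherwise TX would overlap the next RX.
    assert(t.budget_us[0] + t.budget_us[1] + t.budget_us[2] <= t.period_us);
    return {0u, t.budget_us[0], t.budget_us[0] + t.budget_us[1]};
}

// Intended time of the k-th start signal for phase index p (0 = RX, 1 = DO, 2 = TX).
inline std::uint64_t startTime(const PhaseTiming& t, unsigned p, std::uint64_t k) {
    assert(p < 3);
    return k * t.period_us + timerOffsets(t)[p];
}
```

With the typical budgets of 400, 300 and 300 us, this gives start signals for RX, DO and TX at 0, 400 and 700 us inside every 1 ms cycle.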

The thread starts execution when it receives both the start signal and also an RTOS enable signal from its previous thread.

To clarify, the DO thread starts only when its HW timer emits the start signal and its previous thread, the RX, emits the enable signal. See picture.

Figure. Activation of the RX, DO and TX phases by the two signals eENABLExx and eSTARTxx in case the durations of all the phases are within their time budget.
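The two-signal activation can be modelled as a pair of latched event bits that a phase consumes only when both are present. The sketch below is an illustration under that assumption; `evSTART`, `evENABLE` and `PhaseThread` are hypothetical names, and the real firmware uses RTOS signals whose API is not shown in this description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical event bits standing in for eSTARTxx and eENABLExx.
enum : std::uint32_t {
    evSTART  = 1u << 0,  // sent by the HW timer of the phase
    evENABLE = 1u << 1   // sent by the previous phase when it finishes
};

struct PhaseThread {
    std::uint32_t pending = 0;  // events received and not yet consumed

    void post(std::uint32_t ev) {
        assert((ev & ~(evSTART | evENABLE)) == 0);  // only the two known events
        pending |= ev;
    }

    // Returns true, and consumes both events, only when START and ENABLE
    // have both arrived. A single event stays latched until the other comes.
    bool tryRun() {
        constexpr std::uint32_t both = evSTART | evENABLE;
        if ((pending & both) != both) {
            return false;
        }
        pending &= ~both;
        return true;
    }
};
```

In this model the order of arrival does not matter: whichever signal comes first simply waits for the other one.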

The EOMtheEMSrunner also offers a service that monitors the execution time of each phase vs its allocated budget. The duration is computed with a strict policy: it is counted from the emission of the start signal by the HW timer until the thread finishes.

This mechanism surely works if the budget time for each phase is higher than the time effectively required, and also if there are sporadic overflows of a phase into the next execution window. In such a case, the phases are activated in bursts and they realign when their execution time is finally reduced to normality. See picture.

Figure. Activation of the RX, DO and TX phases when the RX phase lasts longer. The cycle re-synchronizes. Note that the durations are not the effective execution times; rather, they express the time elapsed since the intended start.

Longer phases don't usually happen, because the system must be designed so that they do not occur in normal cases. It may however happen in the initialization phase of boards with many joints that the RX phase lasts longer, but I have always observed that the cycle recovers and the execution of the phases stays inside the intended timing window. See below.

Figure. Another example of an excessively long duration of the RX phase (more than 1 ms). The cycle re-synchronizes and stays inside its timing window.

Observed misbehavior

I have recently observed on an amc board that the cycle recovered but produced an anticipation of one phase vs its intended start time. I have studied the problem and found situations in which that may happen. See Figure below.

Figure. Three cases of excessively long duration of the phases that put the cycle out of its intended timing window.

The above situation correctly executes the control and would not cause any harm to the execution of the service in the board, apart from a huge number of timing overflow messages. The Figure below shows such a situation, with a flood of overflow messages for the RX phase (top of Figure) and for the DO phase (bottom of Figure).

Figure. The anticipation of a phase in its timing window is likely to generate continuous reports of excessive duration time because the phase is triggered by a past HW timer expiration and the measure is the sum of the effective duration plus the delay in its activation.
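The difference between the strict measure and the effective execution time can be made concrete with a small worked example. This is a sketch only: `PhaseTimes` and the helper functions are hypothetical names, and the numbers used in the note after the code are illustrative, not measurements from a board.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record of the three instants that matter for one phase run.
struct PhaseTimes {
    std::uint64_t timerExpiry_us;  // when the HW timer emitted the start signal
    std::uint64_t activation_us;   // when the thread actually began to run
    std::uint64_t finish_us;       // when the thread finished
};

inline void checkOrdered(const PhaseTimes& p) {
    assert(p.timerExpiry_us <= p.activation_us);
    assert(p.activation_us <= p.finish_us);
}

// Strict policy: counted from the timer expiry, so it includes the activation delay.
inline std::uint64_t strictDuration(const PhaseTimes& p) {
    checkOrdered(p);
    return p.finish_us - p.timerExpiry_us;
}

// Delay between the timer expiry and the moment the thread starts running.
inline std::uint64_t activationDelay(const PhaseTimes& p) {
    checkOrdered(p);
    return p.activation_us - p.timerExpiry_us;
}

// Effective execution time: only the time the thread spent running the phase.
inline std::uint64_t effectiveDuration(const PhaseTimes& p) {
    checkOrdered(p);
    return p.finish_us - p.activation_us;
}

inline bool overBudget(std::uint64_t duration_us, std::uint64_t budget_us) {
    return duration_us > budget_us;
}
```

For example, a phase with a 400 us budget whose timer expired at 0 us, which was activated at 350 us and finished at 500 us, has a strict duration of 500 us (350 us of delay plus 150 us of execution). The strict measure reports an overflow even though the effective execution took only 150 us.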

Description of the modified `EOMtheEMSrunner`

The situation above may happen. We know it does not normally happen, because the boards do not emit a flood of diagnostics messages, and when they do we try to solve the cause. Nevertheless, we have to remove the possibility of it happening.

The cause

The cause is that some eSTARTxx events are emitted at moments when they do not contribute to activating the phase, and they stay active until the next eENABLExx, so the phase starts straight away even if it should start slightly later.
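The effect of a latched start event can be reproduced with a tiny replay of events for a single phase. The sketch below is a simplified model, assuming that the phase is idle whenever an event arrives and that both signals are latched until consumed. The names `Ev`, `TimedEvent` and `activations` are hypothetical, and the timeline in the note after the code is invented for illustration and is not taken from the figures.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Ev { Start, Enable };  // stand-ins for eSTARTxx and eENABLExx

struct TimedEvent {
    std::uint64_t t_us;  // time at which the event is emitted
    Ev kind;
};

// Replays time-ordered events through the two latches of one phase and
// returns the times at which the phase gets activated.
inline std::vector<std::uint64_t> activations(const std::vector<TimedEvent>& events) {
    bool start = false;
    bool enable = false;
    std::uint64_t last = 0;
    std::vector<std::uint64_t> out;
    for (const TimedEvent& e : events) {
        assert(e.t_us >= last);  // events must be given in time order
        last = e.t_us;
        if (e.kind == Ev::Start) {
            start = true;
        } else {
            enable = true;
        }
        if (start && enable) {   // both present: the phase starts right now
            out.push_back(e.t_us);
            start = false;
            enable = false;
        }
    }
    return out;
}
```

Take a DO phase whose timer expires at 400 us and 1400 us while the previous RX phase is late and emits the enable only at 1250 us. The start latched at 400 us makes DO run at 1250 us, that is 150 us before its next intended start, and the start emitted at 1400 us stays latched for the following enable.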

The remedy

The remedy is to avoid the emission of eSTARTxx that are not necessary. I have tested some algorithms and this one does the job:

Emit an eSTARTx if phase x is not in current execution and previous phase y = prev(x) is the last just finished or is currently in execution.
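The rule above can be written as a small predicate over the state of the three phases. This is a minimal sketch and not the code of the PR: `RunnerState`, `shouldEmitStart`, `begin` and `finish` are hypothetical names, and it assumes that the phases form a cycle so that prev(RX) is TX and that TX counts as the last finished phase at startup.

```cpp
#include <cassert>
#include <cstddef>

enum class Phase { RX = 0, DO = 1, TX = 2 };

inline std::size_t idx(Phase p) { return static_cast<std::size_t>(p); }

// Assumption: the phases are cyclic, so the phase before RX is TX.
inline Phase prev(Phase x) {
    return static_cast<Phase>((idx(x) + 2) % 3);
}

struct RunnerState {
    bool running[3] = {false, false, false};  // phase currently in execution
    Phase lastFinished = Phase::TX;           // assumed initial value, lets the first RX start
};

inline void begin(RunnerState& s, Phase x) { s.running[idx(x)] = true; }

inline void finish(RunnerState& s, Phase x) {
    assert(s.running[idx(x)]);  // only a running phase can finish
    s.running[idx(x)] = false;
    s.lastFinished = x;
}

// Emit eSTARTx only if x is not executing and prev(x) either is executing
// or is the phase that finished most recently.
inline bool shouldEmitStart(const RunnerState& s, Phase x) {
    const Phase y = prev(x);
    if (s.running[idx(x)]) {
        return false;
    }
    return s.lastFinished == y || s.running[idx(y)];
}
```

With this predicate, a timer expiry for DO that arrives while neither RX is running nor RX is the last finished phase emits nothing, so no start event remains latched for a later enable.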

The following figure shows how the timing 6 achieves synchronization quite soon because of some reduced activations.

Figure. The new activation algorithm allows a quick recovery of synchronization for timing 6. Also, the measure of duration takes into account the effective time so that the focus is on the long phase only.

The tests

On a dedicated setup

I have tested both the ems and the amc on the lego setup, where every 1 second I simulated increased execution times that generate the problem. Here is the situation with the current and with the new activation algorithm.

Figure. The new activation algorithm solves the synch on the ems when some nasty bursts of RX-DO-TX much longer than 1 ms happen.

On the robots

Together w/ @martinaxgloria I have tested the ems and mc4plus on iCubGenova11, where obviously there are no time overflows: it works fine.

We have also tested the amc on ergoCub001 and it works fine as well. It also works fine with the third motor controlled over ICC1:3 rather than over CAN1:3.

Mergeability

After the tests we can safely merge this PR and the associated one:

ems 3.89, mc4plus 3.91, mc2plus 3.70, amc 2.40 icub-firmware-build#150

- a revised activation mode that is robust in getting back in synch in presence of long durations
- a different measurement of the RX-DO-TX phases that considers the effective execution time and not also the delay from the target activation of the HW timer


marcoaccame marked this pull request as draft May 24, 2024 12:00

marcoaccame added 3 commits May 27, 2024 10:16

added a useful feature to erase the eeprom in amc board

13e8541

undefined (for now) the theRunner_USE_revised_algorithm

a6bb37e

marcoaccame force-pushed the feat/runner-improved branch from e9785a4 to a6bb37e Compare May 27, 2024 08:16

EOMtheEMSrunner.c is compiled in C++ mode also for mc4plus and mc2plus

ceb84be

marcoaccame mentioned this pull request May 27, 2024

ems 3.89, mc4plus 3.91, mc2plus 3.70, amc 2.40 robotology/icub-firmware-build#150

Merged

marcoaccame marked this pull request as ready for review May 28, 2024 12:36

enabled theRunner_USE_revised_algorithm for ems, mc4plus, mc2plus and…

0d2b1d5

… amc. enabled CANflushMODE_DO_phase for ems, mc4plus, mc2plus. increased application versions for all the above boards

marcoaccame merged commit dbc3e02 into robotology:devel May 28, 2024

marcoaccame mentioned this pull request May 29, 2024

Too many debug messages on iCubGenova11 #497

Closed
