Added article about ROS 2.0 in realtime #35

Merged: 9 commits, Jul 14, 2015
Changes from 6 commits
259 changes: 259 additions & 0 deletions articles/realtime.md
@@ -0,0 +1,259 @@
---
layout: default
title: ROS 2.0 in Realtime
permalink: articles/realtime.html
abstract: This article describes real-time computing requirements and how ROS 2 will fulfill them.
published: false
author: Jackie Kay
---

* This will become a table of contents (this text will be scraped).
{:toc}

# {{ page.title }}

<div class="abstract" markdown="1">
{{ page.abstract }}
</div>

Original Author: {{ page.author }}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to start a summary of the document, stating its scope, goal and structure.

Robotic systems need to be responsive.
In mission critical applications, a delay of less than a millisecond in the system can cause a catastrophic failure.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this phrase below, to the definition of real-time. See comments on hard, firm, soft real-time.

For ROS 2.0 to capture the needs of the robotics community, the core software components must not interfere with the requirements of real-time computing.

# Definition of Real-time Computing

Real-time software guarantees correct computation at the correct time.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe introduce key terms in the definition, like determinism and deadline. Something like:

Real-time software meets computational deadlines in a deterministic way. The correctness depends not only on the result of a computation, but also on the time it was delivered. Thus, failure to respond is as bad as the wrong response.

Paraphrased from http://www.cse.unsw.edu.au/~cs9242/08/lectures/09-realtimex2.pdf

Hard real-time software systems have a set of strict deadlines, and missing a deadline is considered a failure.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how this answer explains hard and soft (and firm, which is less commonly used) real-time. It's simple and clear: can we tolerate missing a deadline, and are results still useful after a missed deadline?

Here would be a good place to present example of systems which are hard or soft real-time.

Soft real-time systems degrade their quality of service if a deadline is missed.

Real-time computer systems are often associated with low-latency systems.
Many applications of real-time computing are also low-latency applications (for example, automated piloting systems must be reactive to sudden changes in the environment).
However, it is generally agreed upon that a real-time system is not defined by low latency, but by a deterministic schedule: it must be guaranteed that the system finishes a certain task by a certain time.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use an alternative to 'generally agreed'. It gives the impression that this is a subjective statement, based on the opinion of a majority.

Therefore, it is important that the latency in the system be measurable and a maximum allowable latency for tasks be set.

A more useful metric in evaluating the "hardness" of a real-time system is jitter, which is defined as the variable deviation between a task's deadline and its actual time of completion.
A more jittery system is less deterministic, less predictable, and less real-time.
Though in practice it is impossible to completely eliminate jitter from a real-time system, it is a worthy goal to determine a hard upper bound for jitter.
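
As a small illustration of the definition above, the following sketch computes the worst-case and mean jitter from a set of recorded deadlines and completion times. The names `jitter_stats` and `compute_jitter` are hypothetical, and all timestamps are assumed to be in nanoseconds on the same clock.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of jitter over a set of task activations. */
typedef struct {
  int64_t max_abs_ns;  /* largest absolute deviation from the deadline */
  double mean_abs_ns;  /* mean absolute deviation from the deadline */
} jitter_stats;

/* deadline_ns[i] and completion_ns[i] are the timestamps of activation i. */
jitter_stats compute_jitter(const int64_t *deadline_ns,
                            const int64_t *completion_ns, size_t n)
{
  jitter_stats s = {0, 0.0};
  double sum = 0.0;
  for (size_t i = 0; i < n; ++i) {
    int64_t d = completion_ns[i] - deadline_ns[i];
    if (d < 0) d = -d;                       /* early and late both count */
    if (d > s.max_abs_ns) s.max_abs_ns = d;  /* track the worst case */
    sum += (double)d;
  }
  if (n > 0) s.mean_abs_ns = sum / (double)n;
  return s;
}
```

For a hard real-time system the figure of interest is `max_abs_ns`, since the goal stated above is an upper bound rather than a good average.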

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the next section, it could be mentioned that a system that meets real-time requirements needs both an operating system (kernel, scheduler) and user code capable of delivering deterministic execution.

It might be out of scope, but a comment on RTOS alternatives could be interesting. It would be the complement to the next section on writing real-time safe user code. In the Linux world, it's worth mentioning:

  • Linux kernel patches like PREEMPT_RT which '...allows nearly all of the kernel to be preempted'.
  • Hard real-time co-kernels (also called hypervisors) that cooperate with the Linux kernel like Xenomai (POSIX compliant) and RTAI. They treat Linux as the idle task of the real-time scheduler.

There are also RTOS for mission-critical and safety-critical systems, like QNX Neutrino (fully POSIX compliant) and VxWorks (POSIX compliant through compatibility layer) that are certified to IEC 61508 SIL 3.

For all the above RTOS except Linux+PREEMPT_RT, one must generate binaries that are not compatible with standard Linux, which might be an inconvenience.

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it would be great to target ROS 2 to perform in real-time on a variety of RTOSes and real-time environments, and to set up several machines for performance testing. I have been testing on a machine with the PREEMPT_RT kernel installed, hence the focus on that platform in this document.

# Implementation of Real-time Computing

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about rewording the title to something like 'Writing real-time safe code'. From reading the above title, I'm not sure if it refers to writing an RTOS or user code.


In general, an operating system can guarantee that the tasks it handles for the developer, such as thread scheduling, are deterministic, but the OS may not guarantee that the developer's code will run in real-time.
Therefore, it is up to the developer to know what the deterministic guarantees of an existing system are, and what she must do to write hard real-time code on top of the OS.

In this section, various strategies for developing on top of a real-time OS are explored, since these strategies might be applicable to ROS 2.
The patterns focus on the use case of C/C++ development on Linux-based real-time OS's (such as RTLinux), but the general concepts are applicable to other platforms.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure, by RTLinux are you referring to the microkernel that was maintained by WindRiver until 2011, or the PREEMPT_RT patch?

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PREEMPT_RT (RT_PREEMPT?) patch. We are using a version maintained by IHMC that is available in this PPA. I can clarify this in the text.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RT_PREEMPT is correct, my bad.


## Memory management

### Lock memory, prefault stack:

```c
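/* Requires <sys/mman.h>, <stdio.h>, <stdlib.h>, and <string.h>. */
/* Lock all current and future pages of the process into RAM. */
if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
  perror("mlockall failed");
  exit(-2);
}

/* Touch every page of the stack region we expect to use, so that it is
   faulted in now rather than in the real-time path. */
unsigned char dummy[MAX_SAFE_STACK];
memset(dummy, 0, MAX_SAFE_STACK);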
```

`mlockall` is a Linux system call for locking the process's virtual address space into RAM, preventing the memory that will be accessed by the process from being paged out to swap space.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an `mlockall` hyperlink or [ref], for convenient jumping to the source?

Paging and accessing paged memory is a nondeterministic operation that should be avoided in real-time computation.

This code snippet, when run at the beginning of a thread's lifecycle, ensures that no pagefaults occur while the thread is running.
`mlockall` locks the process's pages, including the thread's stack, into RAM.
The `memset` call touches every page of the stack region up front, so that each page is already mapped into physical memory and no pagefaults will occur when the stack is accessed later.


### Allocate dynamic memory pool

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before going into details of the solutions, mention what is the problem with dynamic memory allocation. This Wikipedia entry provides a good summary. It mentions issues with memory fragmentation, and the non-deterministic nature of standard memory allocators.

Generally speaking, I would recommend strategies that depend on the problem at hand:

  • Data size is known at compile-time: Use fixed objects allocated on the stack (if reasonably sized).
  • Data size is known at runtime, during configuration, before starting RT-scheduled execution: Pre-allocate resources on the heap. Node lifecycle becomes relevant here, as we can clearly state in which transitions a real-time safe node can perform non real-time safe calls.
  • Data size is known at (or can change during) RT-scheduled execution: Multiple alternatives exist, which is mostly what you focus on below. Typical strategies are to use object pools (good if you have some context of the problem), general-purpose memory pools like the TLSF O(1) allocator, stack allocation, etc. In such cases there's typically a buffer that can either get full or overflow, so AFAIK, there's no silver bullet.


```c
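/* Lock all current and future pages into RAM. */
if (mlockall(MCL_CURRENT | MCL_FUTURE))
  perror("mlockall failed");

/* Turn off malloc trimming, so freed memory is not returned to the kernel. */
mallopt(M_TRIM_THRESHOLD, -1);

/* Turn off mmap usage, so large allocations also come from the heap. */
mallopt(M_MMAP_MAX, 0);

page_size = sysconf(_SC_PAGESIZE);
buffer = malloc(SOMESIZE);

/* Touch each page of the buffer so that it is faulted into RAM. */
for (i = 0; i < SOMESIZE; i += page_size)
{
  buffer[i] = 0;
}
/* With trimming disabled, the freed pages stay in the process's locked heap. */
free(buffer);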
```

It is commonly believed that dynamic memory allocations are not permitted in the real-time code path.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reword 'commonly believed', as it again gives the impression that this is an opinionated statement. If the default allocator is non-deterministic or subject to causing page faults, then it cannot be used in the real-time path. Guidelines for critical system code like MISRA C (paywalled) go as far as recommending not using malloc, ever (!!!), because it's too implementation dependent, and very few guarantees can be made about its behavior.

This code snippet shows how to lock the virtual address space, disallow returning deallocated memory to the kernel via `sbrk`, and disable `mmap`.
It effectively locks a pool of memory in the heap into RAM, which prevents page faults due to `malloc` and `free`.

Pros:

* Can use malloc/new, free/delete, and even STL containers

Cons:

* Must accurately predict bounded memory size for the process!
* Using STL containers is therefore dangerous (unbounded sizes)
* In practice, only works for small processes

### Custom fixed allocators for STL containers

An alternative to the above approach is to implement custom allocators for STL containers that only allocate memory on the stack.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Custom allocators don't necessarily need to allocate on the stack, although that is one solution to the problem.


There are various comprehensive tutorials [already written](http://www.codeguru.com/cpp/article.php/c18503/C-Programming-Stack-Allocators-for-STL-Containers.htm) for this task.

Pros:
* Use existing STL code with deterministic computation and without allocating global dynamic memory pool
* More modular solution

Cons:
* Complex to implement

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If one leverages an existing allocator, the cons end up being an additional dependency and potentially some extra complexity in setting up the allocator. The latter is allocator-specific, but typical setup parameters are pool size, or specifying the policy of what to do if pool gets full (allocate more vs. std::bad_alloc). More verbose syntax (e.g. specifying allocator in STL container template params) can be alleviated with typedefs.


### Global variables and (static) arrays

Global variables are preallocated at the start of a process, thus assigning and accessing them is real-time safe.
However, this strategy comes with the many disadvantages of using global variables.
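
One way to get bounded-time allocation out of statically allocated storage is a fixed-block pool backed by a static array. The sketch below is only an illustration: `POOL_BLOCKS`, `POOL_BLOCK_SIZE`, and the `pool_*` functions are hypothetical names, and the pool is not thread-safe.

```c
#include <stddef.h>

#define POOL_BLOCKS 8       /* assumed capacity */
#define POOL_BLOCK_SIZE 64  /* assumed block size in bytes */

/* Storage lives in static memory; the union keeps each block suitably
 * aligned and lets a free block double as a free-list node. */
typedef union pool_block {
  union pool_block *next;
  max_align_t align;
  unsigned char bytes[POOL_BLOCK_SIZE];
} pool_block;

static pool_block pool_storage[POOL_BLOCKS];
static pool_block *pool_free_list = NULL;

/* Call once at startup, outside the real-time path. */
void pool_init(void)
{
  pool_free_list = NULL;
  for (size_t i = 0; i < POOL_BLOCKS; ++i) {
    pool_storage[i].next = pool_free_list;
    pool_free_list = &pool_storage[i];
  }
}

/* O(1): pops a block from the free list, or returns NULL when exhausted. */
void *pool_alloc(void)
{
  pool_block *b = pool_free_list;
  if (b != NULL) pool_free_list = b->next;
  return b;
}

/* O(1): pushes a block back; p must have come from pool_alloc. */
void pool_free(void *p)
{
  pool_block *b = (pool_block *)p;
  b->next = pool_free_list;
  pool_free_list = b;
}
```

Because the capacity is fixed at compile time, exhaustion shows up as a `NULL` return that the caller has to handle, rather than as an unbounded allocation.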

### Use inheritance sparingly

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This subsection talks first about inheritance, then about PIMPL. The common subject is cache-friendliness. Maybe the title should be changed accordingly for consistency.


Classes with many levels of inheritance may not be real-time safe because of the overhead of vtable access.
When executing an inherited function, the program needs to access the data used in the function, the vtable for the class, and the instructions for the function, which are all stored in different parts of memory, and may or may not be stored in cache together.

In general, C++ patterns with poor cache locality are not well-suited to real-time environments.
Another such pattern is the opaque pointer idiom (PIMPL), which is convenient for ABI compatibility and speeding up compile times.
However, bouncing between the memory location for the object and its private data pointer causes the cache to "spill" as it loads one chunk of memory and then another, unrelated chunk for almost every function in the PIMPLized object.

### Exceptions

Throwing an exception can put large objects on the stack, which is undesirable in real-time programming since we cannot allocate memory on the heap.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get this statement (stack vs. heap). My understanding is that when no exceptions are thrown, the overhead is small, but exception handling (stack unwinding, catching) can add a significant penalty. FWIW, this stack overflow answer states something similar.

On modern C++ compilers, catching an exception has little memory or time overhead, but can lead to unexpected code growth.

## Device I/O
Interacting with physical devices (disk I/O, printing to the screen, etc.) may introduce unacceptable latency in the real-time code path, since the process is often forced to wait on slow physical phenomena.
Additionally, many I/O calls such as `fopen` result in pagefaults.

Keep disk reads/writes at the beginning or end of the program, outside of the RT code path.

Spin up userspace threads (executing on a different CPU from the RT code) to print output to the screen.
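
One possible way to hand messages from the real-time path to a separate printing thread is a single-producer, single-consumer queue of fixed-size slots. The sketch below uses C11 atomics; `log_queue`, `log_push`, `log_pop`, and both size constants are hypothetical, and the queue is assumed to be zero-initialized (for example, declared `static`).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LOG_SLOTS 64    /* assumed capacity; must be a power of two */
#define LOG_MSG_LEN 96  /* assumed maximum message length, including NUL */

/* Single-producer, single-consumer queue of fixed-size messages. */
typedef struct {
  char slots[LOG_SLOTS][LOG_MSG_LEN];
  atomic_size_t head;  /* next slot to write; advanced only by the producer */
  atomic_size_t tail;  /* next slot to read; advanced only by the consumer */
} log_queue;

/* Real-time side: never blocks, allocates, or performs I/O.
 * Returns false (message dropped) if the queue is full. */
bool log_push(log_queue *q, const char *msg)
{
  size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
  size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
  if (head - tail == LOG_SLOTS) return false;
  char *slot = q->slots[head % LOG_SLOTS];
  strncpy(slot, msg, LOG_MSG_LEN - 1);  /* truncates over-long messages */
  slot[LOG_MSG_LEN - 1] = '\0';
  atomic_store_explicit(&q->head, head + 1, memory_order_release);
  return true;
}

/* Non-real-time side: copies the oldest message into out, which must hold
 * LOG_MSG_LEN bytes. Returns false if the queue is empty. */
bool log_pop(log_queue *q, char *out)
{
  size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
  size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
  if (head == tail) return false;
  memcpy(out, q->slots[tail % LOG_SLOTS], LOG_MSG_LEN);
  atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
  return true;
}
```

The printing thread would loop on `log_pop` and call `printf` or `fputs` on each message. When the queue is full the real-time side drops the message instead of waiting, which is a design choice of this sketch.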

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be enough to perform the printing from a non-rt scheduled thread. No actual need for CPU isolation.


## Multithreaded Programming and Synchronization

Real-time computation requirements change the typical paradigm of multithreaded programming.
Program execution may not block asynchronously, and threads must be scheduled deterministically.
A real-time operating system will fulfill this scheduling requirement, but there are still pitfalls for the developer to fall into.
This section provides guidelines for avoiding these pitfalls.

### Thread creation guidelines

Create threads at the start of the program.
This confines the nondeterministic overhead of thread allocation to a defined point in the process.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note based on past experience. One must avoid any calls that can lead to fork() from a process running real-time scheduled threads. Some time ago we had this witch hunt in which a non real-time thread was loading pluginlib plugins and wreaking havoc on the real-time control thread. It turns out that fork() is lazy and does copy-on-write of memory pages. A workaround was applied here.

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting story and nice solution! I will add an item here about avoiding fork and anything with fork in its call stack.


Create high priority (but not 99) threads with a FIFO or Round Robin scheduler.
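
With POSIX threads, the scheduling policy and priority can be set on the thread attribute object before the thread is created. The helper below is a hypothetical sketch, and any priority value passed to it (for example 80) is an assumption, not a recommendation from this article.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: fill attr so that a thread created with it uses
 * SCHED_FIFO at the given priority. Returns 0 or an errno-style code. */
int make_fifo_attr(pthread_attr_t *attr, int priority)
{
  struct sched_param param = { .sched_priority = priority };
  int rc = pthread_attr_init(attr);
  if (rc != 0) return rc;
  /* Without this, the new thread inherits the creator's policy and priority
   * and the two settings below are ignored. */
  rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
  if (rc == 0) rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
  if (rc == 0) rc = pthread_attr_setschedparam(attr, &param);
  if (rc != 0) pthread_attr_destroy(attr);
  return rc;
}
```

The attribute is then passed as the second argument to `pthread_create`. On Linux, creating a `SCHED_FIFO` thread normally requires the `CAP_SYS_NICE` capability or a suitable `RLIMIT_RTPRIO` limit; without it `pthread_create` fails with `EPERM`.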

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or SCHED_DEADLINE, since version 3.14. Source.


### Avoid priority inversion
Priority inversion can occur on a system with a preemptive task scheduler and results in deadlock.
It occurs when: a low-priority task acquires a lock and is then pre-empted by a medium-priority task, then a high-priority task acquires the lock held by the low-priority task.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wording is a bit confusing, in particular, the high-prio task does not acquire the lock (but attempts and fails). This example is longer, but the wording is clearer.


The three tasks are stuck in a triangle: the high-priority task is blocked on the low-priority task, which is blocked on the medium-priority task because it was preempted by a task with a higher priority, and the medium-priority task is also blocked on a task with a higher priority.

Here are some solutions to priority inversion:

* Don't use locks (but sometimes, they are necessary)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant as "don't use any kind of synchronization primitives"?

* Disable preemption for tasks holding locks (can lead to jitter)
* Increase priority of task holding a lock
* Use priority inheritance: a task that owns a lock inherits the priority of a task that tries to acquire the lock
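
For the priority inheritance option, POSIX threads expose a mutex protocol attribute. The following is a minimal sketch, assuming a platform that supports `PTHREAD_PRIO_INHERIT`; the helper name `init_pi_mutex` is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>

/* Hypothetical helper: initialize m as a priority-inheritance mutex.
 * Returns 0 on success or an errno-style code. */
int init_pi_mutex(pthread_mutex_t *m)
{
  pthread_mutexattr_t attr;
  int rc = pthread_mutexattr_init(&attr);
  if (rc != 0) return rc;
  /* While a higher-priority thread waits on the mutex, the owner runs at
   * that thread's priority. */
  rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
  if (rc == 0) rc = pthread_mutex_init(m, &attr);
  pthread_mutexattr_destroy(&attr);
  return rc;
}
```

The mutex is then used with the usual `pthread_mutex_lock` and `pthread_mutex_unlock` calls, and like thread creation it is best initialized at startup, outside the real-time path.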

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative: Use lock-free data structures. They don't have blocking semantics and guarantee forward progress of the active thread. On the flipside, their performance is typically poorer than locks, and can have a larger memory footprint (because of multiple versioned objects being tracked). A very nice read on the subject.

### Timing shots

One real-time synchronization technique is when a thread calculates its next "shot" (the start of its next execution period).
For example, if a thread is required to provide an update every 10 milliseconds, and it must complete an operation that takes 3-6 milliseconds, the thread should get the time before the operation, do the operation, and then wait for the remaining 4-7 milliseconds, based on the time measured after the operation.

The most important consideration for the developer is to use a high precision timer, such as `nanosleep` on Linux platforms, while waiting.
Otherwise the system will experience drift.
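
A common way to implement this on Linux is to sleep until an absolute wake-up time with `clock_nanosleep` and `TIMER_ABSTIME`, which is a swapped-in variant of the `nanosleep` call mentioned above. In the sketch below, `run_periodic` and `count_cycle` are hypothetical names, and the period is assumed to be shorter than one second.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance an absolute timestamp by ns nanoseconds (ns < 1 second). */
static void timespec_add_ns(struct timespec *t, long ns)
{
  t->tv_nsec += ns;
  if (t->tv_nsec >= NSEC_PER_SEC) {
    t->tv_nsec -= NSEC_PER_SEC;
    t->tv_sec += 1;
  }
}

/* Example work function: increments the int counter passed as arg. */
void count_cycle(void *arg)
{
  *(int *)arg += 1;
}

/* Hypothetical periodic loop: calls work(arg) once per period, cycles times.
 * Returns 0 on success or the error reported by the clock functions. */
int run_periodic(void (*work)(void *), void *arg, long period_ns, int cycles)
{
  struct timespec next;
  if (clock_gettime(CLOCK_MONOTONIC, &next) != 0) return errno;
  for (int i = 0; i < cycles; ++i) {
    work(arg);
    /* The next shot is computed from the previous shot, not from "now",
     * so time spent in work() does not accumulate as drift. */
    timespec_add_ns(&next, period_ns);
    int rc;
    do {
      rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    } while (rc == EINTR);  /* resume the same absolute sleep if interrupted */
    if (rc != 0) return rc;
  }
  return 0;
}
```

For the 10 millisecond example, the call would be `run_periodic(work, arg, 10000000L, n)`. The sketch does not detect overruns: if `work` takes longer than one period, the sleep returns immediately and the loop runs late until it catches up.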

### Spinlocks
Spinlocks tend to cause clock drift.
The developer should avoid implementing his own spinlocks.
The RT Preempt patch replaces many of the kernel's spinlocks with mutexes, but this might not be guaranteed on all platforms.

# Testing and Performance Benchmarking

## cyclictest

`cyclictest` is a simple Linux command line tool for measuring the jitter of a real-time environment.


It takes as input a number of threads, a priority for the threads, and a scheduler type.
It spins up `n` threads that sleep at regular intervals (the sleep period can also be specified from the command line).

For each thread, `cyclictest` measures the time between when the thread is supposed to wake up and when it actually wakes up.
This statistic is the scheduling jitter in the system.
If there are processes with non-deterministic blocking behavior running in the system, the jitter will grow to a large number (on the order of milliseconds), since the scheduler cannot meet the deadlines of the periodically sleeping threads profiled in the program.
An ideal real-time system will have an average scheduling jitter on the order of nanoseconds or tens of nanoseconds.
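
The measurement described above can be approximated in a few lines of C. This is not `cyclictest`'s implementation, only a single-threaded sketch of the same idea; `max_wakeup_latency_ns` is a hypothetical name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: sleep until an absolute wake-up time cycles times and
 * return the largest observed lateness in nanoseconds, or -1 on error. */
int64_t max_wakeup_latency_ns(long interval_ns, int cycles)
{
  struct timespec next, now;
  int64_t worst = 0;
  if (clock_gettime(CLOCK_MONOTONIC, &next) != 0) return -1;
  for (int i = 0; i < cycles; ++i) {
    /* Intended wake-up time for this cycle. */
    next.tv_nsec += interval_ns;
    while (next.tv_nsec >= 1000000000L) {
      next.tv_nsec -= 1000000000L;
      next.tv_sec += 1;
    }
    int rc;
    do {
      rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    } while (rc == EINTR);
    if (rc != 0) return -1;
    /* Actual wake-up time; the difference is the scheduling latency. */
    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0) return -1;
    int64_t late = (int64_t)(now.tv_sec - next.tv_sec) * 1000000000LL
                 + (now.tv_nsec - next.tv_nsec);
    if (late > worst) worst = late;
  }
  return worst;
}
```

Calling `max_wakeup_latency_ns(1000000L, 1000)` from a real-time scheduled thread gives a rough worst-case figure over one second; the value depends entirely on the machine and its load.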

## Test Pipeline

Instrumenting real-time code to validate its correctness may be tricky because of the time overhead due to instrumentation.
A simple benchmarking test that does not involve code instrumentation is as follows:

On an RTLinux machine, start a script that runs the process to be tested for some fixed time interval.
Collect data with cyclictest to record the jitter in the system.
A variety of command line tools could be used to add extra stress to the system in a controlled way, such as [cache calibrator](http://homepages.cwi.nl/~manegold/Calibrator/) to pollute the cache in an attempt to create pagefaults, or `fping`, which generates a large number of interrupts.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RTOS we have more experience with is Xenomai. It comes with its own testing scripts.

There's also the stress package for subjecting a system to different kinds of load.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, using Xenomai, it's possible to detect overruns (SIGXCPU signal) and relaxes (SIGDEBUG signal), which can prove useful for automated performance testing.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although setting up a Xenomai system is more complex than an RT_PREEMPT one, and you need to generate different binaries (that link against Xenomai libs) there's a clear separation between primary and secondary modes, and it's reasonably easy to detect non real-time safe code. With PREEMPT_RT, one can leverage all the typical Linux tooling, which is great, but I've found it more challenging to detect real-time violations. My experience with PREEMPT_RT is limited, so the problem might be on my end.

Tools for adding load to the system could also be used to test QoS settings in DDS and how that affects real-time performance.

This procedure could be run on a suite of example cases: a simple benchmark program known to run in real time, inter-process DDS communication, intra-process DDS communication, inter-process ROS 2 communication, intra-process ROS 2 communication, another benchmark program that obviously does not run in real time (lots of disk I/O, dynamic memory allocation, and blocking).

TODO: Latency test

TODO: Test schemes for different embedded platforms

# Design Guidelines for ROS 2

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No brain left for this part. Generally speaking, the typical things that potentially need special codepaths to be compatible with an RTOS are:

  • Synchronization primitives
  • Threads
  • Clock access, especially getting (Linux) system timestamps.

Also, an RT-friendly string class is invaluable for logging purposes.


## Achieving real-time computation across platforms

TODO: How can ROS 2 be real-time friendly and cross-platform?
Much of the research in this document focuses on achieving real-time on Linux systems with pthreads.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pthreads → POSIX


## Implementation strategy

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth mentioning explicitly: using ROS interfaces should be real-time safe (at least within a limited context, like intra-process), while setting them up need not be. In particular, for pub-sub with dynamic-sized data, it should be possible to pre-allocate resources (e.g., resize payload arrays) at setup time.

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eb93b9a (and other comments)


There are a few possible strategies for the real-time "hardening" of existing and future ROS 2 code:

* Create a configuration option for the stack to operate in "real-time friendly" mode.
  * Pros:
    * Allows user to dynamically switch between real-time and non-real-time modes.
  * Cons:
    * Refactoring overhead. Integrating real-time code with existing code may be intractable.
    * Total code size may be impractical for embedded systems.

* Implement a new real-time stack (rclrt, rmwrt, etc.) designed with real-time computing in mind.
  * Pros:
    * Easier to design and maintain.
    * Real-time code is "quarantined" from existing code. Can fully optimize library for real-time application.
  * Cons:
    * More packages to write and maintain.
    * Potentially less convenient for the user.

* Give the option for real-time safety up to a certain point in the stack, and implement a real-time safe language wrapper (rclrt or rclc).
  * Pros:
    * Existing code is designed for this refactoring to be fairly easy.
    * User can provide memory allocation strategy to rcl/rmw to ensure deterministic operation.
    * Synchronization happens at the top of the language/OS-specific layer, so refactoring rcl/rmw is easier.
    * May be easier to support multiple embedded platforms.
  * Cons:
    * Refactoring overhead.
    * More flexibility for user may mean more complexity.


# Sources

* [Real-Time Linux Wiki](https://rt.wiki.kernel.org/)

* Scott Salmon, [How to make C++ more real-time friendly](http://www.embedded.com/design/programming-languages-and-tools/4429790/2/How-to-make-C--more-real-time-friendly)

* Stack Overflow, [Are Exceptions still undesirable in Realtime environment?](http://stackoverflow.com/questions/5257190/are-exceptions-still-undesirable-in-realtime-environment)

* Pavel Moryc, [Task jitter measurement under RTLinux operating system](https://fedcsis.org/proceedings/2007/pliks/48.pdf)