Added article about ROS 2.0 in realtime #35
---
layout: default
title: ROS 2.0 in Realtime
permalink: articles/realtime.html
abstract: This article describes real-time computing requirements and how ROS 2 will fulfill them.
published: false
author: Jackie Kay
---

* This will become a table of contents (this text will be scraped).
{:toc}

# {{ page.title }}

<div class="abstract" markdown="1">
{{ page.abstract }}
</div>

Original Author: {{ page.author }}

Robotic systems need to be responsive.
In mission-critical applications, a delay of less than a millisecond in the system can cause a catastrophic failure.

> Review comment: Move this phrase below, to the definition of real-time. See comments on hard, firm, soft real-time.

For ROS 2.0 to meet the needs of the robotics community, its core software components must not interfere with the requirements of real-time computing.

# Definition of Real-time Computing

Real-time software guarantees correct computation at the correct time.

> Review comment: Maybe introduce key terms in the definition, like determinism and deadline. Something like: real-time software meets computational deadlines in a deterministic way. The correctness depends not only on the result of a computation, but also on the time it was delivered. Thus, failure to respond is as bad as the wrong response. Paraphrased from http://www.cse.unsw.edu.au/~cs9242/08/lectures/09-realtimex2.pdf

Hard real-time software systems have a set of strict deadlines, and missing a deadline is considered a failure.

> Review comment: I like how this answer explains hard and soft (and firm, not so used) real-time. It's simple and clear. Can we tolerate missing a deadline? Are results useful after a missed deadline? Here would be a good place to present examples of systems which are hard or soft real-time.

Soft real-time systems degrade their quality of service if a deadline is missed.

Real-time computer systems are often associated with low-latency systems.
Many applications of real-time computing are also low-latency applications (for example, automated piloting systems must react to sudden changes in the environment).
However, a real-time system is defined not by low latency but by a deterministic schedule: it must be guaranteed that the system finishes a certain task by a certain time.

> Review comment: Use an alternative to 'generally agreed'. It gives the impression that this is a subjective statement, based on the opinion of a majority.

Therefore, the latency in the system must be measurable, and a maximum allowable latency must be set for each task.

A more useful metric for evaluating the "hardness" of a real-time system is jitter, defined as the variable deviation between a task's deadline and its actual time of completion.
A more jittery system is less deterministic, less predictable, and less real-time.
Although in practice it is impossible to eliminate jitter completely from a real-time system, determining a hard upper bound for jitter is a worthy goal.

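
The sketch below makes the jitter definition concrete by computing it from recorded deadlines and completion times. It rests on several assumptions. Timestamps are in nanoseconds. Each deviation is taken as completion minus deadline, so a positive value means a missed deadline. Jitter is reported as the spread between the largest and smallest deviation; this is one common convention, and others (such as standard deviation) exist. The names `jitter_stats` and `compute_jitter` are hypothetical.

```c
#include <stddef.h>

/* Summary of completion-time deviations for a periodic task, in nanoseconds. */
typedef struct {
    long long min_dev; /* earliest completion relative to its deadline */
    long long max_dev; /* latest completion relative to its deadline   */
    long long jitter;  /* max_dev - min_dev: spread of the deviations  */
} jitter_stats;

/*
 * deadlines[i] and completions[i] are timestamps (ns) for the i-th
 * activation of a task. A positive deviation means the deadline was missed.
 * Returns 0 on success, -1 if there are no samples.
 */
int compute_jitter(const long long *deadlines, const long long *completions,
                   size_t n, jitter_stats *out)
{
    if (n == 0)
        return -1;
    long long dev = completions[0] - deadlines[0];
    out->min_dev = dev;
    out->max_dev = dev;
    for (size_t i = 1; i < n; i++) {
        dev = completions[i] - deadlines[i];
        if (dev < out->min_dev)
            out->min_dev = dev;
        if (dev > out->max_dev)
            out->max_dev = dev;
    }
    out->jitter = out->max_dev - out->min_dev;
    return 0;
}
```

For example, with deadlines at 10, 20 and 30 ms and completions at 9.5, 20.2 and 29.9 ms, the deviations are -0.5, +0.2 and -0.1 ms. That gives a `max_dev` of +0.2 ms (one missed deadline) and a jitter of 0.7 ms.
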

> Review comment: Before the next section, it could be mentioned that a system that meets real-time requirements needs both an operating system (kernel, scheduler) and user code capable of delivering deterministic execution. It might be out of scope, but a comment on RTOS alternatives could be interesting. It would be the complement to the next section on writing real-time safe user code. In the Linux world, it's worth mentioning:
>
> There are also RTOS for mission-critical and safety-critical systems, like QNX Neutrino (fully POSIX compliant) and VxWorks (POSIX compliant through a compatibility layer) that are certified to IEC 61508 SIL 3. For all the above RTOS except Linux+PREEMPT_RT, one must generate binaries that are not compatible with standard Linux, which might be an inconvenience.

> Review comment: yeah, it would be great to target ROS 2 to perform in real-time on a variety of RTOS's/real-time environments and set up several machines for performance testing. I have been testing with a machine with the

# Implementation of Real-time Computing

> Review comment: What do you think about rewording the title to something like 'Writing real-time safe code'? From reading the above title, I'm not sure if it refers to writing an RTOS or user code.

In general, an operating system can guarantee that the tasks it handles for the developer, such as thread scheduling, are deterministic, but the OS does not necessarily guarantee that the developer's code will run in real time.
Therefore, it is up to developers to know what deterministic guarantees an existing system provides, and what they must do to write hard real-time code on top of the OS.

This section explores various strategies for developing on top of a real-time OS, since these strategies might be applicable to ROS 2.
The patterns focus on the use case of C/C++ development on Linux-based real-time OSes (such as Linux with the PREEMPT_RT patch), but the general concepts are applicable to other platforms.

> Review comment: Just to be sure, by RTLinux are you referring to the microkernel that was maintained by WindRiver until 2011, or the PREEMPT_RT patch?

> Review comment: The

> Review comment: RT_PREEMPT is correct, my bad.

## Memory management

### Lock memory, prefault stack

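```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Assumed value: the maximum stack size the thread is expected to use. */
#define MAX_SAFE_STACK (8 * 1024)

static void lock_memory_and_prefault_stack(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
        perror("mlockall failed");
        exit(-2);
    }
    /* Write to the whole region so its stack pages are faulted in (and locked) now. */
    unsigned char dummy[MAX_SAFE_STACK];
    memset(dummy, 0, MAX_SAFE_STACK);
}
```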

`mlockall` is a Linux system call that locks the process's virtual address space into RAM. This prevents the memory the process accesses from being paged out to swap space.

Page faults, and accessing memory that has been paged out, are nondeterministic operations that should be avoided in real-time computation.

This code snippet, when run at the beginning of a thread's lifecycle, ensures that accessing the first `MAX_SAFE_STACK` bytes of the thread's stack will not cause page faults while the thread is running.
With `MCL_CURRENT | MCL_FUTURE`, `mlockall` locks all pages currently mapped into the process, as well as pages mapped in the future, into RAM.
The `memset` call writes to every page of a `MAX_SAFE_STACK`-sized region of the stack. Those pages are therefore faulted in and locked before the real-time work starts, rather than on first use.

### Allocate dynamic memory pool

> Review comment: Before going into details of the solutions, mention what is the problem with dynamic memory allocation. This Wikipedia entry provides a good summary. It mentions issues with memory fragmentation, and the non-deterministic nature of standard memory allocators. Generally speaking, I would recommend strategies that depend on the problem at hand:

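```c
#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed pool size; it must cover the process's peak heap usage. */
#define SOMESIZE (100 * 1024 * 1024)

static void reserve_heap_pool(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE))
        perror("mlockall failed:");

    /* Turn off malloc trimming. */
    mallopt(M_TRIM_THRESHOLD, -1);

    /* Turn off mmap usage. */
    mallopt(M_MMAP_MAX, 0);

    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    char *buffer = malloc(SOMESIZE);
    if (buffer == NULL) {
        perror("malloc failed");
        exit(-2);
    }

    /* Touch one byte per page; volatile keeps the stores from being optimized away. */
    volatile char *touch = buffer;
    for (size_t i = 0; i < SOMESIZE; i += page_size)
        touch[i] = 0;

    /* Trimming is off, so the freed memory stays in the heap for later use. */
    free(buffer);
}
```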

Dynamic memory allocation is generally avoided in the real-time code path, because the default allocator is not deterministic and may cause page faults.

> Review comment: Reword 'commonly believed', as it again gives the impression that this is an opinionated statement. If the default allocator is non-deterministic or subject to causing page faults, then it cannot be used in the real-time path. Guidelines for critical system code like MISRA C (paywalled) go as far as recommending not using

This code snippet does three things. It locks the virtual address space. It stops `free` from returning deallocated memory to the kernel by disabling heap trimming via `sbrk`. It stops `malloc` from serving allocations with `mmap`.
Together, these effectively lock a pool of memory in the heap into RAM. This prevents page faults due to `malloc` and `free`, as long as the process stays within the pool.

Pros:

* Can use malloc/new, free/delete, and even STL containers

Cons:

* Must accurately predict the bounded memory size for the process!
* Using STL containers is therefore dangerous (unbounded sizes)
* In practice, only works for small processes

### Custom fixed allocators for STL containers

An alternative to the above approach is to implement custom allocators for STL containers that only allocate memory on the stack.

> Review comment: Custom allocators don't necessarily need to allocate on the stack, although that is one solution to the problem.

There are various comprehensive tutorials [already written](http://www.codeguru.com/cpp/article.php/c18503/C-Programming-Stack-Allocators-for-STL-Containers.htm) for this task.

Pros:

* Use existing STL code with deterministic computation and without allocating a global dynamic memory pool
* More modular solution

Cons:

* Complex to implement

> Review comment: If one leverages an existing allocator, the cons end up being an additional dependency and potentially some extra complexity in setting up the allocator. The latter is allocator-specific, but typical setup parameters are pool size, or specifying the policy of what to do if pool gets full (allocate more vs.

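
STL allocators do not exist in C, the language of this article's examples. The sketch below therefore shows the underlying idea rather than an actual STL allocator. It uses a hypothetical fixed-block pool whose storage is reserved once, so allocation and deallocation are constant-time free-list operations that never call into the system allocator. The names `fixed_pool`, `pool_alloc`, and `pool_free`, as well as the block size and count, are assumptions. A C++ allocator for STL containers could wrap the same kind of pool. The pool is not thread-safe as written.

```c
#include <stddef.h>

/* Assumed sizes for this sketch. */
#define POOL_BLOCK_SIZE 64
#define POOL_BLOCK_COUNT 128

typedef union pool_block {
    union pool_block *next;               /* link while the block is free */
    max_align_t align;                    /* keep blocks suitably aligned */
    unsigned char bytes[POOL_BLOCK_SIZE]; /* payload while in use         */
} pool_block;

typedef struct {
    pool_block storage[POOL_BLOCK_COUNT]; /* reserved up front */
    pool_block *free_list;
} fixed_pool;

/* Thread every block onto the free list; call once at startup. */
void pool_init(fixed_pool *p)
{
    p->free_list = NULL;
    for (size_t i = POOL_BLOCK_COUNT; i > 0; i--) {
        p->storage[i - 1].next = p->free_list;
        p->free_list = &p->storage[i - 1];
    }
}

/* O(1): pop the head of the free list, or NULL if the request cannot be met. */
void *pool_alloc(fixed_pool *p, size_t size)
{
    if (size > POOL_BLOCK_SIZE || p->free_list == NULL)
        return NULL;
    pool_block *b = p->free_list;
    p->free_list = b->next;
    return b;
}

/* O(1): push the block back onto the free list. */
void pool_free(fixed_pool *p, void *ptr)
{
    if (ptr == NULL)
        return;
    pool_block *b = ptr;
    b->next = p->free_list;
    p->free_list = b;
}
```

A request larger than one block, or one made after the pool is exhausted, returns `NULL` instead of falling back to `malloc`. This keeps the worst case bounded.
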

### Global variables and (static) arrays

Global variables are preallocated at the start of a process, so assigning and accessing them is real-time safe, provided their pages are locked in RAM (for example with `mlockall`).
However, this strategy comes with the many disadvantages of using global variables.

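
Here is a minimal sketch of the static-array approach, assuming a fixed capacity chosen at design time. Storage for the samples is reserved when the program is loaded. When the array is full, `record_latency` (a hypothetical name, as is `MAX_SAMPLES`) reports failure instead of growing. As with any global state, it is not thread-safe as written.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed capacity, fixed at design time rather than at run time. */
#define MAX_SAMPLES 1024

/* Reserved when the program is loaded, never allocated in the RT path. */
static long long latency_samples[MAX_SAMPLES];
static size_t latency_count;

/* Record a sample without allocating; returns false once the array is full. */
bool record_latency(long long ns)
{
    if (latency_count >= MAX_SAMPLES)
        return false;
    latency_samples[latency_count++] = ns;
    return true;
}
```
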

### Use inheritance sparingly

> Review comment: This subsection talks first about inheritance, then about PIMPL. The common subject is cache-friendliness. Maybe the title should be changed accordingly for consistency.

Classes with many levels of inheritance may not be real-time safe because of the overhead of vtable access.
A virtual function call needs three things: the data used in the function, the vtable for the class, and the instructions for the function. These are all stored in different parts of memory and may or may not be in the cache together.

In general, C++ patterns with poor cache locality are not well suited to real-time environments.
Another such pattern is the opaque pointer idiom (PIMPL), which is convenient for ABI compatibility and for speeding up compile times.
However, bouncing between the memory location of the object and its private data pointer causes cache misses. For almost every function call on the PIMPLized object, the CPU loads one chunk of memory and then another, unrelated chunk.

### Exceptions

Throwing an exception is costly and hard to bound in time. On common C++ implementations, the exception object is allocated dynamically, and the time needed to unwind the stack grows with the number of frames unwound. Both are undesirable in real-time code, where heap allocation must be avoided.

> Review comment: I don't really get this statement (stack vs. heap). My understanding is that when no exceptions are thrown, the overhead is small, but exception handling (stack unwinding, catching) can add a significant penalty. FWIW, this stack overflow answer states something similar.

On modern C++ compilers, code that can catch exceptions has little memory or time overhead as long as no exception is thrown, but it can lead to unexpected code growth.

## Device I/O

Interacting with physical devices (disk I/O, printing to the screen, etc.) may introduce unacceptable latency in the real-time code path, since the process is often forced to wait on slow physical phenomena.
Additionally, many I/O calls such as `fopen` result in page faults.

Keep disk reads/writes at the beginning or end of the program, outside of the RT code path.

Spin up userspace threads (executing on a different CPU from the RT code) to print output to the screen.

> Review comment: It should be enough to perform the printing from a non-rt scheduled thread. No actual need for CPU isolation.

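
Here is a minimal sketch of handing output to a non-real-time printing thread. It assumes exactly one real-time producer and one logger consumer, connected by a hypothetical lock-free single-producer/single-consumer queue built on C11 atomics. `log_queue`, `log_push`, and `log_pop` are illustrative names, and the slot count and message length are assumptions. The real-time thread only copies a bounded string into a preallocated slot and never blocks. If the queue is full, the message is dropped.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed sizes; LOG_SLOTS must be a power of two for the index mask. */
#define LOG_SLOTS 256
#define LOG_MSG_LEN 128

typedef struct {
    char slots[LOG_SLOTS][LOG_MSG_LEN];
    atomic_size_t head; /* next slot to write; stored only by the RT thread */
    atomic_size_t tail; /* next slot to read; stored only by the logger     */
} log_queue;

void log_queue_init(log_queue *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Called from the RT thread: bounded work, never blocks, drops if full. */
bool log_push(log_queue *q, const char *msg)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head - tail == LOG_SLOTS)
        return false; /* full */
    char *slot = q->slots[head & (LOG_SLOTS - 1)];
    strncpy(slot, msg, LOG_MSG_LEN - 1);
    slot[LOG_MSG_LEN - 1] = '\0';
    /* Publish the slot only after the message has been written. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

/* Called from the non-RT logger thread; out must hold LOG_MSG_LEN bytes. */
bool log_pop(log_queue *q, char *out)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail == head)
        return false; /* empty */
    memcpy(out, q->slots[tail & (LOG_SLOTS - 1)], LOG_MSG_LEN);
    /* Hand the slot back to the producer only after it has been copied. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}
```

The logger thread would loop on `log_pop` and pass each message to `fputs` or `printf`. Because that thread runs under a normal scheduling policy, its I/O latency does not delay the real-time thread.
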

## Multithreaded Programming and Synchronization

Real-time computation requirements change the typical paradigm of multithreaded programming.
Program execution must not block for unpredictable amounts of time, and threads must be scheduled deterministically.
A real-time operating system will fulfill the scheduling requirement, but there are still pitfalls the developer can fall into.
This section provides guidelines for avoiding these pitfalls.

### Thread creation guidelines

Create threads at the start of the program.
This confines the nondeterministic overhead of thread allocation to a defined point in the process.

> Review comment: A note based on past experience. One must avoid any calls that can lead to

> Review comment: Interesting story and nice solution! I will add an item here about avoiding

Create high-priority (but not priority 99) threads with a FIFO (`SCHED_FIFO`) or round-robin (`SCHED_RR`) scheduling policy.

> Review comment: or

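
Here is a sketch of creating such a thread with POSIX threads. It assumes Linux and a priority of 80, an arbitrary example value below 99. `start_rt_thread` and `example_rt_worker` are hypothetical names. Setting `PTHREAD_EXPLICIT_SCHED` matters: without it, the new thread inherits the creator's scheduling policy and these attributes are ignored. Without sufficient privileges (for example `CAP_SYS_NICE` or a suitable `RLIMIT_RTPRIO`), `pthread_create` fails with `EPERM`.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Placeholder body; a real thread would run its periodic loop here. */
static void *example_rt_worker(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Create a SCHED_FIFO thread at the given priority.
 * Returns 0 or an error number (EPERM without sufficient privileges).
 */
int start_rt_thread(pthread_t *thread, void *(*fn)(void *), void *arg,
                    int priority)
{
    pthread_attr_t attr;
    struct sched_param param = { .sched_priority = priority };
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    /* Use the attributes below instead of inheriting the creator's policy. */
    rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    if (rc == 0)
        rc = pthread_attr_setschedparam(&attr, &param);
    if (rc == 0)
        rc = pthread_create(thread, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}
```

Calling this once per thread during startup, before entering the real-time loop, keeps thread creation out of the time-critical path.
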

### Avoid priority inversion

Priority inversion can occur on a system with a preemptive task scheduler, and it can block a high-priority task for an unbounded amount of time.
It occurs in the following sequence. A low-priority task acquires a lock and is then preempted by a medium-priority task. A high-priority task then tries, and fails, to acquire the lock held by the low-priority task.

> Review comment: The wording is a bit confusing, in particular, the high-prio task does not acquire the lock (but attempts and fails). This example is longer, but the wording is clearer.

The high-priority task is blocked on the low-priority task. The low-priority task cannot run, and so cannot release the lock, because it has been preempted by the medium-priority task.
As long as the medium-priority task keeps running, the high-priority task is effectively blocked by a task with a lower priority than its own.

Here are some solutions to priority inversion:

* Don't use locks (but sometimes, they are necessary)

  > Review comment: Is this meant as don't use any kind of synchronization primitives?

* Disable preemption for tasks holding locks (can lead to jitter)
* Increase the priority of a task holding a lock
* Use priority inheritance: a task that owns a lock inherits the priority of a task that tries to acquire the lock

> Review comment: Another alternative: Use lock-free data structures. They don't have blocking semantics and guarantee forward progress of the active thread. On the flipside, their performance is typically poorer than locks, and can have a larger memory footprint (because of multiple versioned objects being tracked). A very nice read on the subject.


### Timing shots

One real-time synchronization technique is for a thread to calculate its next "shot" (the start of its next execution period).
Consider a thread that must provide an update every 10 milliseconds, using an operation that takes 3 to 6 milliseconds. The thread should get the time before the operation, do the operation, and then wait for the remaining 4 to 7 milliseconds, based on the time measured after the operation.

The most important consideration for the developer is to use a high-resolution clock and sleep call, such as `nanosleep` on Linux platforms, while waiting.
Otherwise the system will experience drift.
Sleeping until an absolute wake-up time (for example with `clock_nanosleep` and `TIMER_ABSTIME`), rather than for a relative duration, also prevents small errors in each period from accumulating.

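
Here is a minimal sketch of a periodic loop that computes each next shot from the previous one. It assumes POSIX `clock_nanosleep` with `CLOCK_MONOTONIC` and `TIMER_ABSTIME`, as available on Linux. The names `timespec_add_ns` and `run_periodic` are hypothetical. `work` stands in for the real-time operation and may be `NULL` for a pure timing test.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance a timespec by ns nanoseconds, keeping tv_nsec in [0, 1e9). */
void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/*
 * Run work() once per period for the given number of cycles. Each wake-up
 * time is derived from the previous shot, not from when work() finished,
 * so a slow cycle shortens the following sleep instead of shifting the
 * whole schedule. Returns how many cycles overran their period.
 */
int run_periodic(long period_ns, int cycles, void (*work)(void))
{
    struct timespec next, now;
    int overruns = 0;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < cycles; i++) {
        if (work != NULL)
            work();
        timespec_add_ns(&next, period_ns);
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (now.tv_sec > next.tv_sec ||
            (now.tv_sec == next.tv_sec && now.tv_nsec > next.tv_nsec))
            overruns++; /* next shot already passed; the sleep returns at once */
        /* Sleep until the absolute time of the next shot; retry on signals. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;
    }
    return overruns;
}
```

For the 10 ms example above, a call such as `run_periodic(10000000L, cycles, do_update)` would invoke a hypothetical `do_update` every 10 ms. If an update takes 3 ms, the following sleep lasts about 7 ms; if it takes 6 ms, about 4 ms.
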

### Spinlocks

Spinlocks tend to cause clock drift.
Developers should avoid implementing their own spinlocks.
The PREEMPT_RT patch replaces much of the kernel's spinlocks with mutexes, but this might not be guaranteed on all platforms.

# Testing and Performance Benchmarking

## cyclictest

`cyclictest` is a simple Linux command-line tool for measuring the jitter of a real-time environment.

It takes as input a number of threads, a priority for the threads, and a scheduler type.
It spins up `n` threads that sleep for regular intervals (the sleep period can also be specified from the command line).

For each thread, `cyclictest` measures the time between when the thread is supposed to wake up and when it actually wakes up.
This statistic is the scheduling jitter in the system.
Processes with non-deterministic blocking behavior may be running in the system. If so, the jitter will grow to a large number (on the order of milliseconds), since the scheduler cannot meet the deadlines of the periodically sleeping threads profiled in the program.
An ideal real-time system will have an average scheduling jitter on the order of nanoseconds or tens of nanoseconds.

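
The core of what `cyclictest` measures can be sketched in a few lines of C. The sketch assumes `CLOCK_MONOTONIC` and absolute `clock_nanosleep`, and `measure_wakeup_latency` is a hypothetical name. It is an illustration, not `cyclictest`'s actual source. It measures a single thread and does not set a real-time priority, which a meaningful measurement would need.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

static long long timespec_to_ns(const struct timespec *t)
{
    return (long long)t->tv_sec * 1000000000LL + t->tv_nsec;
}

/*
 * Sleep until absolute deadlines spaced interval_ns apart and record how
 * late each wake-up was. Writes the maximum and mean lateness in ns.
 */
void measure_wakeup_latency(long interval_ns, int loops,
                            long long *max_ns, long long *avg_ns)
{
    struct timespec next, now;
    long long sum = 0, max = 0;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < loops; i++) {
        /* Compute the next intended wake-up time. */
        next.tv_nsec += interval_ns;
        while (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;
        /* Lateness: actual wake-up time minus intended wake-up time. */
        clock_gettime(CLOCK_MONOTONIC, &now);
        long long late = timespec_to_ns(&now) - timespec_to_ns(&next);
        sum += late;
        if (late > max)
            max = late;
    }
    *max_ns = max;
    *avg_ns = loops > 0 ? sum / loops : 0;
}
```
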

## Test Pipeline

Instrumenting real-time code to validate its correctness may be tricky because of the time overhead introduced by the instrumentation.
A simple benchmarking test that does not involve code instrumentation is as follows:

On a real-time Linux machine, start a script that runs the process to be tested for some fixed time interval.
Collect data with `cyclictest` to record the jitter in the system.
A variety of command-line tools could be used to add extra stress to the system in a controlled way. Examples are [cache calibrator](http://homepages.cwi.nl/~manegold/Calibrator/), to pollute the cache in an attempt to create page faults, and `fping`, which generates a large number of interrupts.

> Review comment: The RTOS we have more experience with is Xenomai. It comes with its own testing scripts. There's also the stress package for subjecting a system to different kinds of load.

> Review comment: Also, using Xenomai, it's possible to detect overruns (SIGXCPU signal) and relaxes (SIGDEBUG signal), which can prove useful for automated performance testing.

> Review comment: Although setting up a Xenomai system is more complex than an RT_PREEMPT one, and you need to generate different binaries (that link against Xenomai libs) there's a clear separation between primary and secondary modes, and it's reasonably easy to detect non real-time safe code. With PREEMPT_RT, one can leverage all the typical Linux tooling, which is great, but I've found it more challenging to detect real-time violations. My experience with PREEMPT_RT is limited, so the problem might be on my end.

Tools for adding load to the system could also be used to test QoS settings in DDS and how they affect real-time performance.

This procedure could be run on a suite of example cases:

* a simple benchmark program known to run in real time
* inter-process DDS communication
* intra-process DDS communication
* inter-process ROS 2 communication
* intra-process ROS 2 communication
* another benchmark program that obviously does not run in real time (lots of disk I/O, dynamic memory allocation, and blocking)

TODO: Latency test

TODO: Test schemes for different embedded platforms

# Design Guidelines for ROS 2

> Review comment: No brain left for this part. Generally speaking, the typical things that potentially need special codepaths to be compatible with an RTOS are:
>
> Also, an RT-friendly string class is invaluable for logging purposes.

## Achieving real-time computation across platforms

TODO: How can ROS 2 be real-time friendly and cross-platform?
Much of the research in this document focuses on achieving real-time on Linux systems with pthreads.

> Review comment: pthreads → POSIX

## Implementation strategy

> Review comment: Maybe worth mentioning explicitly. Using ROS interfaces should be real-time safe (at least within a limited context, like intra-process), setting them up need not be. In particular, for pub-sub with dynamic-sized data, it should be possible to pre-allocate resources (e.g., resize payload arrays) at setup time.

> Review comment: eb93b9a (and other comments)

There are a few possible strategies for the real-time "hardening" of existing and future ROS 2 code:

* Create a configuration option for the stack to operate in "real-time friendly" mode.
  * Pros:
    * Allows the user to dynamically switch between real-time and non-real-time modes.
  * Cons:
    * Refactoring overhead. Integrating real-time code with existing code may be intractable.
    * Total code size may be impractical for embedded systems.

* Implement a new real-time stack (rclrt, rmwrt, etc.) designed with real-time computing in mind.
  * Pros:
    * Easier to design and maintain.
    * Real-time code is "quarantined" from existing code. Can fully optimize library for real-time application.
  * Cons:
    * More packages to write and maintain.
    * Potentially less convenient for the user.

* Give the option for real-time safety up to a certain point in the stack, and implement a real-time safe language wrapper (rclrt or rclc).
  * Pros:
    * Existing code is designed for this refactoring to be fairly easy.
    * User can provide a memory allocation strategy to rcl/rmw to ensure deterministic operation.
    * Synchronization happens at the top, in the language/OS-specific layer, so refactoring rcl/rmw is easier.
    * May be easier to support multiple embedded platforms.
  * Cons:
    * Refactoring overhead.
    * More flexibility for the user may mean more complexity.

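
The third option lets the user supply a memory allocation strategy to rcl/rmw. The sketch below shows one hypothetical shape such a hook could take: a struct of function pointers plus an opaque state pointer. Here it is backed by a bump allocator over a buffer reserved at setup time. Every name in it (`user_allocator`, `bump_arena`, `make_bump_allocator`) is illustrative and is not an existing ROS 2 API. The buffer is assumed to be aligned to `max_align_t`.

```c
#include <stddef.h>

/* Hypothetical allocator hook; not an existing ROS 2 interface. */
typedef struct {
    void *(*allocate)(size_t size, void *state);
    void (*deallocate)(void *ptr, void *state);
    void *state;
} user_allocator;

/* One possible strategy: a bump allocator over memory reserved at setup. */
typedef struct {
    unsigned char *buffer; /* assumed aligned to max_align_t */
    size_t capacity;
    size_t used;
} bump_arena;

static void *bump_allocate(size_t size, void *state)
{
    bump_arena *arena = state;
    size_t align = _Alignof(max_align_t);
    /* Round the offset up to the next multiple of align (a power of two). */
    size_t start = (arena->used + align - 1) & ~(align - 1);
    if (start > arena->capacity || size > arena->capacity - start)
        return NULL; /* out of preallocated memory: fail, never grow */
    arena->used = start + size;
    return arena->buffer + start;
}

/* Individual frees are no-ops; memory is reclaimed only by resetting used. */
static void bump_deallocate(void *ptr, void *state)
{
    (void)ptr;
    (void)state;
}

user_allocator make_bump_allocator(bump_arena *arena)
{
    user_allocator a = { bump_allocate, bump_deallocate, arena };
    return a;
}
```

Because the arena never grows, every allocation is a constant-time pointer bump or a `NULL` return. One way to use it would be to reset `used` to zero at a point where no allocations are live, for example at the start of each cycle.
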

# Sources

* [Real-Time Linux Wiki](https://rt.wiki.kernel.org/)
* Scott Salmon, [How to make C++ more real-time friendly](http://www.embedded.com/design/programming-languages-and-tools/4429790/2/How-to-make-C--more-real-time-friendly)
* Stack Overflow, [Are Exceptions still undesirable in Realtime environment?](http://stackoverflow.com/questions/5257190/are-exceptions-still-undesirable-in-realtime-environment)
* Pavel Moryc, [Task jitter measurement under RTLinux operating system](https://fedcsis.org/proceedings/2007/pliks/48.pdf)

> Review comment: It would be good to start a summary of the document, stating its scope, goal and structure.