Programming Strategies for Low-Latency Applications

Low-latency programming is poorly documented, partly because of its commercial value: high-frequency trading is one of many fields in which firms strive for latency reductions on the order of nanoseconds. Fragments of information are available about how programmers can make their programs faster, but the strategies that give trading firms their competitive edge are closely guarded secrets. A difference of nanoseconds can determine the success of an application, whether in high-frequency trading, networking, or other services where low latency and fast execution are critical.

The purpose of this repository is to centralise the fundamentals that a programmer should know before designing and writing applications aimed at low latency and fast execution. It covers material specific to low-latency applications and the considerations that should be made during their development:

Documentation of important concepts that are critical for high-performance applications, e.g. computer architecture, systems design.
Demonstrations of techniques designed to increase the performance of applications. Reproducible benchmarks are provided (see the index section below), with instructions on how to run them.

Index

The following table is a directory of the strategies included and their corresponding microbenchmarks (if appropriate). More detail about the benchmarking results can be found in the Documentation, also provided in the table.

Optimisation	Documentation	Benchmarks
Inlining	Link	Link
Loop Unrolling	Link	Link
Predication	Link	Link
Prefetching	Link	Link
SIMD Instructions	Link	Link
Branch Prediction	Link	Link
Resource Contention	Link	Link
Kernel Bypass	Link	N/A
Cache Warming	Link	N/A
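To give a flavour of the strategies listed in the table above, the following is a minimal sketch of predication: replacing a data-dependent branch with arithmetic so that the loop body has no conditional jump. It is not taken from the repository's documentation or benchmarks; the function names `sum_above_branchy` and `sum_above_predicated` are hypothetical and chosen only for illustration.

```cpp
#include <cstdint>
#include <vector>

// Branching version: the `if` may compile to a conditional jump, which is
// costly when the branch predictor cannot guess the outcome.
std::int64_t sum_above_branchy(const std::vector<std::int32_t>& values,
                               std::int32_t threshold) {
    std::int64_t sum = 0;
    for (std::int32_t x : values) {
        if (x > threshold) {
            sum += x;
        }
    }
    return sum;
}

// Predicated version: the comparison result (0 or 1) is turned into a mask
// of all zeros or all ones, so every element takes the same code path.
std::int64_t sum_above_predicated(const std::vector<std::int32_t>& values,
                                  std::int32_t threshold) {
    std::int64_t sum = 0;
    for (std::int32_t x : values) {
        const std::int64_t mask = -static_cast<std::int64_t>(x > threshold);
        sum += static_cast<std::int64_t>(x) & mask;
    }
    return sum;
}
```

Both functions return the same result for any input. Whether the predicated form is actually faster depends on the data, the compiler, and the optimisation level, so it should be measured rather than assumed.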

The repository is gradually being expanded - if there are any mistakes within the documentation, or there are any interesting optimisations that haven't been included, please feel free to get in touch!

Benchmarks

Benchmarks were collected on Imperial College London Department of Computing's undergraduate laboratory machines, each with an Intel Core i7-8700 (3.20 GHz) and 16 GB of RAM, using g++ 9.4.0. Most of the benchmarks were run with compiler optimisations disabled (-O0), because at higher optimisation levels the compiler may already apply some of these optimisations itself.
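This README does not say which benchmarking framework the repository uses, so the following is only a rough sketch of how a microbenchmark can be timed with the C++ standard library alone. The helper `average_ns_per_call` and the workload `square` are hypothetical names introduced for illustration. The `volatile` sink keeps each result observable so the measured work is not discarded when optimisations are enabled.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: average wall-clock nanoseconds per call of `fn`.
// `fn` must accept a std::uint64_t and return a value convertible to one.
template <typename Fn>
double average_ns_per_call(Fn&& fn, std::uint64_t iterations) {
    if (iterations == 0) {
        return 0.0;  // nothing to measure
    }
    using clock = std::chrono::steady_clock;  // monotonic clock
    volatile std::uint64_t sink = 0;          // results are written here
    const auto start = clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        sink = sink + fn(i);
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}

// Hypothetical workload used only to exercise the helper.
inline std::uint64_t square(std::uint64_t x) { return x * x; }
```

A call such as `average_ns_per_call(square, 1000000)` returns the mean time per iteration, including loop overhead. Timings at this scale are noisy, so a dedicated benchmarking library with warm-up and repeated runs is generally preferable for real measurements.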

Requirements

CMake, Version 3.16+
GNU's g++ compiler

Running Benchmarks Locally

Follow these instructions to run the benchmarks in the examples directory:

# Navigate to the directory of the optimisation
cd examples/{OPTIMISATION}

# Create a build directory
mkdir build
cd build

# Configure the build (the source directory is the parent of build)
cmake -DCMAKE_BUILD_TYPE=Release ..

# Create the benchmark executable
make

# Run the benchmark
./{OPTIMISATION}Benchmarks

Reimplementing the Java Modular Packet Processor Library

The Java Modular Packet Processor (JMPP) library is a lightweight Java library for processing network packets with a user-defined graph of individual processing operations. The JMPP library is not intended for production-level applications, but as a supplementary educational resource that demonstrates the quantitative advantages of using appropriate data structures and reducing resource contention, as outlined in this repository's documentation.

Link: The JMPP Repository

This repository contains a reimplementation of the library, with an additional variant using queues. These are benchmarked, and results are published in the JMPP repository's README.
