Skip to content

Data Structures

Martin Thompson edited this page Sep 22, 2017 · 16 revisions

To support the transport of messages from senders to receivers a number of data structures are required. These data structures need to conform to a number of design principles for the common transport path, exceptional cases can be afforded an exceptional path, e.g. large messages:

  1. Copy-Free: The send and receive buffer is directly used.
  2. Allocation-Free: The send and receive buffer is pre-allocated so that the transport path is free from the allocation or reclamation of memory.
  3. Lock-Free: Lock-free techniques are employed to manage concurrent access by multiple producers.
  4. Persistent/Immutable: Message buffers for send and receive record immutable history to allow replay and resend semantics.
  5. O(1) Cost: All operations are constant time regardless of buffer length, senders, receivers, contention, etc.

Publication Log Buffer

The sender needs to be able to support the efficient retransmission of messages. Messages may need to be retransmitted to cope with loss or late joiners to a communications stream. Each log partition represents a term in the communication history from the source.

 Log Buffer - Memory Mapped File
+----------------------------+
|           Term 0           |
+----------------------------+
|           Term 1           |
+----------------------------+
|           Term 2           |
+----------------------------+
|        Log Meta Data       |
+----------------------------+

Messages are stored to the buffer in preparation for transmission as a sequential stream with FIFO semantics. The next n bytes of the buffer are claimed by performing an atomic increment of the tail counter stored in the state trailer. The algorithm is designed to take advantage of LOCK XADD on x86 to avoid spinning CAS (LOCK CMPXCHG) loops.

The publishers of message call offer(bytes) on a publication object passing the bytes to be sent. The send call then follows the following algorithm:

  1. IF message is bigger than transmission frame length THEN
    1. Send in chunks
  2. END IF
  3. WHILE next capacity claim < message length
    1. Mark claimed capacity with padding
    2. Move on to next term
  4. END WHILE
  5. Copy in message
  6. Mark record in log as complete

A sender thread observes the publication and repeatedly performs the following algorithm:

  1. Scan forward for new frames
  2. Send message on underlying network layer

Log Metadata

The log metadata describes the current state of the log buffer for the communications term. Variables are stored with consideration for false sharing issues when concurrent access is required.

  • active index: index of the current active term.
  • tail counter: buffer offset at which the next message can be added. An atomic get-and-increment operation is used to advance the tail for claiming capacity in the buffer.

Publication Image Log Buffer

The Image log buffer is a mirror of the publication log buffer with reciprocal semantics for assembling a sequence of messages from a sender over an unreliable network layer. The receiver is single threaded and thus does not need to maintain the state trailer therefore those fields are all zero.

A receiver thread receives messages from the network layer into the publication image.