Skip to content

Aeron Archive

Martin Thompson edited this page Oct 12, 2019 · 34 revisions

The Aeron Archive service can record and replay messages streams from durable/persistent storage. The service is designed to be high performance and tests show that its major limitation is the performance of the storage for the recordings. With sufficiently fast storage the archive can record or replay a stream of message at full 10GigE line rate.

Samples can be found here and systems tests here.

Archive Service

To start the Archive service you launch the ArchivingMediaDriver which includes the media driver by composition. This can run standalone or be embedded in another service. The configuration options can be found in the Archive.

The service offers these main features:

  • Record: service can record a particular subscription, described by <channel, streamId>. Each resulting image for the subscription will be recorded under a new recordingId. Local network publications are recorded using the spy feature for efficiency. If no subscribers are active then the recording can advance the stream by setting the aeron.spies.simulate.connection system property to true.

  • Extend: service can extend an existing recording by appending.

  • Replay: service can replay a recorded recordingId from a particular position, and for a particular length which can be Aeron.NULL_VALUE for an open ended replay.

  • Query: the catalog for existing recordings and the recorded position of an active recording.

  • Truncate: allows a stopped recording to have its length truncated, and if truncated to the start position then it is effectively deleted.

  • Replay Merge: allows a late joining subscriber of a recorded stream to replay a recording and then merge with the live stream for cut over if the consumer is fast enough to keep up.

  • Replicate: allows for the replication of a recording from one archive to another, plus have the option to merge with a live multicast stream and record it after using the source archive to catchup. When using replication it is necessary to configure the replication channel for the destination archive with aeron.archive.replication.channel.

The Archive service can be run in one of three threading modes:

  • ArchiveThreadingMode.DEDICATED: 3 threads are used. One for each of the conductor responding to control signals and queries, one for recording streams, and one for replaying streams.
  • ArchiveThreadingMode.SHARED: 1 thread is used for all the control, recording, and replay.
  • ArchiveThreadingMode.INVOKER: No threads are used in the archive and the duty cycle is driven externally by calling the AgentInvoker.invoke() method on the Archive.invoker() object. Each call to invoke with perform one duty cycle of the Archive.

The overall threading usage of the Archive is a combination for the ArchviveThreadingMode and the ThreadingMode for the composed MediaDriver.

Archive Client

The Archive Client can communicate with an Archive using the control protocol and receive events on the progress of recording via the recording events stream. To support dynamic subscribers for the recording events stream then its channel can be multicast or MDC (Multi-Destination-Cast).

Samples using an archive can be found here.

Recording Streams

An archive can be instructed to record streams, i.e. <channel, streamId> pairs. These streams are recorded with the file sync level the archive has been launched with. Progress is reported on the recording events stream.

  • aeron.archive.file.sync.level=0: for normal writes to the OS page cache for background writing to disk.
  • aeron.archive.file.sync.level=1: for forcing the dirty data pages to disk.
  • aeron.archive.file.sync.level=2: for forcing the dirty data pages and file metadata to disk.

When setting file sync level greater than zero it is also important to sync the archive catalog with the aeron.archive.catalog.file.sync.level to the same value.

Recordings will be assigned a recordingId and a full description of the stream is captured in the Archive Catalog. The Catalog chronicles the contents of an archive as RecordingDescriptors which can be queried.

The progress of active recordings can be tracked using AeronStat to view the rec-pos counter for each stream.

Recording When No Subscribers Are Connected

It is possible to record a stream, over a UDP based channel, when no subscribers have connected by enabling a property so that spy subscriptions can simulate a connection on the media driver via either of:

  • aeron.spies.simulate.connection system property to true.
  • MediaDriver.Context.spiesSimulateConnection(boolean) to true.

Spy subscriptions are very efficient when used to record a local outbound network publication without requiring a local subscription to receive the outbound data stream on a inbound connection. IPC based channels will be connected when the archive starts to record regardless of if this property is set or not.

Querying the Archive Catalog

The contents for Archive can be queried by listing the RecordingDescriptors. This can be a simple paging through all descriptors in the catalog, or the query can be filtered by <channel, streamId> to reduce the result set.

The SBE message format for the descriptors is as follows:

    <sbe:message name="RecordingDescriptor"
                 id="22"
                 description="Describes a recording in the catalog">
        <field name="controlSessionId"   id="1"  type="int64"/>
        <field name="correlationId"      id="2"  type="int64"/>
        <field name="recordingId"        id="3"  type="int64"/>
        <field name="startTimestamp"     id="4"  type="time_t"/>
        <field name="stopTimestamp"      id="5"  type="time_t"/>
        <field name="startPosition"      id="6"  type="int64"/>
        <field name="stopPosition"       id="7"  type="int64"/>
        <field name="initialTermId"      id="8"  type="int32"/>
        <field name="segmentFileLength"  id="9"  type="int32"/>
        <field name="termBufferLength"   id="10" type="int32"/>
        <field name="mtuLength"          id="11" type="int32"/>
        <field name="sessionId"          id="12" type="int32"/>
        <field name="streamId"           id="13" type="int32"/>
        <data  name="strippedChannel"    id="14" type="varAsciiEncoding"/>
        <data  name="originalChannel"    id="15" type="varAsciiEncoding"/>
        <data  name="sourceIdentity"     id="16" type="varAsciiEncoding"/>
    </sbe:message>

Note: A stopPosition of -1 means a live recording is in progress.

Replaying Streams

A replay of a recorded stream can be requested by recordingId. It is possible to replay a recording that has stopped or one that is currently in progress. Replays are request from a position and for a length. The replay session id returned from the replay request will be the sessionId of the replayed Aeron stream in the lower 32-bits which can be obtained with a downcast to an int. The full 64-bit return value for the replay session is required to stop the replay early.

If the requested replay is for a recording that is currently in progress and the length goes beyond the current recorded position then replay will track the live recording. This can be useful for a subscriber that wants to track a live stream but does not want to participate in flow control of the transient stream if it make slow the others down.

Replays of live streams should sent to a different <channel, streamId> pairing to avoid congestion.