Replication Algorithm
by Jens Alfke
This is a historical document that applies to version 1 of Couchbase Lite. Version 2 uses a different protocol (based on WebSockets) that's a lot more efficient, but the high level algorithm is still the same.
Couchbase Lite 1.x's replication protocol is compatible with Apache CouchDB. This interoperability is an important feature, but implementing it was challenging because much of CouchDB's replication protocol is undocumented. In the future I would like to see an explicit spec for replication, to ensure that different products remain compatible. For now I'll document it here, as I understand it.
Note: If you want to follow along, this algorithm is implemented in Couchbase Lite's `Replicator` class and its subclasses `Pusher` and `Puller` (plus a number of helper classes).
These notes were derived from reading the API documentation on the CouchDB wiki and from conversations with engineers who've worked on CouchDB's replicator (Damien Katz and Filipe Manana). But don't take them as gospel.
Some extensions have been added over time, to improve performance; these are supported by Couchbase Sync Gateway but not by CouchDB.
There really isn't a separate "protocol" per se for replication. Instead, replication uses CouchDB's REST API and data model. It's therefore a bit difficult to talk about replication independently of the rest of CouchDB. In this document I'll focus on the algorithm used, and link to documentation of the APIs it invokes. The "protocol" is simply the set of those APIs operating over HTTP.
Given a source and a target database, identify all current revisions (including deletions) in the source that do not exist in the target, and copy them (with contents, attachments and histories) to the target. Afterwards, all current revisions in the source exist at the target and have the same revision histories there.
Secondary goal: Do this without redundantly transferring the contents of any revisions that already exist at the target.
Note: A current revision is one that has not been replaced, i.e. a leaf node in the revision tree. Most of the time a document has only one current revision, but multiple current revisions can exist and that's called a conflict.
- Get unique identifiers for the source and target databases (which may just be their URLs, if no UUIDs are available).
- Generate a unique identifier for this replication based on the database IDs, and the filter name and parameters (if any). For instance, you can concatenate these with an unambiguous delimiter and then run that string through a cryptographic digest algorithm like SHA-1. The exact mechanism doesn't matter, because this identifier is used only by a particular implementation.
- Use this identifier to generate the doc ID of a special (`_local`, non-replicated) document, and get this document from both the source and the target database. The document contains the last source sequence ID (also called a "checkpoint") that was read and processed by the previous replication. If the document is missing in either database, or if its contents are inconsistent, that's OK: the replication will just start from scratch without a checkpoint.
- Fetch the source database's `_changes` feed, starting just past the last source sequence ID (if any). Use the `?style=all_docs` URL parameter so that conflicting revisions will be included. In continuous replication, use the `?feed=longpoll`, `?feed=continuous`, or `?feed=websocket` [Sync Gateway only] mode and leave the algorithm running indefinitely to process changes as they occur. Filtered replication specifies the name of a filter function in this URL request.
- Collect a group of document/revision ID pairs from the `_changes` feed and send them to the target database's `_revs_diff`. The result will contain the subset of those revisions that are not in the target database.
- `GET` each such revision from the source database. Use the `?revs=true` URL parameter to include its list of parent revisions, so the target database can update its revision tree. Use `?attachments=true` so the revision data will include attachment bodies. Also use the `?atts_since` query parameter to pass a list of revisions that the target already has, so the source can optimize by not sending the bodies of attachments already known to the target. (Couchbase Lite and Sync Gateway support a nonstandard `_bulk_get` call that can retrieve large numbers of revisions in one request.)
- Collect a group of revisions fetched by the previous step, and store them into the target database using the `_bulk_docs` API, with the `new_edits:false` JSON property to preserve their revision IDs.
- After a group of revisions is stored, save a checkpoint: update the last source sequence ID value in the target database. It should be the latest sequence ID such that its revision and all revisions prior to it have been added to the target. (Even if some revisions are rejected by a target validation handler, they still count as "added" for this purpose.)
There's also a ladder diagram which shows these steps along with the interaction between the replicator and source/target db's.
- The replication algorithm does not have to run on either the source's or the target's server. It could run anywhere with read access to the source and write access to the target. However, it's nearly always run by either the source or the target server (and Couchbase Lite only supports those modes). Replication run by the source is commonly called a "push"; replication run by the target is called a "pull".
- An implementation running directly in the source or target server can optimize by using lower-level APIs to operate on the local database; for example, it listens for internal change notifications rather than reading the `_changes` feed, makes a direct database query instead of calling `_revs_diff`, and inserts directly into the database instead of calling `_bulk_docs`.
- Replication does not transfer obsolete revisions of documents, only the current ones. This derives from the behavior of the `_changes` feed, which lists only current revisions. Replication does transfer the revision history of each document, which is just the list of IDs of prior revisions; this makes it possible for the target database to identify common ancestors and merge revision histories into a tree.
- Sequence IDs are usually, but not necessarily, numeric. (Currently the only exception I know of is BigCouch.) Non-numeric sequence IDs are not intrinsically ordered, i.e. they are opaque strings that can only be compared for equality. To compare their ordering (when checkpointing) you have to keep an ordered list of sequence IDs as they appeared in the `_changes` feed and compare their indices in that list.
- For efficiency, the algorithm should run in parallel, as a data-flow system, with multiple steps active at the same time. This reduces the overhead of network and database latency.
- Also for efficiency, the number of revisions passed in a single `_revs_diff` or `_bulk_docs` call should be large. This means the implementation should group together revisions arriving from previous steps until a sufficient number have arrived or sufficient time has elapsed.
- From my limited testing, the performance bottleneck in the current algorithm seems to be fetching the new revisions from the source. I think this is due to the overhead of handling many separate HTTP requests. This was the rationale for adding `_bulk_get` in Couchbase Mobile.
- A limited case of the above-mentioned bulk-get optimization is possible with the standard API: revisions of generation 1 (revision ID starts with "`1-`") can be fetched in bulk via `_all_docs`, because by definition they have no revision histories. Unfortunately `_all_docs` can't include attachment bodies, so if it returns a document whose JSON indicates it has attachments, those have to be fetched separately. Nonetheless, this optimization can help significantly, and is currently implemented in Couchbase Lite.
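The note about opaque sequence IDs interacts with parallelism: when revisions are stored out of order, the checkpoint may only advance past the last *contiguously* completed sequence. A minimal sketch of that bookkeeping (a hypothetical helper, not the actual Couchbase Lite class):

```python
class SequenceTracker:
    """Remembers sequence IDs in the order they arrived from _changes,
    since opaque IDs can only be compared for equality, not ordered."""

    def __init__(self):
        self._order = []       # sequence IDs in _changes arrival order
        self._pending = set()  # sequences whose revisions aren't stored yet

    def began(self, seq):
        # Called when a change with this sequence ID is read from the feed.
        self._order.append(seq)
        self._pending.add(seq)

    def finished(self, seq):
        # Called when the revision(s) for this sequence are safely in the target.
        self._pending.discard(seq)

    def checkpoint(self):
        # The checkpointable sequence is the last one before the first
        # still-pending sequence, in feed order.
        latest = None
        for seq in self._order:
            if seq in self._pending:
                break
            latest = seq
        return latest
```

For example, if sequences "a" then "b" are read and only "b" has been stored so far, no checkpoint can be saved yet; once "a" completes, the checkpoint jumps to "b".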
These are the CouchDB REST API calls that Couchbase Lite makes to the remote database.
- `GET /db/_local/checkpointid` — To read the last checkpoint
- `PUT /db/_local/checkpointid` — To save a new checkpoint
- `PUT /db` — If told to create the remote database (not applicable to Sync Gateway)
- `POST /db/_revs_diff` — To find which revs are not known to the remote db
- `POST /db/_bulk_docs` — To upload revisions
- `POST /db/docid?new_edits=false` — To upload a single doc with attachments
- `POST /db/_changes?style=all_docs&feed=feed&since=since&limit=limit&heartbeat=heartbeat` — To find changes since the last pull (feed will be `normal`, `longpoll`, or `websocket`)
- `GET /db/docid?rev=revid&revs=true&attachments=true&atts_since=lastrev` — To download a single doc with attachments
- `POST /db/_all_docs?include_docs=true` — To download first-generation revisions in bulk
- `POST /db/_bulk_get?revs=true&attachments=true` — To download documents in bulk (nonstandard; implemented by Sync Gateway)

(In the paths above, `db`, `docid`, `checkpointid`, `revid`, and the query values in italics stand for the actual database name, document ID, etc.)
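To make the `_revs_diff` exchange concrete, here is a sketch of its request and response shapes and how a replicator might consume them. The document and revision IDs are invented example values, not real data:

```python
import json

# Request body: map each document ID to the revision IDs seen in _changes.
revs_diff_request = {
    "doc-1": ["2-7051cbe5c8faecd085a3fa619e6e6337"],
    "doc-2": ["1-15f65339921e497348be384867bb940f"],
}

# Example response: per document, the revisions the target lacks, plus any
# ancestors it does have (useful as the ?atts_since value). Documents whose
# revisions the target already has (doc-2 here) are simply omitted.
revs_diff_response = json.loads("""{
    "doc-1": {"missing": ["2-7051cbe5c8faecd085a3fa619e6e6337"],
              "possible_ancestors": ["1-d4e501ab47de6767135468305b6faa02"]}
}""")

def revisions_to_fetch(response):
    # Flatten the response into (docid, revid) pairs to GET from the source.
    return [(doc, rev)
            for doc, info in response.items()
            for rev in info.get("missing", [])]
```

The replicator would then `GET` each returned pair (or batch them through `_bulk_get`), which is why keeping each `_revs_diff` request large pays off.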