Replication Algorithm
by Jens Alfke
This is a historical document that applies to version 1 of Couchbase Lite. Version 2 uses a different protocol (based on WebSockets) that's a lot more efficient, but the high level algorithm is still the same.
Couchbase Lite 1.x's replication protocol is compatible with Apache CouchDB. This interoperability is an important feature, but implementing it was challenging because much of CouchDB's replication protocol is undocumented. In the future I would like to see an explicit spec for replication, to ensure that different products remain compatible. For now I'll document it here, as I understand it.
Note: If you want to follow along, this algorithm is implemented in Couchbase Lite's `Replicator` class and its subclasses `Pusher` and `Puller` (plus a number of helper classes).
These notes were derived from reading the API documentation on the CouchDB wiki and from conversations with engineers who've worked on CouchDB's replicator (Damien Katz and Filipe Manana). But don't take them as gospel.
Some extensions have been added over time, to improve performance; these are supported by Couchbase Sync Gateway but not by CouchDB.
There really isn't a separate "protocol" per se for replication. Instead, replication uses CouchDB's REST API and data model. It's therefore a bit difficult to talk about replication independently of the rest of CouchDB. In this document I'll focus on the algorithm used, and link to documentation of the APIs it invokes. The "protocol" is simply the set of those APIs operating over HTTP.
Given a source and a target database, identify all current revisions (including deletions) in the source that do not exist in the target, and copy them (with contents, attachments and histories) to the target. Afterwards, all current revisions in the source exist at the target and have the same revision histories there.
Secondary goal: Do this without redundantly transferring the contents of any revisions that already exist at the target.
Note: A current revision is one that has not been replaced, i.e. a leaf node in the revision tree. Most of the time a document has only one current revision, but multiple current revisions can exist and that's called a conflict.
- Get unique identifiers for the source and target databases (which may just be their URLs, if no UUIDs are available).
- Generate a unique identifier for this replication based on the database IDs, and the filter name and parameters (if any). For instance, you can concatenate these with an unambiguous delimiter and then run that string through a cryptographic digest algorithm like SHA-1. The exact mechanism doesn't matter, because this identifier is used only by a particular implementation.
- Use this identifier to generate the doc ID of a special (`_local`, non-replicated) document, and get this document from both the source and the target database. The document contains the last source sequence ID (also called a "checkpoint") that was read and processed by the previous replication. If the document is missing in either database, or if its contents are inconsistent, that's OK: the replication will just start from scratch without a checkpoint.
- Fetch the source database's `_changes` feed, starting just past the last source sequence ID (if any). Use the `?style=all_docs` URL parameter so that conflicting revisions will be included. In continuous replication, use the `?feed=longpoll`, `?feed=continuous`, or `?feed=websocket` [Sync Gateway only] mode and leave the algorithm running indefinitely to process changes as they occur. Filtered replication specifies the name of a filter function in this URL request.
- Collect a group of document/revision ID pairs from the `_changes` feed and send them to the target database's `_revs_diff`. The result will contain the subset of those revisions that are not in the target database.
- `GET` each such revision from the source database. Use the `?revs=true` URL parameter to include its list of parent revisions, so the target database can update its revision tree. Use `?attachments=true` so the revision data will include attachment bodies. Also use the `?atts_since` query parameter to pass a list of revisions that the target already has, so the source can optimize by not sending the bodies of attachments already known to the target. (Couchbase Lite and Sync Gateway support a nonstandard `_bulk_get` call that can retrieve large numbers of revisions in one request.)
- Collect a group of revisions fetched by the previous step, and store them into the target database using the `_bulk_docs` API, with the `new_edits:false` JSON property to preserve their revision IDs.
- After a group of revisions is stored, save a checkpoint: update the last source sequence ID value in the target database. It should be the latest sequence ID such that its revision and all revisions prior to it have been added to the target. (Even if some revisions are rejected by a target validation handler, they still count as "added" for this purpose.)
There's also a ladder diagram which shows these steps along with the interaction between the replicator and source/target db's.
- The replication algorithm does not have to run on either the source's or the target's server. It could run anywhere with read access to the source and write access to the target. However, it's nearly always run by either the source or the target server (and Couchbase Lite only supports those modes). Replication run by the source is commonly called a "push"; replication run by the target is called a "pull".
- An implementation running directly in the source or target server can optimize by using lower-level APIs to operate on the local database; for example, it listens for internal change notifications rather than reading the `_changes` feed, makes a direct database query instead of calling `_revs_diff`, and inserts directly into the database instead of calling `_bulk_docs`.
- Replication does not transfer obsolete revisions of documents, only the current ones. This derives from the behavior of the `_changes` feed, which lists only current revisions. Replication does transfer the revision history of each document, which is just the list of IDs of prior revisions; this makes it possible for the target database to identify common ancestors and merge revision histories into a tree.
- Sequence IDs are usually, but not necessarily, numeric. (Currently the only exception I know of is BigCouch.) Non-numeric sequence IDs are not intrinsically ordered, i.e. they are opaque strings that can only be compared for equality. To compare their ordering (when checkpointing) you have to keep an ordered list of sequence IDs as they appeared in the `_changes` feed and compare their indices in that list.
- For efficiency, the algorithm should run in parallel, as a data-flow system, with multiple steps active at the same time. This reduces the overhead of network and database latency.
- Also for efficiency, the number of revisions passed in a single `_revs_diff` or `_bulk_docs` call should be large. This means the implementation should group together revisions arriving from previous steps until a sufficient number have arrived or sufficient time has elapsed.
- From my limited testing, the performance bottleneck in the current algorithm seems to be fetching the new revisions from the source. I think this is due to the overhead of handling many separate HTTP requests. This was the rationale for adding `_bulk_get` in Couchbase Mobile.
- A limited case of the above-mentioned bulk-get optimization is possible with the standard API: revisions of generation 1 (revision ID starts with "`1-`") can be fetched in bulk via `_all_docs`, because by definition they have no revision histories. Unfortunately `_all_docs` can't include attachment bodies, so if it returns a document whose JSON indicates it has attachments, those have to be fetched separately. Nonetheless, this optimization can help significantly, and is currently implemented in Couchbase Lite.
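The note about opaque sequence IDs interacts with parallelism: when revisions are stored out of order, the checkpoint may only advance past the last *contiguously* completed sequence. A minimal sketch of that bookkeeping (a hypothetical helper, not the actual Couchbase Lite class):

```python
class SequenceTracker:
    """Remembers sequence IDs in the order they arrived from _changes,
    since opaque IDs can only be compared for equality, not ordered."""

    def __init__(self):
        self._order = []       # sequence IDs in _changes arrival order
        self._pending = set()  # sequences whose revisions aren't stored yet

    def began(self, seq):
        # Called when a change with this sequence ID is read from the feed.
        self._order.append(seq)
        self._pending.add(seq)

    def finished(self, seq):
        # Called when the revision(s) for this sequence are safely in the target.
        self._pending.discard(seq)

    def checkpoint(self):
        # The checkpointable sequence is the last one before the first
        # still-pending sequence, in feed order.
        latest = None
        for seq in self._order:
            if seq in self._pending:
                break
            latest = seq
        return latest
```

For example, if sequences "a" then "b" are read and only "b" has been stored so far, no checkpoint can be saved yet; once "a" completes, the checkpoint jumps to "b".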
These are the CouchDB REST API calls that Couchbase Lite makes to the remote database.
- `GET /db/_local/checkpointid` — To read the last checkpoint
- `PUT /db/_local/checkpointid` — To save a new checkpoint
- `PUT /db` — If told to create the remote database (not applicable to Sync Gateway)
- `POST /db/_revs_diff` — To find which revs are not known to the remote db
- `POST /db/_bulk_docs` — To upload revisions
- `POST /db/docid?new_edits=false` — To upload a single doc with attachments
- `POST /db/_changes?style=all_docs&feed=feed&since=since&limit=limit&heartbeat=heartbeat` — To find changes since the last pull (feed will be `normal`, `longpoll`, or `websocket`)
- `GET /db/docid?rev=revid&revs=true&attachments=true&atts_since=lastrev` — To download a single doc with attachments
- `POST /db/_all_docs?include_docs=true` — To download first-generation revisions in bulk
- `POST /db/_bulk_get?revs=true&attachments=true` — To download documents in bulk (nonstandard; implemented by Sync Gateway)

(In the paths above, `db`, `docid`, `checkpointid`, `revid`, and the query values in italics stand for the actual database name, document ID, etc.)
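To make the `_revs_diff` exchange concrete, here is a sketch of its request and response shapes and how a replicator might consume them. The document and revision IDs are invented example values, not real data:

```python
import json

# Request body: map each document ID to the revision IDs seen in _changes.
revs_diff_request = {
    "doc-1": ["2-7051cbe5c8faecd085a3fa619e6e6337"],
    "doc-2": ["1-15f65339921e497348be384867bb940f"],
}

# Example response: per document, the revisions the target lacks, plus any
# ancestors it does have (useful as the ?atts_since value). Documents whose
# revisions the target already has (doc-2 here) are simply omitted.
revs_diff_response = json.loads("""{
    "doc-1": {"missing": ["2-7051cbe5c8faecd085a3fa619e6e6337"],
              "possible_ancestors": ["1-d4e501ab47de6767135468305b6faa02"]}
}""")

def revisions_to_fetch(response):
    # Flatten the response into (docid, revid) pairs to GET from the source.
    return [(doc, rev)
            for doc, info in response.items()
            for rev in info.get("missing", [])]
```

The replicator would then `GET` each returned pair (or batch them through `_bulk_get`), which is why keeping each `_revs_diff` request large pays off.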