rbh-sync allows synchronizing one backend with another.
Install the RobinHood library first, then download the sources:
git clone https://github.com/cea-hpc/rbh-sync.git
cd rbh-sync
Build and install with meson and ninja:
meson setup builddir
ninja -C builddir
sudo ninja -C builddir install
rbh-sync can be used to create or synchronize one RobinHood Backend [1] with another.
[1] Simply put, a storage backend which RobinHood can use to store a filesystem's metadata and later query it.
RobinHood uses URIs to identify backends. It uses its own scheme. The syntax for the RobinHood scheme is detailed in the library's documentation. Here are the key takeaways:
- RobinHood URIs always start with rbh:;
- followed by a type of backend (currently either mongo or posix); [2]
- another colon (":");
- and a filesystem identifier, which we call an fsname.
 the scheme   the filesystem's identifier
    vvv       vvvvvvvvv
    rbh:mongo:my-fsname
        ^^^^^
        the backend type
This references a (whole) backend.
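The structure described above can be taken apart with plain shell parameter expansion. The following is only a sketch: the helper names rbh_uri_backend and rbh_uri_fsname are hypothetical and are not provided by rbh-sync or the RobinHood library.

```shell
#!/bin/sh
# Split a RobinHood URI of the form rbh:<backend>:<fsname>[#fragment].
# Hypothetical helpers, for illustration only.

rbh_uri_backend() {
    rest=${1#rbh:}                # drop the scheme
    printf '%s\n' "${rest%%:*}"   # keep what precedes the next colon
}

rbh_uri_fsname() {
    rest=${1#rbh:}                # drop the scheme
    rest=${rest#*:}               # drop the backend type and its colon
    printf '%s\n' "${rest%%#*}"   # drop an optional "#..." suffix
}

rbh_uri_backend rbh:mongo:my-fsname   # prints "mongo"
rbh_uri_fsname rbh:mongo:my-fsname    # prints "my-fsname"
```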
The syntax for fsname depends on the backend's type:
- for mongo, it can pretty much be anything you want; [3]
- for posix or lustre, it must be the path to the backend's root.
    rbh:posix:/mnt/path/to/dir
    rbh:lustre:/mnt/lustre/path-to-dir
    rbh:mongo:something-that-makes-me-think-of-/mnt/path/to/dir
    rbh:posix:/scratch
    rbh:lustre:/work
    rbh:mongo:scratch
Optionally, you can append a #, followed by either:
- a path (relative to the backend's root);
- or an fsentry's ID enclosed in brackets ("[", "]"), or a FID if the backend manages entries of a Lustre filesystem.
This references a particular fsentry in a backend.
    rbh:posix:/scratch#testuser/somedir
    rbh:lustre:/work#testuser2/somedirbis
    rbh:mongo:scratch#[0x200000007:0x1:0x0]
We choose not to put an example with a regular fsentry's ID here, as they are impractical to write on a command line.
The interested user should know that to use this syntax, they will need to percent-escape any reserved or non-printable character. Refer to sections 2.1 and 2.2 of RFC 3986 for more information on this.
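As an illustration of percent-escaping, here is a minimal sketch of a shell helper that escapes a path before it is appended after the #. The function name rbh_escape is hypothetical, and the choice to leave "/" unescaped is an assumption made for readability of relative paths; everything outside RFC 3986's unreserved set and "/" is escaped.

```shell
#!/bin/sh
# Hypothetical helper: percent-escape a string for use after the "#".
# Assumption: RFC 3986 "unreserved" characters and "/" are left as-is.
rbh_escape() (
    LC_ALL=C   # ask the shell to work byte by byte (subshell-local)
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}        # everything but the first character
        c=${s%"$rest"}     # the first character alone
        s=$rest
        case $c in
            [A-Za-z0-9._~/-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;   # e.g. " " becomes %20
        esac
    done
    printf '%s\n' "$out"
)

rbh_escape 'testuser/my dir#1'   # prints "testuser/my%20dir%231"
```

The result can then be used as, for instance, rbh:posix:/scratch#testuser/my%20dir%231.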
[2] This is used by applications to infer which dynamic library should be used to interact with the backend.
[3] It will be used as the name of the actual Mongo database.
What is all the fuss with RobinHood URIs then? Well, they are integral to rbh-sync's command line interface. [4]
Much like rsync, rbh-sync takes two arguments, both of which are URIs:
rbh-sync rbh:posix:/mnt/scratch rbh:mongo:scratch
This synchronizes rbh:mongo:scratch with rbh:posix:/mnt/scratch, meaning that when the process terminates, rbh:mongo:scratch should contain a copy of all the metadata in rbh:posix:/mnt/scratch when the process started.
[4] And likely any other RobinHood application.
An important thing to remember is that rbh-sync freezes neither the source backend nor the destination backend. Thus, if they are modified while rbh-sync uses them, rbh-sync cannot guarantee that it will do the right thing.
For example, if the source backend is updated while rbh-sync uses it, rbh-sync might:
- miss the update;
- see an incomplete version of the update;
- simply see the whole update.
In both the first and second cases, the destination backend will contain stale metadata at the end of the run.
Conversely, if the destination backend is updated while rbh-sync operates on it, there is no particular guarantee that the resulting metadata will be consistent.
To work around this, if either the source backend or the destination backend was updated while rbh-sync ran, just run rbh-sync again.
The destination backend might never be exactly up-to-date, but you can be sure that it will always go forward. In this sense, you get a level of consistency comparable to that of a local filesystem: eventual consistency.
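The "just run it again" advice can be wrapped in a small loop. This is only a sketch: the function name sync_passes and the RBH_SYNC variable are hypothetical, and it assumes rbh-sync follows the usual convention of returning a non-zero exit status on failure.

```shell
#!/bin/sh
# Hypothetical wrapper: run the same synchronization several times in a row.
RBH_SYNC=${RBH_SYNC:-rbh-sync}   # can be overridden, e.g. for a dry run

sync_passes() {
    passes=$1 src=$2 dst=$3
    i=1
    while [ "$i" -le "$passes" ]; do
        # Assumption: a non-zero exit status means the pass failed.
        "$RBH_SYNC" "$src" "$dst" || return 1
        i=$((i + 1))
    done
}

# Example: sync_passes 2 rbh:posix:/mnt/scratch rbh:mongo:scratch
```

Setting RBH_SYNC=echo prints the commands instead of running them, which is a cheap way to check the arguments.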
rbh-sync is fundamentally a single-threaded program. There is no plan to parallelize it any time in the future.
Nevertheless, rbh-sync being a single-threaded program does not mean you cannot run several instances of it in parallel. The following script should therefore provide a reasonable amount of parallelization, without sacrificing consistency.
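root=/scratch
for entry in "$root"/*; do
    # The part after "#" must be a path relative to the backend's root
    rbh-sync "rbh:posix:$root#${entry#"$root"/}" rbh:mongo:scratch &
done
# Synchronize the root entry itself
rbh-sync --one "rbh:posix:$root" rbh:mongo:scratch &
wait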
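On a directory with many entries, starting one background process per entry may be more than the machine can comfortably handle. A possible variant, sketched here, caps the number of concurrent instances with xargs. The function name sync_children and the RBH_SYNC variable are hypothetical, and the -0 and -P options of xargs are common extensions (GNU, BSD) rather than POSIX.

```shell
#!/bin/sh
# Hypothetical helper: sync each child of a POSIX backend's root,
# running at most $3 rbh-sync instances at a time.
RBH_SYNC=${RBH_SYNC:-rbh-sync}   # can be overridden, e.g. for a dry run

sync_children() {
    root=$1 dst=$2 jobs=$3
    for entry in "$root"/*; do
        [ -e "$entry" ] || continue   # the glob matched nothing
        # One NUL-terminated URI per child, path relative to the root
        printf 'rbh:posix:%s#%s\0' "$root" "${entry#"$root"/}"
    done | xargs -0 -P "$jobs" -I{} "$RBH_SYNC" {} "$dst"
    # Then the root entry itself, as in the script above
    "$RBH_SYNC" --one "rbh:posix:$root" "$dst"
}

# Example: sync_children /scratch rbh:mongo:scratch 4
```

Two caveats apply to this sketch as well as to the script above: the shell's * does not match names starting with a dot, and names containing reserved characters would need to be percent-escaped before being placed after the #.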
Also, since rbh-sync heavily relies on the backends' implementation, if these were to implement any sort of parallelization, rbh-sync would transparently benefit from it.