This repository has tools and notes for demonstration and evaluation of Rucio for LIGO bulk data management.
Some notes on getting started
RUCIO_HOME must point to a directory which includes etc/rucio.cfg.
rucio.cfg should look like:
[client]
rucio_host = https://rucio-ligo.grid.uchicago.edu:443
auth_host = https://rucio-ligo.grid.uchicago.edu:443
ca_cert = /etc/grid-security/certificates
client_x509_proxy = /tmp/x509up_p2411400.filearAiBG.1
request_retries = 3
auth_type = x509
client_cert = /tmp/x509up_p2411400.filearAiBG.1
client_key = /tmp/x509up_p2411400.filearAiBG.1
where client_cert and client_key (and, as in the example above, client_x509_proxy) should point to the output of
grid-proxy-info -path
- Admin tasks should have RUCIO_ACCOUNT=root
- User tasks should have RUCIO_ACCOUNT=jclark (for example)
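The configuration above can also be generated instead of edited by hand. What follows is a minimal sketch, assuming the proxy path is already known (e.g., from grid-proxy-info -path). The function name write_rucio_cfg and the use of a temporary directory as RUCIO_HOME are illustrative choices, not part of Rucio.

```python
import configparser
import os
import tempfile


def write_rucio_cfg(rucio_home, proxy_path,
                    host="https://rucio-ligo.grid.uchicago.edu:443"):
    """Write etc/rucio.cfg under rucio_home and return the file path.

    Hypothetical helper: the keys mirror the example config in this README.
    """
    cfg = configparser.ConfigParser()
    cfg["client"] = {
        "rucio_host": host,
        "auth_host": host,
        "ca_cert": "/etc/grid-security/certificates",
        "client_x509_proxy": proxy_path,
        "request_retries": "3",
        "auth_type": "x509",
        "client_cert": proxy_path,
        "client_key": proxy_path,
    }
    etc_dir = os.path.join(rucio_home, "etc")
    os.makedirs(etc_dir, exist_ok=True)
    cfg_path = os.path.join(etc_dir, "rucio.cfg")
    with open(cfg_path, "w") as handle:
        cfg.write(handle)
    return cfg_path


if __name__ == "__main__":
    # Placeholder values: substitute your own RUCIO_HOME and the path
    # printed by `grid-proxy-info -path`.
    home = tempfile.mkdtemp(prefix="rucio_home_")
    path = write_rucio_cfg(home, "/tmp/x509up_p2411400.filearAiBG.1")
    os.environ["RUCIO_HOME"] = home
    os.environ["RUCIO_ACCOUNT"] = "jclark"  # or "root" for admin tasks
    print("Wrote", path)
```

Environment variables set this way only affect the current Python process and its children; for interactive use, export them in the shell instead.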
The first thing we need is an RSE (Rucio Storage Element, a container for files) to upload our files to.
- Create the RSE (see e.g., CLI admin examples):
rucio-admin rse add LIGOTEST
- Add supported protocols (e.g., srm, gsiftp, http, ...). To begin with, we can just use gsiftp:
rucio-admin rse add-protocol \
--prefix /user/ligo/rucio \
--domain-json '{"wan": {"read": 1, "write": 1, "delete": 1, "third_party_copy": 1}}' \
--scheme gsiftp \
--hostname red-gridftp.unl.edu \
LIGOTEST
Note that rucio-admin operations should be performed with RUCIO_ACCOUNT=root.
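The shell quoting around --domain-json is easy to get wrong, so it can help to assemble the command in Python. This is a minimal sketch that only builds and prints the same add-protocol command used above; the helper name add_protocol_command is hypothetical, and nothing is executed against Rucio.

```python
import json
import shlex


def add_protocol_command(rse, hostname, scheme, prefix, domains=None):
    """Return the rucio-admin add-protocol command as an argument list.

    Hypothetical helper: it reuses only the flags shown in this README.
    """
    if domains is None:
        domains = {"wan": {"read": 1, "write": 1, "delete": 1,
                           "third_party_copy": 1}}
    return [
        "rucio-admin", "rse", "add-protocol",
        "--prefix", prefix,
        "--domain-json", json.dumps(domains),  # serialised once, no manual quoting
        "--scheme", scheme,
        "--hostname", hostname,
        rse,
    ]


cmd = add_protocol_command("LIGOTEST", "red-gridftp.unl.edu",
                           "gsiftp", "/user/ligo/rucio")
# shlex.join (Python 3.8+) prints a copy-and-paste safe shell command.
print(shlex.join(cmd))
# To actually run it, one option would be:
#   subprocess.run(cmd, check=True, env=dict(os.environ, RUCIO_ACCOUNT="root"))
```

Passing the argument list directly to subprocess avoids a shell entirely, so the JSON needs no extra escaping.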
At least for testing, we will designate scopes according to data-taking runs (engineering and observing runs). To create an ER8 scope:
rucio-admin scope add --account jclark --scope ER8
See e.g., rucio scope docs
Now that we have an RSE and a scope, we can experiment with the CLI examples.
- Uploading a single frame with scope "ER8":
rucio -v upload \
/hdfs/frames/ER8/hoft_C02/H1/H-H1_HOFT_C02-11262/H-H1_HOFT_C02-1126256640-4096.gwf \
--rse LIGOTEST --scope ER8 \
--name H-H1_HOFT_C02-1126256640-4096.gwf
This should generate output like:
2018-02-05 13:33:31,104 DEBUG Extracting filesize (457680774) and checksum (ef00cf51) for file ER8:H-H1_HOFT_C02-1126256640-4096
2018-02-05 13:33:31,106 DEBUG Automatically setting new GUID
2018-02-05 13:33:31,381 DEBUG Using account root
2018-02-05 13:33:31,381 DEBUG Skipping dataset registration
2018-02-05 13:33:31,381 DEBUG Processing file ER8:H-H1_HOFT_C02-1126256640-4096 for upload
2018-02-05 13:33:39,285 INFO Local files and file ER8:H-H1_HOFT_C02-1126256640-4096 recorded in Rucio have the same checksum. Will try the upload
2018-02-05 13:33:56,808 INFO File ER8:H-H1_HOFT_C02-1126256640-4096.gwf successfully uploaded on the storage
2018-02-05 13:33:56,809 DEBUG sending trace
2018-02-05 13:33:57,270 DEBUG Finished uploading files to RSE.
2018-02-05 13:33:57,505 INFO Will update the file replicas states
2018-02-05 13:33:57,586 INFO File replicas states successfully updated
Completed in 34.7796 sec.
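The eight-hex-digit checksum in the log (ef00cf51) has the form of an Adler-32 value; that reading is an assumption based on its length and on Adler-32 being one of the checksums Rucio records. Under that assumption, a local file can be checked before upload with a sketch like the following (the function name adler32_hex is illustrative):

```python
import sys
import zlib


def adler32_hex(path, chunk_size=1024 * 1024):
    """Return the Adler-32 checksum of a file as 8 lowercase hex digits.

    The file is read in chunks so that large frame files (hundreds of MB)
    do not need to fit in memory.
    """
    value = 1  # Adler-32 is defined to start from 1
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return format(value & 0xFFFFFFFF, "08x")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python adler32_check.py /path/to/frame.gwf
    print(adler32_hex(sys.argv[1]))
```

If the printed value differs from the one Rucio reports for the same file, the local copy and the registered replica are not identical.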
A next step is to set up a simple Python script to:
- Retrieve a list of frame files which corresponds to some nominal data set
- Loop through the list and call the Rucio API
Such a script can make use of the pycbc datafind module and a pip install of Rucio.
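The loop described above might look like the sketch below. Two things are assumptions here: the datafind step is replaced by a hard-coded list (the pycbc datafind calls are not shown in these notes), and the dictionary keys path, rse, did_scope and did_name are modelled on what Rucio's Python upload client is understood to accept, which should be checked against the installed Rucio version. The helper name build_upload_items is hypothetical.

```python
import os


def build_upload_items(frame_paths, rse, scope):
    """Turn a list of local frame paths into one upload description each.

    The DID name is the file's basename, matching the --name used in the
    CLI example above.
    """
    items = []
    for path in frame_paths:
        items.append({
            "path": path,
            "rse": rse,
            "did_scope": scope,
            "did_name": os.path.basename(path),
        })
    return items


# Stand-in for the list a datafind query would return.
frames = [
    "/hdfs/frames/ER8/hoft_C02/H1/H-H1_HOFT_C02-11262/"
    "H-H1_HOFT_C02-1126256640-4096.gwf",
]

for item in build_upload_items(frames, rse="LIGOTEST", scope="ER8"):
    print("{did_scope}:{did_name} -> {rse}".format(**item))
    # Assumed real call, left commented so the sketch runs without Rucio:
    #   from rucio.client.uploadclient import UploadClient
    #   UploadClient().upload([item])
```

Keeping the item construction separate from the upload call makes it possible to inspect or dry-run the list before anything is registered.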
cmsexample.py is a command line tool for registering a CMS dataset into Rucio. This set of slides describes the CMS evaluation. The CMS hierarchy is more complicated than the LIGO one (at least in our initial test). In CMS:
- Files: ~4GB
- Blocks (Rucio dataset): chunks of ~100 files. This is the typical unit of data transfer.
- Datasets (Rucio container): N blocks with some physical meaning
The (current) proposed LIGO arrangement is simpler:
- LIGO runs (ER8, O1, ...): Rucio scope
- LIGO dataset == Rucio dataset
Here's a run-through of cmsexample.py:
- Instantiate the DataSetInjector object, a general class for injecting a CMS dataset into Rucio
  - DataSetInjector has methods to create containers and register files and datasets
  - This class has methods for finding the Rucio URL and filenames
I do not need anything to do with Rucio containers (yet), so I can just mimic the parts associated with file and dataset registration, and some of the sanity checking. I should be able to swap out my existing routines for translating LIGO file URLs to Rucio DIDs.
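As an illustration of the URL-to-DID translation and a basic sanity check, here is a minimal sketch. It assumes frame file names follow the pattern site-frametype-gpsstart-duration.gwf, which is inferred from the example file name in these notes, and that datafind returns file:// URLs. The function frame_url_to_did and the returned dictionary layout are hypothetical, not taken from cmsexample.py.

```python
import os
import re
from urllib.parse import urlparse

# Pattern inferred from names such as H-H1_HOFT_C02-1126256640-4096.gwf
FRAME_RE = re.compile(
    r"^(?P<site>[A-Z]+)-(?P<frametype>\w+)-"
    r"(?P<gps_start>\d+)-(?P<duration>\d+)\.gwf$"
)


def frame_url_to_did(url, scope):
    """Translate a frame file URL (or plain path) into a DID description.

    Raises ValueError if the file name does not look like a frame file,
    which serves as a simple sanity check before registration.
    """
    name = os.path.basename(urlparse(url).path)
    match = FRAME_RE.match(name)
    if match is None:
        raise ValueError("not a recognised frame file name: %r" % name)
    return {
        "scope": scope,
        "name": name,
        "did": "%s:%s" % (scope, name),
        "frametype": match.group("frametype"),
        "gps_start": int(match.group("gps_start")),
        "duration": int(match.group("duration")),
    }


example = frame_url_to_did(
    "file://localhost/hdfs/frames/ER8/hoft_C02/H1/H-H1_HOFT_C02-11262/"
    "H-H1_HOFT_C02-1126256640-4096.gwf",
    scope="ER8",
)
print(example["did"])
```

The parsed frame type could later serve as a natural Rucio dataset name within a run's scope, though that naming choice is only a suggestion.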