- Delta Importer is a tool designed to be run on the Storage Provider infrastructure.
- It automates import (offline) deals: it imports .car files from the filesystem whose PieceCID matches the deal proposals sent to the provider.
- It integrates with Delta-DM (Dataset Manager) to request deals from the self-service API, enabling a fully automated dealmaking and deal ingestion pipeline.
- It has multiple modes of operation, covering a variety of data ingestion strategies.
- It is written in Go and designed from the ground up for high performance. Its import frequency and maximum concurrency are tunable, so imports can be matched to the provider's sealing throughput.
- Only one instance of Delta Importer is required per instance of Boost.
We intend to make the deal ingestion process fully automated, intelligent and streamlined, such that there is no functional difference between End-to-end (Online) and Import (Offline) deals. This will allow large-scale providers to easily and efficiently onboard large datasets, where the data transfer is decoupled from the dealmaking process.
- Go v1.19+
- Rust (needed to build filecoin-ffi)
- Boost v1.6.0+
Assumption: all carfiles to import are named <pieceCID>.car, which matches the PieceCID of the deal made with Boost. This obviates the need for a File<>Deal mapping, as the importer can simply scan the filesystem for a file matching the PieceCID of the deal.
Perform the following steps from a user account with root privileges. Note: once installed, the delta-importer binary can be run from any user account.
Build from Source
- Clone the repository, then build and install:
git clone https://github.com/application-research/delta-importer.git
cd delta-importer
make all
make install
This will install the delta-importer binary to /usr/local/bin. Test it out by running delta-importer --help.
delta-importer
NAME:
delta-importer - An application to facilitate importing deals into a Filecoin Storage Provider
USAGE:
delta-importer [global options] command [command options] [arguments...]
COMMANDS:
daemon, d run the delta-importer daemon to continuously import deals
stats get stats about imported deals
help, h Shows a list of commands or help for one command
GLOBAL OPTIONS:
--help, -h show help
--version, -v print the version
By default, delta-importer stores all its local data in the ~/delta/importer directory of the currently running user. If this directory does not exist, the tool will attempt to create the directory structure on the first launch of the daemon command. The location can be changed using the --dir flag or the DELTA_DIR environment variable.
The Delta Importer daemon requires a few configuration options to be set. These can be set via environment variables or command-line flags.
Below is an example shell script to launch the importer daemon, running in default mode, and importing a new deal every 260 seconds, until a maximum of 175 deals are active in the sealing pipeline (AP+PC1+PC2+C2).
delta-importer daemon \
--boost-url 10.10.10.20 \
--boost-gql-port 8080 \
--boost-port 1288 \
--boost-auth-token XXX.YYY.ZZZ \
--max_concurrent 175 \
--interval 260 \
--mode default
- Obtain the boost-auth-token by running the boostd auth create-token --perm admin command on your Boost node.
- Obtain the boost-url and boost-port by running boostd auth api-info --perm admin on your Boost node.
- The --interval and --max_concurrent flags are used to tune the importer's speed. These parameters should be carefully tuned to match the provider's sealing throughput and available bandwidth. The example above is a good starting point for a provider with approximately 10TiB/day of sealing throughput.
- See Operational Modes below for an explanation of the --mode flag.
- Set the --staging-dir flag to have Delta Importer automatically copy carfiles to a staging directory before importing them. This is useful if your carfiles reside on a slower or remote filesystem, as Boost needs to read them twice (once for CommP verification, and once for AddPiece). If this flag is set, the carfiles will be automatically deleted from the staging directory after the import is complete.
The datasets.json file must be present in the delta-importer data directory (defaults to ~/delta/importer/). This file maps client wallets (i.e., who is making deals) to a dataset slug (identifier) and a directory to search for CAR files to import.
Example datasets.json
[
{
"dataset": "radiant-ml",
"address": ["f1p3l3wgnfukemmaupqecwcoqp7fcgjcqgqcq7rja"],
"dir": "/mnt/delta-datasets/radiant-poc",
"ignore": false
},
{
"dataset": "cancer-imaging-archive",
"address": ["f1p3l3wgnfukemmaupqecwcoqp7fcgjcqgqcq7rja", "f2vyp7qmi4pvuj3f3qiha6oyskrjdho2xw6cjiexi"],
"dir": "/mnt/delta-datasets/cancer-imaging-archive",
"ignore": true
}
]
The datasets.json file is processed in order, preferring deals for the first dataset in the list.
Using the above example:
- If a deal is found for radiant-ml, the importer will scan the /mnt/delta-datasets/radiant-poc directory for a CAR file matching the PieceCID of the deal.
- If a match is found, the importer will import the data.
- If no match is found, the importer will move on to the next dataset in the list and attempt to import data for that dataset.
Set the ignore flag to true to skip a dataset. This is useful if you want to speed up the import loop by excluding a dataset from import (e.g., if its DataCap has been exhausted, or its data transfer is not complete yet).
Note: the dataset field must be unique across all entries in the datasets.json file.
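A sketch of loading datasets.json and enforcing the uniqueness rule, using only encoding/json from the standard library. The field names come from the example file; parseDatasets and loadDatasets are hypothetical, and whether the real daemon rejects duplicates at startup in exactly this way is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Dataset mirrors one entry of datasets.json.
type Dataset struct {
	Dataset string   `json:"dataset"`
	Address []string `json:"address"`
	Dir     string   `json:"dir"`
	Ignore  bool     `json:"ignore"`
}

// parseDatasets decodes the JSON array, preserving file order, and rejects
// duplicate "dataset" slugs.
func parseDatasets(data []byte) ([]Dataset, error) {
	var ds []Dataset
	if err := json.Unmarshal(data, &ds); err != nil {
		return nil, fmt.Errorf("parse datasets.json: %w", err)
	}
	seen := make(map[string]bool, len(ds))
	for _, d := range ds {
		if seen[d.Dataset] {
			return nil, fmt.Errorf("duplicate dataset %q in datasets.json", d.Dataset)
		}
		seen[d.Dataset] = true
	}
	return ds, nil
}

// loadDatasets reads datasets.json from the importer's data directory.
func loadDatasets(dataDir string) ([]Dataset, error) {
	data, err := os.ReadFile(filepath.Join(dataDir, "datasets.json"))
	if err != nil {
		return nil, err
	}
	return parseDatasets(data)
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	ds, err := loadDatasets(filepath.Join(home, "delta", "importer"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, d := range ds {
		fmt.Printf("%s -> %s (ignore=%v)\n", d.Dataset, d.Dir, d.Ignore)
	}
}
```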
Operational Modes
Delta Importer can be run in three modes:
- Default (Boost Scanning) Mode: --mode default (this is the default, so the flag is not required). In this mode, Delta Importer scans Boost for deals awaiting import, automatically matches them to CAR files on the filesystem, and imports them.
- Pull Mode - Dataset: --mode pull-dataset. In this mode, Delta Importer requests deals from the DDM self-service API per dataset before attempting to import them.
- Pull Mode - CID: --mode pull-cid. In this mode, Delta Importer scans the filesystem for CAR files and makes a request to the DDM self-service API for each carfile. It checks Boost to ensure duplicate deals are not requested.
When using either Pull Mode, the --ddm-api and --ddm-token flags are required. They specify the DDM API endpoint and the API token to use when making deal requests to the DDM API. Contact your DDM administrator for these parameters.
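The mode rules above translate into a small startup check. This is a hedged sketch: the Config struct and validateMode are hypothetical and cover only the flags named in this section; the mode strings and the DDM flag requirement come from the text.

```go
package main

import (
	"errors"
	"fmt"
)

// Config holds a hypothetical subset of the daemon's flags.
type Config struct {
	Mode     string // --mode: "default", "pull-dataset" or "pull-cid"
	DDMAPI   string // --ddm-api
	DDMToken string // --ddm-token
}

// validateMode accepts the three documented modes and requires the DDM
// endpoint and token whenever a pull mode is selected.
func validateMode(c Config) error {
	switch c.Mode {
	case "", "default": // an empty value means the flag was omitted
		return nil
	case "pull-dataset", "pull-cid":
		if c.DDMAPI == "" || c.DDMToken == "" {
			return errors.New("--ddm-api and --ddm-token are required in pull modes")
		}
		return nil
	default:
		return fmt.Errorf("unknown --mode %q", c.Mode)
	}
}

func main() {
	cfg := Config{Mode: "pull-dataset"} // DDM flags deliberately missing
	if err := validateMode(cfg); err != nil {
		fmt.Println("config error:", err)
	}
}
```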
Additionally, Pull Mode allows optional specification of:
- --ddm-delay-start, which delays the start epoch of requested deals by the specified number of days. Valid values are between 1 and 14, for example --ddm-delay-start 7
- --ddm-advance-end, which advances the end epoch (i.e., shortens the deal duration) by the specified number of days. Valid values are between 0 and 20, for example --ddm-advance-end 10
Example Pull Mode (Dataset) configuration:
delta-importer daemon \
--boost-url 10.32.32.20 \
--boost-gql-port 8080 \
--boost-port 1288 \
--boost-auth-token XXX.YYY.ZZZ \
--max_concurrent 160 \
--interval 220 \
--mode pull-dataset \
--ddm-delay-start 7 \
--ddm-advance-end 10 \
--ddm-api http://ddm-api.delta.store/api/v1/self-service \
--ddm-token 4b28d311-8be6-48d7-801f-dcb6a87ad49d
Run delta-importer stats to get a table showing statistics on imported deal data.