cd $HOME
git clone https://<username>@github.com/CINERGI/Foundry
cd $HOME/Foundry
Before you start the build process, you need to install three libraries from dependencies
directory to your local maven repository
cd $HOME/Foundry/dependencies
./install_prov_xml_2mvn.sh
./install_prov_model_2mvn.sh
./install_prov_json__2mvn.sh
./install_bnlp_2mvn.sh
./install_bnlp_dependencies_2mvn.sh
./install_bnlp_model2mvn.sh
Afterwards
mvn -Pdev clean install
Here dev profile is used. There are production prod
and dev
profiles for differrent configurations for development and production environments.
The configuration files are located under each sub-project. For example,
the configuration files for the dispatcher component are located under
$HOME/Foundry/dispatcher/src/main/resources
.
$HOME/Foundry/dispatcher/src/main/resources
├── dev
│ └── dispatcher-cfg.xml
└── prod
└── dispatcher-cfg.xml
When you use -Pdev
argument, configuration file from the dev
directory is included in the jar file.
All subsystem configuration files are generated from a master configuration file in YAML format.
An example master configuration file can be found at $HOME/Foundry/bin/config-spec.yml.example
.
Once you create a master config file named say config.yml
run the following to generate all configuration files for the subsystems (for dev profile)
cd $HOME/Foundry/bin
./config_gen.sh -c config.yml -f $HOME/Foundry -p dev
./config_gen.sh -h
usage: ConfigGenerator
-c <cfg-spec-file> Full path to the Foundry config spec YAML file
-f <foundry-root-dir>
-h print this message
-p <profile> Maven profile ([dev]|prod)
After each configuration file generation you need to run maven to move the configs to their target locations
mvn -Pdev install
The system uses MongoDB as its backend. Both 2.x and 3.x versions of MongoDB are tested with the system. If you are using MongoDB 3.x, preferred storage engine is wiredTiger.
-
Download and unpack Apache ActiveMQ 5.10.0 Release to a directory of your choosing (
$MQ_HOME
). -
To start message queue server at default port
61616
, go to$MQ_HOME/bin
directory and run
activemq start
- To stop the activemq server
activemq stop
The system consists of a dispatcher, a consumer head and a CLI manager interface. The dispatcher listens to the MongoDB changes and using its configured workflow dispatches messages to the message queue for the listening consumer head(s). The consumer head coordinates a set of configured consumers that do a prefined operation of a document indicated by the message they receive from the dispatcher and ingestors. The ingestors are specialized consumers that are responsible for the retrieval of the original data as configured by harvest descriptor JSON file of the corresponding source. They are triggered by the manager application.
Before any processing the MongoDB needs to be populated with the source descriptors using
the $HOME/Foundry/bin/ingest_src_cli.sh
.
./ingest_src_cli.sh -h
usage: SourceIngestorCLI
-c <config-file> config-file e.g. ingestor-cfg.xml (default)
-d delete the source given by the source-json-file
-h print this message
-j <source-json-file> harvest source description file
-u update the source given by the source-json-file
Example source descriptors are under $HOME/Foundry/consumers/etc
.
An example usage for inserting IEDA Earthchem Library source descriptor document to the sources
collection is show below
./ingest_src_cli.sh -j $HOME/Foundry/consumers/etc/cinergi-0023.json
Source descriptors are JSON files. To create a new source descriptor for a new resource, copy one of the example source descriptors to a new file and edit it. For more information about a source/harvest descriptor see Harvest Description.
The script for the dispatcher component dispatcher.sh
is located in
$HOME/Foundry_ES/bin
. By default it uses dispatcher-cfg.xml
file for the
profile specified during the build. This needs to run in its own process.
To stop it, use Ctrl-C
.
./dispatcher.sh -h
usage: Dispatcher
-c <config-file> config-file e.g. dispatcher-cfg.xml (default)
-h print this message
The script for the consumer head component consumer_head.sh
is located in
$HOME/Foundry_ES/bin
. By default it uses consumers-cfg.xml
file for the
profile specified during the build. For production use you need to specify
-f
option. This needs to run in its own process. To stop it, use Ctrl-C
.
./consumer_head.sh -h
usage: ConsumerCoordinator
-c <config-file> config-file e.g. consumers-cfg.xml (default)
-cm run in consumer mode (no ingestors)
-f full data set default is 100 documents
-h print this message
-n <max number of docs> Max number of documents to ingest
-p send provenance data to prov server
-t run ingestors in test mode
Manager is an interactive command line application for sending ingestion start messages for resources to the consumer head(s).
It also have some convenience functions to cleanup MongoDB data for a given resource and delete ElasticSearch indices.
By default manager app uses dispatcher-cfg.xml
file for the profile specified
during the build.
./manager.sh -h
usage: ManagementService
-c <config-file> config-file e.g. dispatcher-cfg.xml (default)
-h print this message