BioMAJ3

This project is a complete rewrite of BioMAJ (http://biomaj.genouest.org).

BioMAJ (BIOlogie Mise A Jour) is a workflow engine dedicated to data synchronization and processing. The software automates the update cycle and the supervision of a locally mirrored databank repository.

Common uses are downloading remote databanks (GenBank, for example) and applying transformations (BLAST indexing, EMBOSS indexing, etc.). Any script can be applied to the downloaded data. When all treatments have been applied successfully, the bank is put in "production" in a dedicated release directory. With cron tasks, updates can be executed at regular intervals; data are downloaded again only if a change is detected.
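The "download again only if a change is detected" idea can be illustrated with a short sketch. This is not BioMAJ's actual implementation: the function names and the listing format (file name mapped to size and date) are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical listing format: file name -> (size in bytes, modification date).
Listing = Dict[str, Tuple[int, str]]


def changed_files(previous: Listing, remote: Listing) -> List[str]:
    """Return the sorted names of remote files that are new or differ from the previous release."""
    return sorted(
        name for name, meta in remote.items()
        if previous.get(name) != meta
    )


def needs_update(previous: Listing, remote: Listing) -> bool:
    """An update is needed when a file changed, appeared, or disappeared remotely."""
    return bool(changed_files(previous, remote)) or set(previous) != set(remote)
```

With such a check, a scheduled run can stop early when the remote listing matches the last release.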

More documentation is available on the wiki page.

BioMAJ is compatible with Python 2 and 3.

Getting started

Edit the global.properties file to match your settings. The minimal configuration covers the database connection and the directories.
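Before running BioMAJ, it can help to check that the configuration file contains the minimal settings. The sketch below assumes an INI-style file with a [GENERAL] section; the key names in REQUIRED_KEYS and the sample values are assumptions for illustration, so compare them with the example file shipped with BioMAJ.

```python
import configparser

# Assumed key names (not confirmed by this README): adjust to your global.properties.
REQUIRED_KEYS = ("db.url", "db.name", "data.dir", "conf.dir", "log.dir")

# Hypothetical sample content, used only to demonstrate the check.
SAMPLE = """
[GENERAL]
db.url=mongodb://localhost:27017
db.name=biomaj
data.dir=/var/lib/biomaj
conf.dir=/var/lib/biomaj/conf
log.dir=/var/lib/biomaj/log
"""


def missing_keys(text, section="GENERAL"):
    """Return the required keys that are absent from the given section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section(section):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if not parser.has_option(section, key)]


if __name__ == "__main__":
    print(missing_keys(SAMPLE))
```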

biomaj-cli.py -h

biomaj-cli.py --config global.properties --status

biomaj-cli.py --config global.properties --bank alu --update
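The commands above can also be launched from a Python script, for example from a scheduler. This is a minimal sketch: the helper names biomaj_command and run_biomaj are hypothetical, and only the flags shown above (--config, --bank, --status, --update) are used.

```python
import subprocess


def biomaj_command(config, bank=None, action="--status"):
    """Build the argument list for biomaj-cli.py from the flags shown above."""
    if action not in ("--status", "--update"):
        raise ValueError("unsupported action: %s" % action)
    cmd = ["biomaj-cli.py", "--config", config]
    if bank is not None:
        cmd += ["--bank", bank]
    cmd.append(action)
    return cmd


def run_biomaj(config, bank=None, action="--status"):
    """Run the command and return its exit code (biomaj-cli.py must be on the PATH)."""
    return subprocess.call(biomaj_command(config, bank, action))
```

For instance, run_biomaj("global.properties", bank="alu", action="--update") mirrors the last command above.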

Migration

To migrate from BioMAJ 1.x, a script is available at: https://github.com/genouest/biomaj-migrate. The script imports the old database into the new database and updates the configuration files to the new format. The data directory stays the same.

Migration from 3.0 to 3.1:

BioMAJ 3.1 provides an optional microservice architecture that makes it possible to separate, distribute, and scale BioMAJ components across one or many hosts. This architecture is optional but recommended for server installations; a monolithic installation can be kept for a local computer. Because the BioMAJ code has been split into multiple components, upgrading an existing 3.0 installation requires installing or updating the biomaj Python package as well as the biomaj-cli and biomaj-daemon packages. The database must then be upgraded manually (see Upgrading in the documentation).

To execute database migration:

python biomaj_migrate_database.py

Application Features

Synchronisation:
  Multiple remote protocols (ftp, sftp, http, local copy, etc.)
  Data transfer integrity check
  Release versioning using an incremental approach
  Multi-threading
  Data extraction (gzip, tar, bzip)
  Data tree directory normalisation
Pre & post processing:
  Advanced workflow description (D.A.G)
  Post-process indexation for various bioinformatics software (blast, srs, fastacmd, readseq, etc.)
  Easy integration of personal scripts for bank post-processing automation
Supervision:
  Optional administration web interface (biomaj-watcher)
  CLI management
  Mail alerts for the update cycle supervision
  Optional Prometheus and InfluxDB integration
  Optional consul supervision of processes
Scalability:
  Monolithic (local install) or microservice architecture (remote access to a BioMAJ server)
  Microservice installation allows per-process scalability and supervision (number of processes in charge of download, execution, etc.)
Remote access:
  Optional FTP server providing authenticated or anonymous data access
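The D.A.G workflow description listed above means that post-processes can declare dependencies on each other and run in a compatible order. The sketch below only illustrates that ordering idea with the Python standard library; it is not BioMAJ's workflow syntax, and the step names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical post-processes: each key lists the steps it depends on.
dependencies = {
    "extract": set(),
    "blast_index": {"extract"},
    "emboss_index": {"extract"},
    "publish": {"blast_index", "emboss_index"},
}


def execution_order(deps):
    """Return one valid order in which every step runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(execution_order(dependencies))
```

Steps with no dependency between them (here the two indexing steps) could run in parallel, which is the main benefit of describing a workflow as a graph.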

Dependencies

Packages:

Debian: libcurl-dev, gcc
CentOS: libcurl-devel, openldap-devel, gcc

Linux tools: tar, unzip, gunzip, bunzip2

Database:

mongodb (local or remote)

Indexing (optional):

elasticsearch (global property, use_elastic=1)

ElasticSearch indexing adds advanced search features to BioMAJ, for finding banks that contain files with a specific format or type. Configuring ElasticSearch is outside the scope of the BioMAJ documentation. For a basic installation, one ElasticSearch instance is enough (low volume of data); in that case, the ElasticSearch configuration file should be modified accordingly:

node.name: "biomaj" (or any other name)
index.number_of_shards: 1
index.number_of_replicas: 0

Installation

From source:

After installing the dependencies, go to the BioMAJ source directory:

python setup.py install

From packages:

pip install biomaj biomaj-cli biomaj-daemon

You should consider using a Python virtual environment (virtualenv) to install BioMAJ.

Copy global.properties from tools/examples and update it to match your local installation.

The tools/process directory contains example process files (Python and shell).

Docker

You can use BioMAJ with Docker (genouest/biomaj)

docker pull genouest/biomaj
docker pull mongo
docker run --name biomaj-mongodb -d mongo
# Wait ~10 seconds for mongo to initialize
# Create a local directory where databases will be permanently stored
# *local_path*
docker run --rm -v local_path:/var/lib/biomaj --link biomaj-mongodb:biomaj-mongodb osallou/biomaj-docker --help

Copy your bank properties files into local_path/conf and your post-processes (if any) into local_path/process.

You can override global.properties in /etc/biomaj/global.properties (-v xx/global.properties:/etc/biomaj/global.properties).

No default bank property file or process is available in the container.

Examples are available at https://github.com/genouest/biomaj-data

API documentation

https://readthedocs.org/projects/biomaj/

Status

Testing

Execute unit tests

nosetests

Execute unit tests but disable the ones that need network access

nosetests -a '!network'

Monitoring

InfluxDB can be used to monitor BioMAJ. The following series are available:

biomaj.banks.quantity (number of banks)
biomaj.production.size.total (size of all production directories)
biomaj.workflow.duration (workflow duration)
biomaj.production.size.latest (size of latest update)
biomaj.bank.update.downloaded_files (number of downloaded files)
biomaj.bank.update.new (track updates)
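To give an idea of what a point in one of these series could look like, the sketch below formats an integer value in InfluxDB line protocol. How BioMAJ itself writes points is not described here: the field name "value", the "bank" tag, and the helper names are assumptions for illustration.

```python
def _escape(text, chars):
    """Backslash-escape the characters that are special in line protocol."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(series, value, tags=None, timestamp_ns=None):
    """Format one integer point as: series[,tag=value...] value=<int>i [timestamp]."""
    line = _escape(series, ", ")
    for key in sorted(tags or {}):
        line += ",%s=%s" % (_escape(key, ",= "), _escape(str(tags[key]), ",= "))
    line += " value=%di" % value  # assumed field name; the "i" suffix marks an integer
    if timestamp_ns is not None:
        line += " %d" % timestamp_ns
    return line


if __name__ == "__main__":
    print(to_line_protocol("biomaj.banks.quantity", 3))
    print(to_line_protocol("biomaj.bank.update.downloaded_files", 12, tags={"bank": "alu"}))
```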

License

A-GPL v3+

Remarks

BioMAJ uses libcurl; for SFTP, libcurl must be compiled with SFTP support.

To delete elasticsearch index:

curl -XDELETE 'http://localhost:9200/biomaj_test/'
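The same request can be prepared from Python with the standard library. This is a sketch under the same assumptions as the curl command above (a local ElasticSearch on port 9200 and an index named biomaj_test); the helper name delete_index_request is hypothetical.

```python
import urllib.request


def delete_index_request(index, host="http://localhost:9200"):
    """Build (but do not send) an HTTP DELETE request for the given index."""
    url = "%s/%s/" % (host.rstrip("/"), index)
    return urllib.request.Request(url, method="DELETE")


# Sending the request needs a reachable ElasticSearch instance:
# with urllib.request.urlopen(delete_index_request("biomaj_test")) as response:
#     print(response.status)
```

Building the request separately from sending it makes it easy to check the target URL before deleting anything.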

Credits

Special thanks to tuco at the Pasteur Institute for the intensive testing and new ideas. Thanks to the old BioMAJ team for the work they have done.

BioMAJ is developed at the IRISA research institute.
