This tool is a preprocessing pipeline that takes raw miRNAseq fastq files through to the RBioMIR suite of analysis tools, producing an analysis of known miRNAs as publication-ready figures.
Clone the repository and install the Python requirements in a python3 virtualenv:
>>> git clone https://github.com/liamhawkins/mirna-pipeline.git
>>> cd mirna-pipeline
>>> virtualenv venv -p $(which python3)
>>> source venv/bin/activate
>>> pip install -r requirements.txt
This pipeline also requires the following programs to be installed on your system:
Program | Version Tested |
---|---|
fastqc | 0.10.1 |
fastq-mcf | 1.05 |
cutadapt | 1.17 |
bowtie-build | 1.0.0 |
bowtie | 1.0.0 |
samtools | 1.3.1 |
Rscript | 3.5.2 |
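Before running the pipeline, it can help to confirm that each of the programs listed above is on your `PATH`. The following is a minimal sketch and not part of the pipeline itself: the `missing_programs` helper is hypothetical, and it only checks that each executable can be found, not that its version matches the one tested.

```python
import shutil

# Program names taken from the requirements table above.
REQUIRED_PROGRAMS = [
    "fastqc",
    "fastq-mcf",
    "cutadapt",
    "bowtie-build",
    "bowtie",
    "samtools",
    "Rscript",
]


def missing_programs(programs):
    """Return the programs that cannot be found on the current PATH.

    Hypothetical helper for illustration; it does not check versions.
    """
    return [name for name in programs if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_programs(REQUIRED_PROGRAMS)
    if missing:
        print("Missing programs: " + ", ".join(missing))
    else:
        print("All required programs were found on PATH.")
```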
Create a config file for each analysis you wish to run (see example_config.ini for the exact template).
The pipeline can then be run from the command line:
>>> pypipeline.py --config example_config.ini
Multiple config files, each defining a separate analysis, can be run in sequence by supplying a directory containing the *.ini config files:
>>> pypipeline.py --config-dir dir_containing_configs/
In this case it is useful to suppress user prompts with the --no-prompts flag:
>>> pypipeline.py --no-prompts --config-dir dir_containing_configs/
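To preview which config files a directory contains before starting an unattended run, a short sketch like the one below may be useful. The `find_config_files` helper is hypothetical, and sorting by file name is an assumption made here for readability; the order in which the pipeline itself processes the files is not stated in this README.

```python
from pathlib import Path


def find_config_files(config_dir):
    """Return the *.ini files directly inside config_dir, sorted by name.

    Hypothetical helper for illustration. Returns an empty list if
    config_dir is not an existing directory.
    """
    directory = Path(config_dir)
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*.ini"))


if __name__ == "__main__":
    for config_path in find_config_files("dir_containing_configs"):
        print(config_path)
```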
If read counts are already available, you can run only the R analysis using the --analysis-only flag:
>>> pypipeline.py --config example_config.ini --analysis-only dir_with_readcounts/
Read count file names must follow this format:
<sample_name_from_config>_MATURE.read_count.txt
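The naming rule above can be checked before starting an analysis-only run. This is a minimal sketch, assuming you already have the sample names from your config as a plain list; both helper functions and the sample names in the example are hypothetical and not part of the pipeline.

```python
from pathlib import Path


def expected_read_count_name(sample_name):
    """Build the read count file name expected for one sample."""
    return f"{sample_name}_MATURE.read_count.txt"


def missing_read_counts(read_count_dir, sample_names):
    """Return the expected read count file names not present in read_count_dir."""
    directory = Path(read_count_dir)
    return [
        expected_read_count_name(name)
        for name in sample_names
        if not (directory / expected_read_count_name(name)).is_file()
    ]


if __name__ == "__main__":
    # Sample names here are placeholders; use the ones from your config file.
    for file_name in missing_read_counts("dir_with_readcounts", ["control_1", "stress_1"]):
        print("Missing read count file: " + file_name)
```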
A full list of command line options can be found using the help flag:
>>> pypipeline.py --help
usage: pypipeline.py [-h] [-c <config_file> | -d <config_dir>] [--no-prompts]
[--no-fastqc] [--delete]
[--no-analysis | --analysis-only <read_count_dir>]
optional arguments:
-h, --help show this help message and exit
-c <config_file>, --config <config_file>
Path to config file
-d <config_dir>, --config-dir <config_dir>
Directory containing config files
--no-prompts Suppress user prompts
--no-fastqc Do not perform fastqc on raw files
--delete Delete intermediate processing files
--no-analysis Do not perform R analysis
--analysis-only <read_count_dir>
Run analysis only on read counts in supplied directory