LIONS is a bioinformatics analysis pipeline that combines several existing software tools with custom scripts to annotate a paired-end RNA-seq library against a reference transposable element (TE) annotation set (such as RepeatMasker).
East Lion scripts process BAM file input, re-align it to a genome with Tophat2, and build an ab initio transcript assembly. This assembly is then processed, and local read searches are performed at its 5' ends to find additional transcript start sites and to quality-control the 5' ends of the assembly. The output is a .lions file, which annotates the intersection between the assembly, a reference gene set, and a repeat set.
West Lion scripts compile multiple .lions files, group them into biological categories (e.g. Cancer vs. Normal or Treatment vs. Control), and compare and analyze the data to produce graphs and support interpretation of the results.
- Download the LIONS repository
- Install the dependencies for LIONS
- Initialize the parameter files for LIONS on your system:
    - $LIONS_PATH/controls/<system>.sysctrl : system-specific variables
    - $LIONS_PATH/controls/<project>.ctrl : project-specific variables
    - $LIONS_PATH/controls/<input>.ctrl : list of RNA-seq input files for the project
- Add reference/annotation files for LIONS
- Populate the resource files (NOTE: UCSC files are downloaded from the UCSC Genome Browser). The example folder shows what the files should look like.
    - In $LIONS_PATH/resources/<genomeName>/genome/, add a .fa genome sequence file
    - In $LIONS_PATH/resources/<genomeName>/repeat/, add the UCSC RepeatMasker annotation
    - (Optional) In $LIONS_PATH/resources/<genomeName>/annotation/, add the UCSC annotation for protein-coding genes
    - In
- Run the master script, lions.sh, in bash:

    bash $LIONS_PATH/lions.sh <$LIONS_PATH/controls/parameter.ctrl>
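The setup steps above can be sketched as a small bash helper. This is a minimal, non-authoritative sketch: the genome name hg19, the default LIONS_PATH of the current directory, and the functions make_resource_dirs, check_inputs and run_lions are all hypothetical and are not part of LIONS itself. It only creates the resources/<genomeName>/ directory layout, checks that required files exist, and then calls lions.sh with a control file as its single argument, mirroring the command above. It does not download UCSC data or write the .ctrl files.

```shell
#!/usr/bin/env bash
# Hypothetical setup sketch for LIONS. GENOME and all file names are
# illustrative placeholders; nothing here downloads UCSC data.
set -euo pipefail

# Assumption: LIONS_PATH points at the downloaded LIONS repo (defaults to $PWD).
LIONS_PATH="${LIONS_PATH:-$PWD}"
# Hypothetical <genomeName>; replace with the genome you annotate against.
GENOME="${GENOME:-hg19}"

# Create resources/<genomeName>/{genome,repeat,annotation} under LIONS_PATH.
make_resource_dirs() {
  local base="$LIONS_PATH/resources/$GENOME"
  mkdir -p "$base/genome" "$base/repeat" "$base/annotation"
}

# Print each path that does not exist; return 1 if any is missing, else 0.
check_inputs() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Run lions.sh with one control file, but only if both files exist.
run_lions() {
  local ctrl="$1"
  check_inputs "$LIONS_PATH/lions.sh" "$ctrl" || return 1
  bash "$LIONS_PATH/lions.sh" "$ctrl"
}
```

For example, after populating the resource and control files, your own setup script could call make_resource_dirs and then run_lions "$LIONS_PATH/controls/my_project.ctrl", where my_project.ctrl is a hypothetical file name. Checking inputs first fails early with a list of missing paths rather than starting the pipeline without them.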
If you have any questions, please email me: Artem Babaian. I'll do my best to respond and help get this working!