___ _ ___ _
| _`\ _ ( ) /'___)_ (_ ) _
| (_) )(_)| |_ _ | (__ (_) | | (_) _
| , / | || '_`\ /'_`\ | ,__)| | | | | | /'_`\
| |\ \ | || |_) )( (_) )| | | | | | | |( (_) )
(_) (_)(_)(_,__/'`\___/'(_) (_)(___)(_)`\___/'
Ribofilio is a tool for estimating the ribosome drop-off rate. It has been tested on E. coli and yeast so far.
git clone https://github.com/SherineAwad/ribofilio.git
cd ribofilio/src
Ribofilio requires the following Python packages:
screed
numpy
matplotlib
sklearn (installed from PyPI as scikit-learn)
scipy
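A quick way to confirm that the packages listed above are importable before running ribofilio is sketched below. This is a convenience check, not part of ribofilio itself; the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Import names of the packages listed above.
REQUIRED = ["screed", "numpy", "matplotlib", "sklearn", "scipy"]


def missing_packages(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```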
To run ribofilio:
python ribofilio.py --transcripts transcripts.fa --footprint footprint.bed --rnaseq rnaseq.bed --binsize binsize --output output
or simply:
python ribofilio.py -t transcripts.fa -f footprint.bed -r rnaseq.bed -b binsize -o output
--transcripts or -t for transcripts in FASTA format (required)
--footprint or -f for the footprint bed file (required)
--rnaseq or -r for the rnaseq bed file (optional; if not provided, the drop-off rate is not normalized by mRNA)
--subset or -s for a file listing genes; ribofilio then runs on this subset only
--binsize or -b for the bin size (default: 50)
--pvalue or -v choose 1 for a one-sided p-value or 2 for a two-sided p-value (default: 1)
--output or -o for the output name
--plot or -p choose 1 to turn plot mode on or 0 to disable plots (default: 1)
--ylogmin is the minimum of the y axis for log plots (default: -3)
--ylogmax is the maximum of the y axis for log plots (default: 2)
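For readers who want to see how the options above fit together, here is a minimal `argparse` sketch that mirrors the documented flags and defaults. It is an illustration only and is not taken from ribofilio's source; the function name `build_parser`, the argument types, and the `None` default for `--output` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented ribofilio options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="ribofilio.py")
    parser.add_argument("-t", "--transcripts", required=True,
                        help="transcripts in FASTA format")
    parser.add_argument("-f", "--footprint", required=True,
                        help="footprint bed file")
    parser.add_argument("-r", "--rnaseq", default=None,
                        help="rnaseq bed file; without it there is no mRNA normalization")
    parser.add_argument("-s", "--subset", default=None,
                        help="file listing the genes to restrict the run to")
    parser.add_argument("-b", "--binsize", type=int, default=50)
    parser.add_argument("-v", "--pvalue", type=int, choices=[1, 2], default=1,
                        help="1 for one-sided, 2 for two-sided")
    # The README gives no default for --output; None is an assumption.
    parser.add_argument("-o", "--output", default=None)
    parser.add_argument("-p", "--plot", type=int, choices=[0, 1], default=1)
    parser.add_argument("--ylogmin", type=float, default=-3.0)
    parser.add_argument("--ylogmax", type=float, default=2.0)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-t", "yeast.fa", "-f", "SRR5945809.bed"])
    print(args.binsize, args.pvalue, args.plot)
```

Parsing only the two required options leaves every other option at its documented default.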
To run ribofilio on all genes:
python ribofilio.py -t yeast.fa -f SRR5945809.bed -r SRR5945808.bed
Here yeast.fa contains the transcripts, SRR5945809.bed is the bed file of the sample's footprints, and SRR5945808.bed is the mRNA bed file.
The bin size used is 50, the default, because no other bin size is passed.
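To make the meaning of the bin size concrete, the sketch below counts positions in fixed-width bins along a transcript. It illustrates binning in general and is not ribofilio's implementation; the function name `bin_counts`, the choice of counting one position per read, and the example positions are assumptions.

```python
from collections import Counter


def bin_counts(positions, binsize=50):
    """Count positions per bin; bin i covers [i * binsize, (i + 1) * binsize)."""
    return Counter(pos // binsize for pos in positions)


if __name__ == "__main__":
    # Hypothetical read positions along a single transcript.
    counts = bin_counts([95, 70, 1664, 32, 26, 443], binsize=50)
    for index in sorted(counts):
        print(f"bin {index}: {counts[index]} reads")
```

A smaller bin size gives more bins with fewer reads in each; a larger one gives fewer, better-populated bins.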
To run ribofilio on a subset of genes:
python ribofilio.py -t yeast.fa -f SRR5945809.bed -r SRR5945808.bed -s subsetofgenes.txt
Here subsetofgenes.txt lists one gene per line:
YDL067C
YGL187C
YGL191W
YHR051W
YIL111W
YLR038C
YLR395C
YMR256C
YNL052W
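A subset file in this one-gene-per-line layout can be read into a set with a few lines of Python. This is a sketch for inspecting such a file yourself, not ribofilio's own reader; the function name `parse_gene_subset` is hypothetical.

```python
def parse_gene_subset(lines):
    """Return the set of gene names from an iterable of lines, skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


if __name__ == "__main__":
    example = ["YDL067C\n", "YGL187C\n", "\n", "YGL191W\n"]
    print(sorted(parse_gene_subset(example)))
```

An open file handle can be passed directly, for example `parse_gene_subset(open("subsetofgenes.txt"))`.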
A 6-column bed file is required. A sample follows:
YGL135W_mRNA 95 125 SRR5090936.1.1 1 +
YKL009W_mRNA 70 98 SRR5090936.1.5 42 +
YNL045W_mRNA 1664 1692 SRR5090936.1.11 40 +
YNR050C_mRNA 32 62 SRR5090936.1.18 42 +
YLR159C-A_mRNA 26 54 SRR5090936.1.20 0 -
YHR133C_mRNA 443 473 SRR5090936.1.23 42 +
Refer to the Ensembl BED format documentation for more details on bed file formats.
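The six columns in the sample above are, in standard BED order, the reference name, start, end, read name, score, and strand. The sketch below parses one such line; it is an illustration rather than ribofilio's parser, and the names `BedRecord` and `parse_bed6` are hypothetical. BED coordinates are 0-based and half-open, so the feature length is end minus start.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str   # reference name, here a transcript such as YGL135W_mRNA
    start: int   # 0-based start
    end: int     # end, exclusive
    name: str    # read name
    score: int
    strand: str  # "+" or "-"


def parse_bed6(line):
    """Parse one whitespace-delimited 6-column BED line into a BedRecord."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, found {len(fields)}")
    chrom, start, end, name, score, strand = fields
    return BedRecord(chrom, int(start), int(end), name, int(score), strand)


if __name__ == "__main__":
    record = parse_bed6("YGL135W_mRNA 95 125 SRR5090936.1.1 1 +")
    print(record.chrom, record.end - record.start, record.strand)
```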
The following are printed to the screen and written to output.regression.log: the drop-off rate, the drop-off rate per codon, the standard error, the 95% confidence interval, the root mean squared error (RMSE), R², the t-score from testing the slope against a slope of zero together with its p-value, and the number of bins. If plot mode is on, a weighted linear regression plot is saved to output.Log.WLR.png.
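The statistics above all derive from a weighted linear regression. The following pure-Python sketch shows how a slope, its standard error, a t-score against a slope of zero, R², and RMSE can be computed from weighted least squares. It is a generic illustration, not ribofilio's code: the function name, the meaning of `x`, `y`, and `w` (for instance bin index, a per-bin log value, and a per-bin weight), the weighted RMSE definition, and the 1.96 normal-approximation margin are all assumptions.

```python
import math


def weighted_linear_regression(x, y, w):
    """Fit y = intercept + slope * x by weighted least squares."""
    n = len(x)
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Weighted residual and total sums of squares.
    rss = sum(wi * (yi - (intercept + slope * xi)) ** 2 for wi, xi, yi in zip(w, x, y))
    tss = sum(wi * (yi - y_mean) ** 2 for wi, yi in zip(w, y))

    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    tscore = slope / se if se > 0 else math.copysign(math.inf, slope)
    return {
        "slope": slope,
        "intercept": intercept,
        "se": se,
        "margin": 1.96 * se,          # assumption: normal approximation for a 95% CI
        "rmse": math.sqrt(rss / sw),  # assumption: weighted RMSE
        "r2": 1.0 - rss / tss,
        "tscore": tscore,
    }


if __name__ == "__main__":
    fit = weighted_linear_regression([0, 1, 2, 3], [0, 1, 1, 2], [1, 1, 1, 1])
    print({key: round(value, 4) for key, value in fit.items()})
```

A negative slope with a t-score far from zero indicates that the fitted value decreases along the transcript, which is the pattern a drop-off estimate looks for.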
A sample of output.regression.log:
Dropoff   Dropoff per codon   RMSE     Rsquare   SE       Margin Error   tscore               pvalue   No.of Bins
-0.0051   -0.0003             0.0143   0.4907    0.0006   0.0011         -9.089848129268738   0.0      295
and output.Log.WLR.png:
Looking for a more detailed tutorial? See our documentation for a complete ribosome profiling protocol using Ribofilio.
For reproducibility of our results, run our pipeline on the data in the data directory with:
snakemake --cores --use-conda
This requires snakemake and conda to be installed on your system.
The data used to test our tool and used in the pipeline can be downloaded using a Makefile in the data directory:
make yeast.fa
make SRR5090937.fastq
make SRR9670823.fastq
make clean
We also provide sample bed files so that you can try ribofilio without running the upstream pipeline.