This repository contains the code used for the Orion methodology (http://bit.ly/2hOI39X).
Gussow AB, Copeland BR, Dhindsa RS, Wang Q, Petrovski S, Majoros WH, Allen AS, Goldstein DB. Orion: Detecting regions of the human non-coding genome that are intolerant to variation using population genetics. PLoS ONE. 2017; 12(8): e0181604.
We recommend using conda to create an environment in which to run Orion. Orion requires Python 2.7.
Orion relies on the luigi framework, with a couple of in-house modifications. Specifically, we added two new parameter types, named InputFileParameter and OutputFileParameter.
After installing the luigi framework, you'll need to add these new parameter types to your luigi installation. Append the contents of src/luigi_parameter_extension.py to the existing parameter.py file within the installed luigi package (luigi/parameter.py):
cat src/luigi_parameter_extension.py >> path/to/luigi/parameter.py
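The path/to/luigi placeholder above depends on your environment. As a minimal sketch (not part of Orion), the helper below resolves where an importable package keeps a given file. The name `package_file` is hypothetical, and the sketch assumes the package is a regular installed package that exposes a `__file__` attribute.

```python
import importlib
import os


def package_file(package_name, filename):
    """Return the absolute path of `filename` inside an installed package's directory."""
    module = importlib.import_module(package_name)
    # For a package, __file__ points at its __init__.py, so dirname is the package folder.
    package_dir = os.path.dirname(os.path.abspath(module.__file__))
    return os.path.join(package_dir, filename)
```

With luigi installed in the active environment, `package_file("luigi", "parameter.py")` should give the file to append to, and `package_file("luigi", "__init__.py")` the file edited in the next step.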
Following this, you'll need to edit the __init__.py file in the installed luigi package so that it loads the new parameter types.
In your installation of luigi, in luigi/__init__.py, edit the line that begins with:
from luigi.parameter import [...here you'll see a list of luigi parameter types...]
Add the new parameter types (InputFileParameter, OutputFileParameter) at the end of the import list on this line.
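One way to confirm that both edits took effect is to check that the new names are exposed by the top-level luigi module. The sketch below is an illustration only: `missing_attributes` is a hypothetical helper, and a stand-in module is used here so the example runs without luigi installed.

```python
import types


def missing_attributes(module, names):
    """Return the entries of `names` that `module` does not expose."""
    return [name for name in names if not hasattr(module, name)]


# Stand-in for the real luigi module (assumption, for illustration only).
fake_luigi = types.ModuleType("fake_luigi")
fake_luigi.InputFileParameter = object

print(missing_attributes(fake_luigi, ["InputFileParameter", "OutputFileParameter"]))
# prints ['OutputFileParameter']
```

In practice you would `import luigi` and pass that module instead; an empty list would indicate that both parameter types are importable from luigi.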
How to calculate Orion scores:
src/Orion.py CalculateOrionScores \
--sample-size <sample_size> \
--coverage-file <coverage_file> \
--allele-counts-file <allele_counts_file> \
--window-length <window_length> \
--output-path <output_path:final output file> \
--output-directory <output_directory:temporary output files directory> \
--workers <num_workers>
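If you prefer to launch the command above from a Python script, the argument list can be assembled as in the following sketch. The function name `orion_command` and all values in the example call are placeholders, not defaults taken from Orion; only the flag names come from the command above.

```python
def orion_command(sample_size, coverage_file, allele_counts_file,
                  window_length, output_path, output_directory, workers):
    """Build the argument list for src/Orion.py CalculateOrionScores."""
    return [
        "src/Orion.py", "CalculateOrionScores",
        "--sample-size", str(sample_size),
        "--coverage-file", coverage_file,
        "--allele-counts-file", allele_counts_file,
        "--window-length", str(window_length),
        "--output-path", output_path,
        "--output-directory", output_directory,
        "--workers", str(workers),
    ]


# Placeholder values for illustration only.
command = orion_command(1000, "coverage.txt", "allele_counts.txt",
                        501, "orion_scores.txt", "tmp_orion", 4)
print(" ".join(command))
```

The resulting list could then be passed to `subprocess.check_call(command)` from the repository root, assuming src/Orion.py is executable in your environment.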
Orion regions can be calculated using getMCFRegions.R. For more information, run:
Rscript src/getMCFRegions.R --help
Throughout the study we generated three datasets:
- Orion scores
- Orion regions
- Coordinates where Orion scores are defined (non-repeat autosomal regions that were covered in our sample)