DOMINO: Discovery of Modules In Networks using Omic.
DOMINO is an active module identification (AMI) algorithm. It receives a gene network and nodes' activity scores as input and reports sub-networks (modules) that are putatively biologically meaningful in the context of the activity data.
In an extensive evaluation conducted on gene expression and genome-wide association study data, we discovered that AMI algorithms tended to over-report enrichment: GO terms enriched in the modules on real data were often also enriched when the algorithms were run on randomly permuted activity scores.
In contrast, modules retrieved by DOMINO had a high rate of empirically validated GO terms.
The study is available at https://www.embopress.org/doi/full/10.15252/msb.20209593.
- Requirements
- Installation
- Input File Formats
- Basic Usage
- Advanced usage
- Main output files
- Example files
DOMINO was tested under the following settings:
- Python 3.8 (note that for later versions of Python, some dependency packages are currently not available via pip)
- Linux OS (Ubuntu 14.04 LTS, Ubuntu 18.04.4 LTS)
We recommend using a virtual environment. For example:
python3 -m venv domino-env
source domino-env/bin/activate
Then, install domino via pip:
pip install domino-python
Alternatively, to install via conda, make sure the Bioconda repository and its dependencies are available:
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
Create a virtual environment in conda. For example:
conda create --name domino-env
conda activate domino-env
Then, install domino via conda:
conda install domino
Alternatively, to install from source, download the source files and install as follows:
Clone the repo from GitHub:
git clone https://github.com/Shamir-Lab/DOMINO.git
cd DOMINO
DOMINO is written in Python 3. The necessary libraries will all be installed by the setup.py script.
We recommend using a virtual environment. For example:
python3 -m venv domino-env
source domino-env/bin/activate
Then, run setup.py:
python setup.py install
- A network file should be in a simplified sif format:
  - Only a single node should appear in each of the first and last columns.
  - The first row is a header.
- An active gene file contains gene IDs in Ensembl format, separated by newline characters.
- The slices file is generated automatically by the slicer command.
For examples, see the files in the "examples" folder.
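The input formats described above can be checked with a short script before a run. The following is a minimal sketch, not part of DOMINO itself: it assumes the sif columns are tab-separated, and the helper names (read_network, read_active_genes), the gene IDs, the header names, and the "ppi" label are all placeholders invented for illustration.

```python
import csv
import tempfile
from pathlib import Path


def read_network(path):
    """Return a list of (first_node, last_node) edges from a simplified sif file.

    Assumption: columns are tab-separated and the first row is a header.
    Only the first and last columns are read as node IDs.
    """
    edges = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            edges.append((row[0], row[-1]))
    return edges


def read_active_genes(path):
    """Return the gene IDs listed one per line, ignoring blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


# Tiny illustrative inputs; IDs, header names, and the "ppi" label are placeholders.
workdir = Path(tempfile.mkdtemp())
network_path = workdir / "network.sif"
network_path.write_text(
    "node_1\tinteraction\tnode_2\n"
    "ENSG00000000001\tppi\tENSG00000000002\n"
    "ENSG00000000002\tppi\tENSG00000000003\n"
)
genes_path = workdir / "active_genes.txt"
genes_path.write_text("ENSG00000000001\nENSG00000000003\n")

edges = read_network(network_path)
active_genes = read_active_genes(genes_path)

# Active genes that never appear as a node cannot contribute to any module.
network_nodes = {node for edge in edges for node in edge}
missing = [gene for gene in active_genes if gene not in network_nodes]
print(len(edges), "edges;", len(missing), "active genes missing from the network")
```

A check like this is mainly useful for catching ID mismatches (for example, gene symbols in one file and Ensembl IDs in the other) before running the tool.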
To run preprocessing step 0 (partitioning the network using the Louvain algorithm):
slicer --network_file </path/to/network.sif> --output_file </path/to/output_file>
-n/--network_file: A path to the network file (sif format), e.g., /path/to/network_file.sif.
-o/--output_file: A path to the output slices file, e.g., /path/to/output/slices_file.txt.
To run DOMINO:
domino --active_genes_files </path/to/dataset1,/path/to/dataset2...> --network_file </path/to/network.sif> --slices_file <slices_file.txt> --output_folder </path/to/output_folder> [-sth <slices_threshold> -mth <putative_modules_threshold>]
The common command line options are:
-a/--active_genes_files: A comma-delimited list of absolute paths to files, each containing a list of active genes separated by newline characters (\n), e.g., /path/to/active_genes_files_1,/path/to/active_genes_files_2.
-n/--network_file: A path to the network file (sif format), e.g., /path/to/network_file.sif.
-s/--slices_file: A path to the slices file (i.e., the output of the "slicer" script), e.g., /path/to/slices_file.txt.
-c/--use_cache: Use auto-generated cache network files (*.pkl) from previous executions with the same network. NOTE: (1) THIS IS NOT THE SLICES FILE! (2) If the content of the network file has changed, you should set this option to "false".
-p/--parallelization: The number of threads allocated to the run (a single thread is usually enough).
-v/--visualization: Indicates whether a visualization of the modules should be generated.
-sth/--slices_threshold: The threshold for considering a slice as relevant.
-mth/--module_threshold: The threshold for considering a putative module as a final module.
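When the two steps are driven from a script, the commands above can be assembled as argument lists. The sketch below only builds and prints the commands; the helper names (slicer_command, domino_command) are hypothetical, the paths are the placeholders used in this README, and only flags documented above are used.

```python
import shlex


def slicer_command(network_file, slices_file):
    """Build the argument list for the slicer preprocessing step."""
    return ["slicer", "--network_file", network_file, "--output_file", slices_file]


def domino_command(active_genes_files, network_file, slices_file, output_folder,
                   slices_threshold=None, module_threshold=None):
    """Build the argument list for a DOMINO run.

    Thresholds are added only when given, so the tool's own defaults apply otherwise.
    """
    cmd = [
        "domino",
        "--active_genes_files", ",".join(active_genes_files),  # comma-delimited list
        "--network_file", network_file,
        "--slices_file", slices_file,
        "--output_folder", output_folder,
    ]
    if slices_threshold is not None:
        cmd += ["--slices_threshold", str(slices_threshold)]
    if module_threshold is not None:
        cmd += ["--module_threshold", str(module_threshold)]
    return cmd


# Placeholder paths; replace them with real absolute paths.
steps = [
    slicer_command("/path/to/network.sif", "/path/to/slices_file.txt"),
    domino_command(
        ["/path/to/dataset1", "/path/to/dataset2"],
        "/path/to/network.sif",
        "/path/to/slices_file.txt",
        "/path/to/output_folder",
    ),
]
for cmd in steps:
    print(shlex.join(cmd))  # to execute instead: subprocess.run(cmd, check=True)
```

Passing an argument list rather than a single shell string avoids quoting problems when paths contain spaces.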
output_folder/active_gene_file_name/modules.out: the list of final modules
output_folder/active_gene_file_name/module_i.html: a visualization of the i'th module
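Because each active gene file gets its own subfolder, the results of a multi-dataset run can be gathered by scanning the output folder. This is a minimal sketch that relies only on the path layout described above; the helper name collect_outputs and the mock "dataset1" folder are made up for illustration, and the content of modules.out is not parsed.

```python
import tempfile
from pathlib import Path


def collect_outputs(output_folder):
    """Map each per-dataset subfolder name to its modules.out path and HTML files."""
    results = {}
    for modules_file in sorted(Path(output_folder).glob("*/modules.out")):
        dataset_dir = modules_file.parent
        results[dataset_dir.name] = {
            "modules": modules_file,
            "visualizations": sorted(dataset_dir.glob("module_*.html")),
        }
    return results


# Mock output folder, created only so that the sketch runs end to end.
output_folder = Path(tempfile.mkdtemp())
mock_dir = output_folder / "dataset1"
mock_dir.mkdir()
(mock_dir / "modules.out").write_text("")
(mock_dir / "module_0.html").write_text("")

outputs = collect_outputs(output_folder)
for name, files in outputs.items():
    print(name, files["modules"], len(files["visualizations"]))
```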
Example files of networks in simplified sif format and an active gene file are available under the "examples" folder.