This repository contains materials for the multi-omics data integration hackathon held during #NGSprint2021, #NGSchool2022: Machine Learning in Computational Biology, and #NGSchool2023: Advances in Computational Biology.
First, clone this repository, or download and unpack it.
To run the tutorial materials you will need:
- Jupyter Notebook with R language support:
  - Jupyter Notebook (installation information)
  - R from CRAN (download from here)
  - IRkernel, which enables the use of R from the notebooks (see tutorial here)
- R packages (see the instructions in the set_up.R file)
- Python 3.6+ (download from here)
- SUMO: the subtyping tool for multi-omic data (installation information)
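Before opening the notebooks, it can help to confirm that the prerequisites listed above are visible from your shell. The following is a minimal sketch, not part of the repository: `check_tool` is a hypothetical helper, and the command names (`jupyter`, `R`, `python3`, `sumo`) are assumptions that may differ on your system.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Command names are assumptions; adjust them to match your installation.
for tool in jupyter R python3 sumo; do
    check_tool "$tool" || true
done

# List the registered Jupyter kernels. Once IRkernel is installed and
# registered, an R kernel (named "ir" by default) should appear here.
if command -v jupyter >/dev/null 2>&1; then
    jupyter kernelspec list || true
fi
```

A tool reported as missing only means it was not found on `PATH`; it may still be installed in an environment that is not currently active.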
To run the tutorial materials in a Docker environment, you will need to:
- Install Docker Engine (https://docs.docker.com/engine/install/ubuntu)
- Get the image and start the container. Important: replace the '~/ngs22' path (the first part of the -v argument) with the path to the multi-omics-hackathon directory you cloned from this repository:
  - online (image pulled from Docker Hub):
    docker run --rm -d -p 8585:8888 -e JUPYTER_TOKEN=ngs22 -v ~/ngs22:/opt/app/data/ --name ngs22_reticulate ngschool/ngs22_reticulate:clustering
  - local (image pulled from the docker.ngschool.eu registry):
    docker run --rm -d -p 8585:8888 -e JUPYTER_TOKEN=ngs22 -v ~/ngs22:/opt/app/data/ --name ngs22_reticulate docker.ngschool.eu/ngs22_reticulate:clustering
- Open JupyterLab at localhost:8585
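Because the host path in the -v argument is the part most likely to go wrong, the steps above can be wrapped in a small script that validates the directory first. This is a sketch under stated assumptions: `resolve_repo_dir` and `start_ngs22` are hypothetical helper names, and the `docker run` options are copied from the online command above.

```shell
# Hypothetical helper: print the absolute path of an existing directory,
# or fail with a message. Docker treats a non-absolute -v source as a
# named volume, so an absolute path is needed for a bind mount.
resolve_repo_dir() {
    if [ ! -d "$1" ]; then
        echo "not a directory: $1" >&2
        return 1
    fi
    (cd "$1" && pwd)
}

# Hypothetical helper: start the container with the checked path mounted.
start_ngs22() {
    repo_dir=$(resolve_repo_dir "$1") || return 1
    docker run --rm -d -p 8585:8888 -e JUPYTER_TOKEN=ngs22 \
        -v "$repo_dir":/opt/app/data/ \
        --name ngs22_reticulate ngschool/ngs22_reticulate:clustering
}

# Usage (not executed here):
#   start_ngs22 ~/multi-omics-hackathon   # start the container
#   docker ps --filter name=ngs22_reticulate   # confirm it is running
#   docker stop ngs22_reticulate          # stop it; --rm then removes it
```

If the Jupyter server in the image honours the `JUPYTER_TOKEN` variable, as the command suggests, the login token requested at localhost:8585 would be `ngs22`.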
A significant portion of the included materials is based on the very informative "Multi-omics Analysis" chapter by Jonathan Ronen from the book "Computational Genomics with R", available here under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The Acute Myeloid Leukemia (AML) dataset is available here. The data were pre-processed and made available as part of the following paper: Rappoport, N., & Shamir, R. (2018). Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucleic Acids Research, 46(20), 10546–10562. https://doi.org/10.1093/nar/gky889
The SUMO package documentation, which details example usage, is available here.