scNOVA

scNOVA : Single-Cell Nucleosome Occupancy and genetic Variation Analysis - summarized in a single Snakemake pipeline.

Overview of this workflow

PART0. Pre-requirement step - preparation of single-cell genetic information using scTRIP analysis (Sanders et al. 2020)
Mosaicatcher: https://github.com/friendsofstrandseq/mosaicatcher-pipeline

PART1. Read preprocessing

PART2. Read counting to generate single-cell genebody NO

PART3. Infer expressed genes of subclones

PART4. DE analysis of subclones (default and alternative mode)

PART5. (Optional) Infer Single-cell TF motif accessibility using chromVAR (20210108 updated), by default, Roadmap epigenomics DHS (Enhancers) will be used to define CREs

PART6. (Optional) Infer haplotype-resolved genebody NO (20210108 updated)

Main output

Single-cell level NO table : result/{SAMPLE}_sort_geneid.txt
Infer expression probability for each clones : result_CNN/DNN_train80_output_ypred_clone1_annot.txt,
result_CNN/DNN_train80_output_ypred_clone2_annot.txt
Infer differential expression table : result/Result_scNOVA_infer_expression_table.txt,
result/result_PLSDA_{SAMPLE}.txt (20220121 updated)
Heatmap and UMAP visualization of significant hits : result_plots/Result_scNOVA_plots_{SAMPLE}.pdf,
result_plots/Result_scNOVA_plots_{SAMPLE}_alternative_PLSDA.pdf
(Optional) Single-cell level TF motif deviation z-score : result/motif_dev_zscore_chromVAR_DHS_2kb_Enh_{SAMPLE}.txt
(Optional) Haplotype-resolved NO of genebody and CREs : result_haplo/Deeptool_Genebody_H1H2_sort.txt, result_haplo/Deeptool_DHS_2kb_H1H2_sort.txt

System requirements

This workflow is mean to be run in a Unix-based operating system.

Installation

As of now, unix based tools and python packages are expected to be available via modules. R packages are automatically downloaded and used via conda.

unix based tools : SAMtools/1.3.1-foss-2016b, biobambam2/2.0.76-foss-2016b, deeptools/2.5.1-foss-2016b-Python-2.7.12
python packages : cuDNN, CUDA, TensorFlow, scikit-learn, matplotlib
R packages: handled via conda install. You can choose not to use conda, in which case you will need the following packages: DESeq2, matrixStats, pheatmap, gplots, umap, Rtsne, factoextra, pracma, chromVAR, nabor, motifmatchr

Setup

Download this pipeline
- git lfs install
- git clone https://github.com/jeongdo801/scNOVA.git
  - install dependencies (see further below)
Preparation of input files
- Add your single-cell bam and index files (input_bam/*.bam)
- Add key result files from mosaicatcher output in the input_user folder
  - input_user/simpleCalls_llr4_poppriorsTRUE_haplotagsFALSE_gtcutoff0.05_regfactor6_filterTRUE.txt
  - input_user/strandphaser_output.txt
- Add the subclonality information (input_user/input_subclonality.txt)
- Add the genes within copy number changed region to mask in the infer differential expression result, if it's not provided, genes will not be masked. (input_user/input_SV_affected_genes.txt)
Change the project name in the Snakefile
Setup dependencies (--use-conda is recommended as follows)
- snakemake -j 4 --use-conda --conda-create-envs-only
Launch the run_pipeline.sh script

Dependencies and conda

This tool reduces the burden of installing R dependencies and their correct versions by using conda. After following the installation and setup (steps 1-3), use snakemake -j 4 --use-conda --conda-create-envs-only to prepare the conda environment.

Note 1: If you do not have a lot of space in your HOME directory (i.e. ~), make sure conda installs packages into a directory where you have more space. To do so, open ~/.condarc (which already might have content if you have used conda before) and add the following lines to it:

pkgs_dirs:
  - <path/to/directory_with_more_space>/envs/pkgs/
envs_dirs:
  - <path/to/directory_with_more_space>/envs/

Note 2: If you want to install all dependencies manually and not use conda, simply remove --use-conda from run_pipeline.sh, which will make snakemake ignore all conda statements.

Rare conda issue

A note on running hundreds of jobs at the same time (cluster environment):

If running many concurrent jobs, a rare race condition can occur in which two R environments are activated at the same time. During this activation, a file is deleted for a short amount of time, which will throw an error if a separate activation is taking place at the same time. The issue is described here and can be worked around by applying this fix, which has not yet been merged into any branch unfortunately.

The error that gets thrown looks like this: /path/to/pipeline/.snakemake/conda/<hash>/lib/R/bin/R: line 248: /path/to/pipeline/.snakemake/conda/<hash>/lib/R/etc/ldpaths: No such file or directory

Configuration for the special use

By default, to make CNN feature, scNOVA performs library size normalization followed by local copy number normalization. However, for the samples with more dramatic karyotypic changes (e.g. changes in ploidy status), we provide a option to use copy number normalization before normalization by library size. To do so, users can change two lines in the Snake.config.json

{
    ...
    "count_norm" : "utils/count_norm_ploidy.R",
    ...
    "combine_features" : "utils/combine_features_ploidy.R",
    ...
}

References

For detailed information on scNOVA see

Jeong and Grimes et al., 2022 (https://www.nature.com/articles/s41587-022-01551-4)

For information on scTRIP see

Sanders et al., 2020 (doi: https://doi.org/10.1038/s41587-019-0366-x)

Name		Name	Last commit message	Last commit date
Latest commit History 139 Commits
envs		envs
input_user_example		input_user_example
utils		utils
.gitattributes		.gitattributes
LICENSE		LICENSE
README.md		README.md
Snake.config.json		Snake.config.json
Snakefile		Snakefile
run_pipeline.sh		run_pipeline.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

scNOVA

Overview of this workflow

System requirements

Installation

Setup

Dependencies and conda

Rare conda issue

Configuration for the special use

References

About

Releases

Packages

Contributors 2

Languages

License

jeongdo801/scNOVA

Folders and files

Latest commit

History

Repository files navigation

scNOVA

Overview of this workflow

System requirements

Installation

Setup

Dependencies and conda

Rare conda issue

Configuration for the special use

References

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages