This repository is used for the whole mouse brain (wmb) snATAC-seq data analysis of Center for Epigenomics of the Mouse Brain Atlas (CEMBA), which is now accepted by Nature 2023.
- Demultiplexed data can be accessed via the NEMO archive (NEMO, RRID:SCR_016152) at https://assets.nemoarchive.org/dat-bej4ymm (the raw directory under Source Data URL in this archive).
- We also uploaded our demultiplexed fastq files and processed files under the GEO accession number GSE246791:
- Processed data is also available on our web portal and can be explored here: http://www.catlas.org.
- A Google drive link for bigwig files and SnapATAC2 files just for backup:
- We now have 234 samples and 2.3 million cells in total. So most of the analysis are depend on Snakefile to organize the pipeline and submit them to high-performance cluster (HPC) in order to use hundreds of CPUs at the same time.
- R, Shell and Python (>= 3.10) are mainly used, especially R (>= 4.2).
- Under the directory package, we put lots of common functions there.
- We mainly use SnapATAC2 to analyze the single-nucleus ATAC-seq data
- Comparation between Scrublet and AMULET: https://github.com/yuelaiwang/CEMBA_AMULET_Scrublet
- The deep learning related codes now in the repo: https://github.com/yal054/mba_dl_model
- sa2 is short for SnapATAC2 in this repo.
cembav2env.R: R env to store the metadata during analysis.
Enviorment | Description |
---|---|
cembav2env | meta data of SnapATAC and SnapATAC2 |
cluSumBySa2 | clustering meta data, such as resolution, |
barcode to L4 Ids, L4 major regions and so on | |
Sa2Integration | Integration meta data, like Allen’s data descriptions |
Sa2PeakCalling | Peak calling meta data |
In total, we have implemented four-round iterative clustering. See details in 01.clustering
We use Allen’s scRNAseq data and their annotations for our data annotation. See details in 02.integration
We use macs2 with multiple stage filtering, especially use SPM >= 5 for filtering peaks. See details in 03.peakcalling
In order to better support pbs torque in tscc, please
mkdir -p ~/.config/snakemake
- then
cp -r /projects/ps-renlab2/szu/projects/CEMBA2/package/pbs-torque-conda ~/.config/snakemake/
- then go to the pbs-torque-conda directory you copy,
- modify config.yaml: replace szu to your name and corresponding conda path
- modify submit.yaml: replace szu to your name
- modify pbs-submit.py: replace szu to your name