3. Quick start
scCancer is mainly designed for the 10X Genomics platform, and it requires a data folder containing the results generated by the software Cell Ranger.
In general, the data folder should be organized as follows, which is the output format of Cell Ranger V3:
```
/sampleFolder
├── filtered_feature_bc_matrix
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── raw_feature_bc_matrix
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
└── web_summary.html
```
Compared to Cell Ranger V2 (CR2), Cell Ranger V3 (CR3) is better at identifying cells with low RNA content, so we suggest using CR3 for alignment and cell calling.
Because some published data were generated with CR2 or do not provide the raw matrix, we specially designed the pipeline to be compatible with these situations.
A common folder structure of CR2 output is shown below:
```
/sampleFolder
├── filtered_gene_bc_matrices
│   └── hg19
│       ├── barcodes.tsv
│       ├── genes.tsv
│       └── matrix.mtx
├── raw_gene_bc_matrices
│   └── hg19
│       ├── barcodes.tsv
│       ├── genes.tsv
│       └── matrix.mtx
└── web_summary.html
```
For other droplet-based platforms, the data folder should be prepared likewise.
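Before running the pipeline, it can help to confirm which of the two layouts a sample folder follows. The sketch below uses only base R; the function name `detectCellRangerLayout` is hypothetical and not part of scCancer, and it only checks the filtered-matrix files shown in the trees above.

```r
# Hypothetical helper (not part of scCancer): report which Cell Ranger
# output layout a sample folder appears to follow.
detectCellRangerLayout <- function(sampleFolder) {
  # CR3: gzipped files directly under filtered_feature_bc_matrix/
  cr3 <- file.path(sampleFolder, "filtered_feature_bc_matrix",
                   c("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz"))
  if (all(file.exists(cr3))) return("CR3")
  # CR2: plain files nested under a genome folder such as hg19/
  genomeDirs <- list.dirs(file.path(sampleFolder, "filtered_gene_bc_matrices"),
                          recursive = FALSE)
  for (d in genomeDirs) {
    cr2 <- file.path(d, c("barcodes.tsv", "genes.tsv", "matrix.mtx"))
    if (all(file.exists(cr2))) return("CR2")
  }
  "unknown"
}
```

A result of "unknown" suggests the folder needs to be reorganized before it is passed as `dataPath`.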
Here, we provide example kidney cancer data from 10X Genomics. Users can download it and run the following scripts to understand the workflow of scCancer. The generated HTML reports are as follows:
For multiple samples, the following is a generated HTML report of the integration analysis of three kidney cancer samples:
The scStatistics module mainly performs quality control on the expression matrix and returns suggested thresholds for filtering cells and genes.
Meanwhile, to better evaluate the influence of ambient RNA from lysed cells, this step also estimates the contamination fraction using the algorithm of SoupX.
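The idea behind the SoupX contamination estimate can be illustrated with a simplified sketch. The ambient ("soup") expression profile is taken from empty droplets, and for genes assumed not to be expressed in a group of cells, every observed count is attributed to the soup, so observed = rho × total UMIs × (soup fraction of those genes). This is not the SoupX package's implementation (which, for example, can estimate the fraction automatically across clusters); the function names and toy counts are hypothetical.

```r
# Simplified illustration of the SoupX idea (not the SoupX package API).
# counts: genes x droplets matrix of UMI counts.

# Soup profile: fraction of ambient counts per gene, from empty droplets
estimateSoupProfile <- function(counts, emptyDroplets) {
  soup <- rowSums(counts[, emptyDroplets, drop = FALSE])
  soup / sum(soup)
}

# Contamination fraction rho for a set of cells, using genes assumed to be
# truly unexpressed there: all their counts come from the soup, so
# observed = rho * totalCounts * sum(soup fractions of those genes)
estimateRho <- function(counts, cells, soupProfile, nonExprGenes) {
  observed <- sum(counts[nonExprGenes, cells, drop = FALSE])
  total <- sum(counts[, cells, drop = FALSE])
  observed / (total * sum(soupProfile[nonExprGenes]))
}

# Toy data: 4 genes, 2 empty droplets (e1, e2) and 2 cells (c1, c2)
counts <- matrix(
  c(10, 10, 10, 10,   # e1
    10, 10, 10, 10,   # e2
    45, 45,  5,  5,   # c1: 80 true UMIs (g1, g2) + 20 ambient UMIs
    45, 45,  5,  5),  # c2: same composition
  nrow = 4,
  dimnames = list(paste0("g", 1:4), c("e1", "e2", "c1", "c2"))
)
soup <- estimateSoupProfile(counts, c("e1", "e2"))
rho <- estimateRho(counts, c("c1", "c2"), soup, c("g3", "g4"))
```

Here `rho` is 0.2: the 20 counts on g3 and g4 out of 200 total UMIs are explained by a soup in which those two genes make up half of all counts.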
The following example script runs the first module, scStatistics.
Use help(runScStatistics) for more details about its arguments and to customize the settings.
```r
library(scCancer)

dataPath <- "./data/KC-example"     # The path of Cell Ranger processed data
savePath <- "./results/KC-example"  # A path to save the results
sampleName <- "KC-example"          # The sample name
authorName <- "G-Lab@THU"           # The author name to mark the report

# Run scStatistics
stat.results <- runScStatistics(
  dataPath = dataPath,
  savePath = savePath,
  sampleName = sampleName,
  authorName = authorName
)
```
Running the scStatistics script generates the following files and folders:
- report-scStat.html : An HTML report containing all results.
- report-scStat.md : A markdown report.
- figures/ : All figures generated during this module.
- report-figures/ : All figures presented in the HTML report.
- cellManifest-all.txt : The statistical results for all droplets.
- cell.QC.thres.txt : The suggested thresholds for filtering poor-quality cells.
- geneManifest.txt : The statistical results for genes.
- ambientRNA-SoupX.txt : The results of estimating the contamination fraction.
- report-cellRanger.html : The summary report generated by Cell Ranger.
Using the QC thresholds, the scAnnotation module first filters cells and genes, and then performs basic operations (normalization, log-transformation, identification of highly variable genes, removal of unwanted variance, scaling, centering, dimension reduction, clustering, and differential expression analysis) using the R package Seurat V3.
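The threshold-based filtering step can be pictured as keeping only the droplets whose QC metrics fall inside the suggested ranges. In the sketch below, the column names (`nUMI`, `nGene`, `mito.frac`), the threshold names, and the function name are all hypothetical; the actual contents of `cellManifest-all.txt` and `cell.QC.thres.txt` may differ, and scAnnotation may apply additional criteria.

```r
# Hypothetical sketch: column and threshold names are illustrative only.
filterCellsByQC <- function(cellManifest, thres) {
  keep <- cellManifest$nUMI >= thres$nUMI.min &
    cellManifest$nUMI <= thres$nUMI.max &
    cellManifest$nGene >= thres$nGene.min &
    cellManifest$nGene <= thres$nGene.max &
    cellManifest$mito.frac <= thres$mito.max   # fraction of mitochondrial UMIs
  cellManifest[keep, , drop = FALSE]
}

manifest <- data.frame(
  barcode = c("AAA", "CCC", "GGG", "TTT"),
  nUMI = c(500, 3000, 60000, 4000),
  nGene = c(200, 1200, 7000, 1500),
  mito.frac = c(0.05, 0.08, 0.10, 0.40),
  stringsAsFactors = FALSE
)
thres <- list(nUMI.min = 1000, nUMI.max = 50000,
              nGene.min = 500, nGene.max = 6000, mito.max = 0.2)
filtered <- filterCellsByQC(manifest, thres)
```

Only "CCC" survives: "AAA" has too few UMIs, "GGG" too many (a likely doublet), and "TTT" too high a mitochondrial fraction.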
In addition, scAnnotation performs several cancer-specific analyses:
- Doublet score estimation : In this step, we integrate two methods of the R package scds (binary-classification-based bcds and co-expression-based cxds) to estimate doublet scores.
- Cancer micro-environmental cell type classification : In this step, we develop a data-driven OCLR (one-class logistic regression) model to predict cell types, including epithelial cells, endothelial cells, fibroblasts, and immune cells (CD4+ T cells, CD8+ T cells, B cells, natural killer cells, and myeloid cells).
- Cell malignancy estimation : In this step, we refer to the algorithm of the R package infercnv to estimate initial CNV profiles. Then, we use each cell's neighbor information to smooth the CNV values and define the malignancy score as the mean of the squared smoothed CNV values.
- Cell cycle analysis : In this step, to analyze intra-tumor heterogeneity of cell phenotypes, we define the cell cycle score as the relative average expression of a list of G2/M and S phase markers, using the function "AddModuleScore" of Seurat.
- Cell stemness analysis : In this step, to analyze intra-tumor heterogeneity of cell phenotypes, we define the cell stemness score as the Spearman correlation coefficient between each cell's expression and our pre-trained stemness signature, following the algorithm of Malta et al.
- Gene set signature analysis : In this step, to analyze intra-tumor heterogeneity at the gene set level, we provide two methods to calculate gene set signature scores: GSVA and relative average expression levels. By default, we use the 50 hallmark gene sets from MSigDB.
- Expression programs identification : In this step, to analyze intra-tumor heterogeneity at the gene set level, we use non-negative matrix factorization (NMF) to identify potential expression program signatures in an unsupervised manner.
- Cell-cell interaction analyses : In this step, we refer to the methods of Kumar et al to characterize ligand-receptor interactions across cell clusters.
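The smoothing-and-squaring step of the malignancy score can be sketched as follows, assuming the initial inferCNV-style CNV matrix (genes × cells, centered near zero) is already available. Averaging each cell with its k nearest neighbors in a low-dimensional embedding (for example PCA coordinates) is our assumption about how the neighbor information is used; the function names are hypothetical and this is not scCancer's code.

```r
# Hypothetical sketch, not scCancer's implementation.
# Indices of the k nearest neighbors of each cell (rows of `embedding`),
# excluding the cell itself.
knnIndices <- function(embedding, k) {
  d <- as.matrix(dist(embedding))
  lapply(seq_len(nrow(d)), function(i) setdiff(order(d[i, ]), i)[seq_len(k)])
}

# cnv: genes x cells matrix of initial CNV values (0 = no change)
# neighbors: list whose i-th element holds the neighbor indices of cell i
malignancyScore <- function(cnv, neighbors) {
  smoothed <- cnv
  for (i in seq_len(ncol(cnv))) {
    idx <- c(i, neighbors[[i]])
    # assumed smoothing: plain average of the cell and its neighbors
    smoothed[, i] <- rowMeans(cnv[, idx, drop = FALSE])
  }
  # malignancy score = mean of the squared smoothed CNV values of each cell
  colMeans(smoothed^2)
}

cnv <- cbind(c1 = c(1, -1), c2 = c(1, -1), c3 = c(0, 0))
scores <- malignancyScore(cnv, neighbors = list(2, 1, 1))
```

In this toy case `scores` is `c(c1 = 1, c2 = 1, c3 = 0.25)`: cell c3 has no CNV signal of its own but inherits part of the signal of its neighbor c1, which is the intended effect of smoothing noisy single-cell CNV estimates.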
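The stemness score reduces to a per-cell Spearman correlation between expression and a vector of gene weights. A minimal base R sketch, assuming `signature` is a named numeric vector of gene weights (the pre-trained stemness signature itself is not reproduced here) and using the hypothetical name `stemnessScore`:

```r
# Hypothetical sketch: Spearman correlation between each cell's expression
# and a named vector of signature weights, computed over the shared genes.
stemnessScore <- function(expr, signature) {
  genes <- intersect(rownames(expr), names(signature))
  # cor() correlates each column (cell) of the matrix with the weight vector
  rho <- cor(expr[genes, , drop = FALSE], signature[genes], method = "spearman")
  rho[, 1]
}

expr <- cbind(cellA = c(1, 2, 3, 4), cellB = c(4, 3, 2, 1))
rownames(expr) <- paste0("g", 1:4)
signature <- c(g1 = 0.1, g2 = 0.2, g3 = 0.3, g4 = 0.4)
stem <- stemnessScore(expr, signature)
```

Because only ranks matter, cellA (same gene ordering as the signature) scores 1 and cellB (reversed ordering) scores -1, regardless of the absolute expression scale.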
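For the expression-program step, a textbook NMF can be written with the Lee and Seung multiplicative updates, which minimize the Frobenius error of V ≈ WH while keeping W and H non-negative. scCancer's actual NMF settings (rank selection, initialization, input normalization) are not described here, so this sketch only illustrates the factorization on a non-negative genes × cells matrix; `nmfMultiplicative` is a hypothetical name.

```r
# Hypothetical sketch of NMF with Lee-Seung multiplicative updates.
# V: non-negative genes x cells matrix; k: number of expression programs.
nmfMultiplicative <- function(V, k, nIter = 200, seed = 1) {
  set.seed(seed)
  W <- matrix(runif(nrow(V) * k), nrow(V), k)   # gene loadings per program
  H <- matrix(runif(k * ncol(V)), k, ncol(V))   # program activity per cell
  eps <- 1e-10                                   # avoids division by zero
  for (it in seq_len(nIter)) {
    H <- H * crossprod(W, V) / (crossprod(W) %*% H + eps)
    W <- W * tcrossprod(V, H) / (W %*% tcrossprod(H) + eps)
  }
  list(W = W, H = H, error = sqrt(sum((V - W %*% H)^2)))
}

# Toy input with an exact rank-2 non-negative structure
set.seed(42)
V <- matrix(runif(20 * 2), 20, 2) %*% matrix(runif(2 * 15), 2, 15)
fit <- nmfMultiplicative(V, k = 2)
```

The genes defining program j can then be read off the loadings, e.g. `head(order(fit$W[, j], decreasing = TRUE), 10)` gives the indices of its ten top-weighted genes.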
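As we understand the approach of Kumar et al, a ligand-receptor score between a sender and a receiver cluster is the product of the ligand's mean expression in the sender and the receptor's mean expression in the receiver. The sketch below sums these products over all ligand-receptor pairs to obtain one cluster-by-cluster matrix; that aggregation, the function name, and the `ligand`/`receptor` columns of the pair table are our assumptions, and `InteractionScore.txt` may be computed differently.

```r
# Hypothetical sketch, not scCancer's implementation.
# expr: genes x cells matrix; clusters: cluster label of each cell;
# lr: data.frame with character columns `ligand` and `receptor`.
interactionScores <- function(expr, clusters, lr) {
  groups <- split(seq_len(ncol(expr)), clusters)
  # mean expression of every gene in every cluster (genes x clusters)
  avg <- do.call(cbind, lapply(groups, function(idx) {
    rowMeans(expr[, idx, drop = FALSE])
  }))
  colnames(avg) <- names(groups)
  lig <- avg[as.character(lr$ligand), , drop = FALSE]
  rec <- avg[as.character(lr$receptor), , drop = FALSE]
  # entry [a, b] = sum over pairs of mean ligand in a * mean receptor in b
  crossprod(lig, rec)
}

expr <- rbind(L1 = c(2, 4, 0, 0), R1 = c(0, 0, 1, 3))
clusters <- c("A", "A", "B", "B")
lr <- data.frame(ligand = "L1", receptor = "R1", stringsAsFactors = FALSE)
ia <- interactionScores(expr, clusters, lr)
```

In this toy example only the A-to-B direction is non-zero (3 × 2 = 6), because cluster A expresses the ligand and cluster B the receptor; rows are senders and columns are receivers.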
The following example script runs the second module, scAnnotation.
Use help(runScAnnotation) for more details about its arguments and to customize the settings.
```r
library(scCancer)

dataPath <- "./data/KC-example"     # The path of Cell Ranger processed data
statPath <- "./results/KC-example"  # The path of the scStatistics results
savePath <- "./results/KC-example"  # A path to save the results
sampleName <- "KC-example"          # The sample name
authorName <- "G-Lab@THU"           # The author name to mark the report

# Run scAnnotation
anno.results <- runScAnnotation(
  dataPath = dataPath,
  statPath = statPath,
  savePath = savePath,
  authorName = authorName,
  sampleName = sampleName,
  geneSet.method = "average"  # or "GSVA"
)
```
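With `geneSet.method = "average"`, gene set scores are based on relative average expression, the same idea behind the Seurat function AddModuleScore used for the cell cycle score. The sketch below is a simplified base R analogue, not Seurat's exact algorithm: genes are binned by mean expression, control genes are drawn from the same bins as the gene set's genes, and the score is the gene set's average minus the controls' average. The function name and the default bin and control sizes are hypothetical.

```r
# Simplified analogue of a module score (not Seurat's exact implementation).
# expr: genes x cells matrix of normalized expression; geneSet: gene names.
relativeAverageScore <- function(expr, geneSet, nBins = 5, nCtrl = 10, seed = 1) {
  set.seed(seed)
  geneSet <- intersect(geneSet, rownames(expr))
  # bin all genes into nBins groups of similar mean expression
  avg <- rowMeans(expr)
  bins <- cut(rank(avg, ties.method = "first"), breaks = nBins, labels = FALSE)
  names(bins) <- rownames(expr)
  # for each gene-set gene, draw control genes from the same expression bin
  ctrl <- unique(unlist(lapply(geneSet, function(g) {
    pool <- setdiff(names(bins)[bins == bins[[g]]], geneSet)
    pool[sample.int(length(pool), min(nCtrl, length(pool)))]
  })))
  # score = average of the gene set minus average of the matched controls
  colMeans(expr[geneSet, , drop = FALSE]) - colMeans(expr[ctrl, , drop = FALSE])
}
```

Subtracting expression-matched controls keeps the score from simply tracking how much a cell expresses overall, so a positive score indicates that the gene set is higher than comparable genes in that cell.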
Running the scAnnotation script generates the following files and folders:
- report-scAnno.html : An HTML report containing all results.
- report-scAnno.md : A markdown report.
- figures/ : All figures generated during this module.
- report-figures/ : All figures presented in the HTML report.
- geneManifest.txt : The gene annotation results, updated with the filtering information.
- expr.RDS : A Seurat object.
- diff.expr.genes/ : Differentially expressed genes for all clusters.
- cellAnnotation.txt : The annotation results for each cell.
- malignancy/ : All results of cell malignancy estimation.
- expr.programs/ : All results of expression program identification.
- InteractionScore.txt : Interaction scores between cell clusters.
The scCombination module mainly performs data integration of multiple samples, batch-effect correction, and visualization of the analyses, based on the single-sample scAnnotation results. Several strategies for data integration and batch-effect correction are available, including NormalMNN (default), SeuratMNN, Raw, and Regression; the comment on comb.method in the example script also lists Harmony and LIGER.
The following example script runs the scCombination module.
Use help(runScCombination) for more details about its arguments.
```r
library(scCancer)

# The paths of all samples' "runScAnnotation" results
single.savePaths <- c("./results/KC1", "./results/KC2", "./results/KC3")
sampleNames <- c("KC1", "KC2", "KC3")  # The labels for all samples
savePath <- "./results/KC123-comb"     # A path to save the results
combName <- "KC123-comb"               # A label of the combined samples
authorName <- "G-Lab@THU"              # The author name to mark the report
# Integration methods ("NormalMNN", "SeuratMNN", "Harmony", "Raw", "Regression", "LIGER")
comb.method <- "NormalMNN"

# Run scCombination
comb.results <- runScCombination(
  single.savePaths = single.savePaths,
  sampleNames = sampleNames,
  savePath = savePath,
  combName = combName,
  authorName = authorName,
  comb.method = comb.method
)
```
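Since scCombination builds on the single-sample scAnnotation results, a quick pre-flight check of each folder in `single.savePaths` can catch samples that have not been annotated yet. The helper below is hypothetical (not part of scCancer), and the default list of required files is an assumption based on the scAnnotation outputs listed earlier, not on what runScCombination actually reads.

```r
# Hypothetical helper (not part of scCancer): list the files missing from
# each single-sample result folder before combining samples.
checkAnnotationResults <- function(single.savePaths,
                                   required = c("expr.RDS", "cellAnnotation.txt")) {
  missing <- lapply(single.savePaths, function(p) {
    required[!file.exists(file.path(p, required))]
  })
  names(missing) <- single.savePaths
  # keep only the folders with at least one missing file
  missing[lengths(missing) > 0]
}
```

For example, calling `checkAnnotationResults(single.savePaths)` before `runScCombination()` and stopping when the returned list is non-empty would point to the samples that still need `runScAnnotation()`.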
Running the scCombination script generates the following files and folders:
- report-scAnnoComb.html : An HTML report containing all results.
- report-scAnnoComb.md : A markdown report.
- figures/ : All figures generated during this step.
- report-figures/ : All figures presented in the HTML report.
- expr.RDS : A Seurat object.
- diff.expr.genes/ : Differentially expressed genes for all clusters.
- cellAnnotation.txt : The annotation results for each cell.
- expr.programs/ : All results of expression program identification.
- anchors.RDS : The anchors used for batch correction (only generated for "NormalMNN" or "SeuratMNN").