
GuoshuaiCai/scanner


SCANNER takes a Seurat object as input, which can be generated with the Seurat pipeline. This example uses a preprocessed Seurat object built from human bronchial epithelial cells (Duclos et al. Sci Adv 2019). Single-cell sequencing read counts were downloaded from NIH GEO (GSE131391). Subsequent analyses, including data normalization, highly variable feature selection, data scaling, dimension reduction, and cluster identification, were performed using the Seurat 3.0 package.

The packages Seurat, dplyr and SIBER are used. GEOquery is also called below through GEOquery::getGEO, so it must be installed as well.

library("Seurat")
library("dplyr")
library("SIBER")

Three files are required: a Seurat object file (GSE131391_object.RData), a dataset description file (GSE131391_description.csv) and a phenotype file (GSE131391_series_matrix.txt). These example files can be found in the current folder.

First, load the example Seurat object.

load("GSE131391_object.RData")

Next, add the dataset meta information to the data object. Users can prepare their own meta information in the same format as “GSE131391_description.csv” and apply the following steps.

meta.info <- read.csv("GSE131391_description.csv",
                          check.names = F,
                          row.names = 1,
                          na.strings = F,
                          stringsAsFactors = F)
meta.info.l <- as.list(subset(meta.info, select = "Description", drop = T))
names(meta.info.l) <- row.names(meta.info)
seurat.object@misc$meta.info <- meta.info.l
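
As a rough illustration of the expected layout, the sketch below builds a small description table inline and converts it to a named list in the same way. The keys and values are invented for the example and are not the contents of GSE131391_description.csv; only the first column (used as row names) and the "Description" column are assumed.

```r
# Hypothetical description file: the keys and values are invented.
csv.text <- 'Key,Description
Title,"Example airway dataset"
Organism,"Homo sapiens"
Platform,"Example platform"'

desc <- read.csv(text = csv.text, row.names = 1,
                 check.names = FALSE, stringsAsFactors = FALSE)

# One named list entry per row of the file.
desc.l <- as.list(desc$Description)
names(desc.l) <- row.names(desc)
str(desc.l)
```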

The phenotype data can be obtained by reading the GEO series matrix file or a self-constructed matrix. Information vectors for variables such as age, sex and smoking status can then be extracted. Note that SCANNER currently visualizes only categorical groups, so we assign subjects to two age groups (“<30”, “>=30”).

sample.info <- GEOquery::getGEO(filename="GSE131391_series_matrix.txt")

age<-sample.info@phenoData@data$'age:ch1'
sex<-sample.info@phenoData@data$'Sex:ch1'
smoking<-sample.info@phenoData@data$'smoking status:ch1'
age.o<-which(as.numeric(age)>=30)
age.y<-which(as.numeric(age)<30)
age[age.o]<-">=30"
age[age.y]<-"<30"
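
The same binning can be sketched on a toy vector with ifelse(). The ages below are made up. A missing age stays NA in both versions; a non-numeric entry, however, would become NA here, whereas the code above would leave it unchanged.

```r
# Toy ages stored as character strings, as in a GEO phenotype table.
age.raw <- c("25", "30", "47", "29", NA)
age.num <- as.numeric(age.raw)

# Two categorical groups; NA is propagated for missing ages.
age.group <- ifelse(age.num >= 30, ">=30", "<30")
age.group
```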

Match this phenotype information to the cells, add it to the Seurat object, and record the group levels in DataSegregation.

sample.vec<-sample.info@phenoData@data$title
match.loc<-match(gsub("_cell.+","",colnames(seurat.object)),gsub(" ","_",sample.vec))
age.vec<-factor(age[match.loc])
sex.vec<-factor(sex[match.loc])
smoking.vec<-factor(smoking[match.loc])
names(age.vec)<-colnames(seurat.object)
names(sex.vec)<-colnames(seurat.object)
names(smoking.vec)<-colnames(seurat.object)
seurat.object$age <-age.vec
seurat.object$sex <- sex.vec
seurat.object$smoking <- smoking.vec

seurat.object@misc$DataSegregation <- list(
            "age" =levels(seurat.object@meta.data$'age'),
            "sex" =levels(seurat.object@meta.data$'sex'),
            "smoking" =levels(seurat.object@meta.data$'smoking')
            )
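
The matching step relies on cell names that start with the sample title (spaces replaced by underscores) followed by "_cell...". A minimal sketch with invented names shows the mechanics and how to spot cells without a matching sample; real names may follow a different pattern.

```r
# Hypothetical cell names and sample titles.
cell.names   <- c("Donor_1_cell1", "Donor_1_cell2", "Donor_2_cell1", "Donor_3_cell9")
sample.title <- c("Donor 1", "Donor 2")
smoking      <- c("Current Smoker", "Never Smoker")

# Strip the cell suffix, then look up each cell's sample in the phenotype table.
cell.sample <- gsub("_cell.+", "", cell.names)
loc <- match(cell.sample, gsub(" ", "_", sample.title))

smoking.cell <- factor(smoking[loc])
names(smoking.cell) <- cell.names

# Cells whose sample is absent from the phenotype table get NA.
unmatched <- cell.names[is.na(loc)]
unmatched
```

Checking that no cell is unmatched before filling DataSegregation avoids silently carrying NA group labels into the object.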

Identify differentially expressed genes in each cell type compared to all others. Store the top 10, 30 and 100 genes per cluster, ranked by avg_log2FC, in the misc slot of the Seurat object.

DE.all <- FindAllMarkers(object = seurat.object,
                            only.pos = FALSE,
                            min.pct = 0.1)
## Calculating cluster club

## Calculating cluster goblet

## Calculating cluster basal

## Calculating cluster ciliated

## Calculating cluster WBC

## Calculating cluster basal-smoking

## Calculating cluster inocyte
seurat.object@misc$DE$top10<-DE.all %>% group_by(cluster) %>% 
                    top_n(10, avg_log2FC) %>% as.data.frame()
seurat.object@misc$DE$top30<-DE.all %>% group_by(cluster) %>% 
                    top_n(30, avg_log2FC) %>% as.data.frame()
seurat.object@misc$DE$top100<-DE.all %>% group_by(cluster) %>% 
                    top_n(100, avg_log2FC) %>% as.data.frame()
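
To make the per-cluster selection concrete, here is a base R sketch on an invented marker table. The helper top.n.per.cluster is hypothetical and is not part of SCANNER or Seurat. Unlike top_n(), it sorts the rows and does not keep extra rows on ties.

```r
# Toy marker table with the two columns the selection depends on.
de <- data.frame(
  gene = paste0("g", 1:12),
  cluster = rep(c("basal", "club"), each = 6),
  avg_log2FC = c(2.1, -0.4, 0.9, 1.5, -1.2, 0.3,
                 0.7, 1.8, -0.6, 2.4, 0.1, -1.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n rows with the largest avg_log2FC in each cluster.
top.n.per.cluster <- function(de, n) {
  pieces <- lapply(split(de, de$cluster), function(d) {
    head(d[order(d$avg_log2FC, decreasing = TRUE), ], n)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

top3 <- top.n.per.cluster(de, 3)
top3
```

Because the ranking uses the signed avg_log2FC, only the most up-regulated genes are selected even though only.pos = FALSE also reports down-regulated ones.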

Optional but recommended: subset the data to a smaller number of cells per cell type (500 by default; the example below uses ncell=100). Two methods are available: (1) “sampling”, which randomly selects cells, and (2) “ellipse”, which selects core cells by fitting 2D ellipses. This subsetting step is applied automatically with the “sampling” method when data are loaded into SCANNER. It discards cells but keeps representative ones, and it is highly recommended because it greatly reduces upload and analysis time.

source("support_function.R")
## 
## Attaching package: 'future'

## The following object is masked from 'package:rmarkdown':
## 
##     run
seurat.object.subset<-subset_data(seurat.object,ncell=100,method="sampling")
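
The idea behind the “sampling” method can be sketched in base R as below. This is an assumption about the general approach, not the subset_data() implementation from support_function.R; the helper sample.cells and its seed argument are hypothetical.

```r
# Hypothetical helper: keep at most ncell randomly chosen cells per cluster.
sample.cells <- function(cluster, ncell = 500, seed = 1) {
  set.seed(seed)
  idx.by.cluster <- split(seq_along(cluster), cluster)
  keep <- lapply(idx.by.cluster, function(idx) {
    # Small clusters are kept whole; large ones are down-sampled.
    if (length(idx) <= ncell) idx else idx[sample.int(length(idx), ncell)]
  })
  sort(unlist(keep, use.names = FALSE))
}

# Toy cluster labels for 410 cells.
cluster <- rep(c("basal", "club", "ciliated"), times = c(250, 120, 40))
keep <- sample.cells(cluster, ncell = 100)
table(cluster[keep])
```

Sampling by position with sample.int() avoids the base R pitfall where sample(x, n) treats a single number x as the range 1:x.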

Further, store the ranks of gene expression in each cell.

seurat.object.subset@assays$RNA@misc$rank<-
    apply(seurat.object.subset@assays$RNA@data,2,function(x) rank(x))
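
A toy matrix shows what this produces: ranks are computed within each cell (column), the lowest expression gets the lowest rank, and tied values share their average rank. The result of apply() is a dense genes-by-cells matrix, which is one reason to subset before this step.

```r
# Toy expression matrix: 4 genes (rows) by 2 cells (columns).
expr <- matrix(c(0, 0, 2, 5,
                 1, 0, 0, 3),
               nrow = 4,
               dimnames = list(paste0("gene", 1:4), c("cellA", "cellB")))

# Rank genes within each cell; ties receive the average rank.
rnk <- apply(expr, 2, rank)
rnk
```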

Last, save the data into a single R object (RDS format). It is ready to load into SCANNER for visualization and analysis!

saveRDS(seurat.object.subset, "GSE131391_sampling.RDS")
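
Before uploading, it can be worth confirming that the file reads back intact. The sketch below does the round trip on a small stand-in list written to a temporary file; the same readRDS() call should apply to the saved Seurat object.

```r
# Stand-in object; a Seurat object is saved and restored the same way.
obj <- list(cells = c("cellA", "cellB"), note = "toy object")

path <- tempfile(fileext = ".RDS")
saveRDS(obj, path)

# Read the file back and compare it with the original.
restored <- readRDS(path)
identical(obj, restored)
```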

Extra: We provide the gene set activity functions used by SCANNER. They can be applied directly to the Seurat object. The gene set activity in each cell can be inferred in four ways: (1) “Exp”, the average gene expression; (2) “Rnk”, the average expression rank, where the lowest expression has rank 1; (3) “EigGen”, the value of the eigengene; and (4) “ES”, the single-sample gene set enrichment score. The average activity in each cluster (cell type) is reported as well. Groups can be compared by passing the group variable of interest to gs.activity through the condition argument. In this example, we look at the KEGG renin-angiotensin system pathway.

source("support_function.R")
ras<-c("ACE","ACE2","AGT","AGTR1","AGTR2","ANPEP","CMA1",
    "CPA3","CTSA","CTSG","ENPEP","LNPEP","MAS1","MME","NLN","REN","THOP1")
act.es<-gs.activity(seurat.object,ras,method="ES")
act.es$cluster
##                   score
## basal         0.5335443
## goblet        0.5296804
## ciliated      0.5304156
## basal-smoking 0.5324925
## WBC           0.5370387
## club          0.5299578
## inocyte       0.5300311
act.es.smoking<-gs.activity(seurat.object,ras,condition="smoking",method="ES")
act.es.smoking$cluster
##                                  score
## basal_Current Smoker         0.5371362
## goblet_Current Smoker        0.5294119
## ciliated_Current Smoker      0.5284983
## basal-smoking_Current Smoker 0.5328937
## WBC_Current Smoker           0.5362448
## club_Current Smoker          0.5291890
## inocyte_Current Smoker       0.5344121
## basal_Never Smoker           0.5318248
## ciliated_Never Smoker        0.5321151
## club_Never Smoker            0.5299792
## goblet_Never Smoker          0.5381845
## WBC_Never Smoker             0.5376130
## basal-smoking_Never Smoker   0.5188524
## inocyte_Never Smoker         0.5256502
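
For intuition, the two simplest scores (“Exp” and “Rnk”) can be sketched in base R on simulated data. This follows the descriptions above and is not the gs.activity() code from support_function.R; all gene, cell and group names are invented, and genes missing from the matrix are assumed to be dropped.

```r
set.seed(42)
genes <- paste0("gene", 1:20)
cells <- paste0("cell", 1:6)
expr <- matrix(rexp(120), nrow = 20, dimnames = list(genes, cells))
cluster <- c("basal", "basal", "basal", "club", "club", "club")
smoking <- c("Current Smoker", "Never Smoker", "Current Smoker",
             "Never Smoker", "Current Smoker", "Never Smoker")

# Keep only gene set members that are present in the matrix.
gene.set <- c("gene2", "gene5", "gene11", "notMeasured")
present <- intersect(gene.set, rownames(expr))

# "Exp": mean expression of the gene set in each cell.
act.exp <- colMeans(expr[present, , drop = FALSE])

# "Rnk": mean within-cell rank of the gene set (lowest expression = rank 1).
rnk <- apply(expr, 2, rank)
act.rnk <- colMeans(rnk[present, , drop = FALSE])

# Average activity per cluster, and per cluster within each condition.
cluster.exp <- tapply(act.exp, cluster, mean)
cluster.cond.exp <- tapply(act.exp, paste(cluster, smoking, sep = "_"), mean)
cluster.cond.exp
```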
