forked from hemberg-lab/scRNA.seq.course
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path32-search.Rmd
110 lines (85 loc) · 4.22 KB
/
32-search.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
---
output: html_document
---
## Search scRNA-Seq data
```{r echo=FALSE, message=FALSE, warning=FALSE, paged.print=FALSE}
library(knitr)
opts_chunk$set(fig.align="center")
```
```{r, echo=TRUE, message=FALSE, warning=FALSE}
library(scfind)
library(SingleCellExperiment)
set.seed(1234567)
```
### About
`scfind` is a tool that allows one to search single cell RNA-Seq collections (Atlas) using lists of genes, e.g. searching for cells and cell-types where a specific set of genes are expressed. `scfind` is a [Bioconductor package](http://bioconductor.org/packages/scfind). Cloud implementation of `scfind` with a large collection of datasets is available on our [website](http://www.hemberg-lab.cloud/scfind).
```{r, echo = FALSE, out.width = '80%', fig.cap="scfind can be used to search large collection of scRNA-seq data by a list of gene IDs."}
knitr::include_graphics("figures/scfind.png")
```
### Dataset
We will run `scfind` on the same human pancreas dataset as in the previous chapter. `scfind` also operates on `SingleCellExperiment` class:
```{r, message=FALSE, warning=FALSE}
muraro <- readRDS("pancreas/muraro.rds")
```
### Gene Index
Now we need to create a gene index using our dataset:
```{r}
cellIndex <- buildCellIndex(muraro)
```
The gene index contains for each gene indexes of the cells where it is expressed. This is similar to sparsification of the expression matrix. In addition to this the index is also compressed in a way that it can accessed very quickly. We estimated that one can achieve 5-10 compression factor with this method.
By default the `cell_type1` column of the `colData` slot of the `SingleCellExperiment` object is used to define cell types, however it can also be defined manually using the `cell_type_column` argument of the `buildCellTypeIndex` function (check `?buildCellTypeIndex`).
### Marker genes
Now let's define lists of cell type specific marker genes. We will use the marker genes identified in the original publication, namely in Figure 1:
```{r}
# these genes are taken from fig. 1
muraro_alpha <- c("GCG", "LOXL4", "PLCE1", "IRX2", "GC", "KLHL41",
"CRYBA2", "TTR", "TM4SF4", "RGS4")
muraro_beta <- c("INS", "IAPP", "MAFA", "NPTX2", "DLK1", "ADCYAP1",
"PFKFB2", "PDX1", "TGFBR3", "SYT13")
muraro_gamma <- c("PPY", "SERTM1", "CARTPT", "SLITRK6", "ETV1",
"THSD7A", "AQP3", "ENTPD2", "PTGFR", "CHN2")
muraro_delta <- c("SST", "PRG4", "LEPR", "RBP4", "BCHE", "HHEX",
"FRZB", "PCSK1", "RGS2", "GABRG2")
```
### Search cells by a gene list
`findCell` function returns a list of p-values corresponding to all cell types in a given dataset. It also outputs a list of cells in which genes from the given gene list are co-expressed. We will run it on all lists of marker genes defined above:
```{r}
res <- findCell(cellIndex, muraro_alpha)
barplot(-log10(res$p_values), ylab = "-log10(pval)", las = 2)
head(res$common_exprs_cells)
```
__Exercise 1__
Perform a search by _beta_, _delta_ and _gamma_ gene lists and explore the results.
```{r, echo=FALSE}
res <- findCell(cellIndex, muraro_beta)
barplot(-log10(res$p_values), ylab = "-log10(pval)", las = 2)
head(res$common_exprs_cells)
res <- findCell(cellIndex, muraro_delta)
barplot(-log10(res$p_values), ylab = "-log10(pval)", las = 2)
head(res$common_exprs_cells)
res <- findCell(cellIndex, muraro_gamma)
barplot(-log10(res$p_values), ylab = "-log10(pval)", las = 2)
head(res$common_exprs_cells)
```
__Exercise 2__
Load the `segerstolpe` and search it using _alpha_, _beta_, _delta_ and _gamma_ gene lists identified in `muraro` dataset.
```{r, echo=FALSE}
segerstolpe <- readRDS("pancreas/segerstolpe.rds")
cellIndex <- buildCellIndex(segerstolpe)
res <- findCell(cellIndex, muraro_alpha)
barplot(-log10(res$p_values), ylab = "-log10(pval)", las = 2)
head(res$common_exprs_cells)
res <- findCell(cellIndex, muraro_beta)
barplot(-log10(res$p_values), ylab = "-log10(pval)", las = 2)
head(res$common_exprs_cells)
res <- findCell(cellIndex, muraro_delta)
barplot(-log10(res$p_values), ylab = "-log10(pval)", las = 2)
head(res$common_exprs_cells)
res <- findCell(cellIndex, muraro_gamma)
barplot(-log10(res$p_values), ylab = "-log10(pval)", las = 2)
head(res$common_exprs_cells)
```
### sessionInfo()
```{r}
sessionInfo()
```