-
Notifications
You must be signed in to change notification settings - Fork 959
Closed
Labels
documentationError in documentationError in documentation
Description
Thank you Satija Lab for the incredible work that has been done with Seurat! You guys are miracle workers!
Here is the bug:
I am using the pbm3k dataset from SeuratData as the reproducible example.
I think there is a bug when using the graph.name
argument. Please correct me if I'm wrong.
Simply using the graph.name
argument generates clustering that is different when not using the graph.name
argument. (either more or less clusters, more or less cells in clusters)
Similar clustering differences happen when using the default algorithm for FindClusters
as well. Here I used algorithm 4.
I thought specifying a graph.name
should keep everything consistent. I actually prefer the clustering generated with the graph.name
argument included in my dataset. I'm wondering what is happening.
Here is what was done:
library(dplyr)
library(Seurat)
library(patchwork)
# Load the PBMC dataset
SeuratData::InstallData("pbmc3k")
pbmc <- SeuratData::LoadData("pbmc3k")
# Initialize the Seurat object with the raw (non-normalized data).
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
#Without graph.name argument
#pbmc <- FindNeighbors(pbmc, graph.name ="test", dims = 1:10)
#pbmc <- FindClusters(pbmc, graph.name = "test", algorithm = 4, resolution = 0.5)
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, algorithm = 4, resolution = 0.5)
pbmc <- RunUMAP(pbmc, dims = 1:10)
DimPlot(pbmc, reduction = "umap", label = TRUE, label.size = 6)
Here is the full workflow for with the graph.name
argument:
> pbmc <- SeuratData::LoadData("pbmc3k")
> # Initialize the Seurat object with the raw (non-normalized data).
> pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
> pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
> pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
Performing log-normalization
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
Calculating gene variances
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> all.genes <- rownames(pbmc)
> pbmc <- ScaleData(pbmc, features = all.genes)
Centering and scaling data matrix
|============================================================================================| 100%
> pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
PC_ 1
Positive: CST3, TYROBP, LST1, AIF1, FTL, FTH1, LYZ, FCN1, S100A9, TYMP
FCER1G, CFD, LGALS1, S100A8, CTSS, LGALS2, SERPINA1, IFITM3, SPI1, CFP
PSAP, IFI30, SAT1, COTL1, S100A11, NPC2, GRN, LGALS3, GSTP1, PYCARD
Negative: MALAT1, LTB, IL32, IL7R, CD2, B2M, ACAP1, CD27, STK17A, CTSW
CD247, GIMAP5, AQP3, CCL5, SELL, TRAF3IP3, GZMA, MAL, CST7, ITM2A
MYC, GIMAP7, HOPX, BEX2, LDLRAP1, GZMK, ETS1, ZAP70, TNFAIP8, RIC3
PC_ 2
Positive: CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DQB1, HLA-DRA, LINC00926, CD79B, HLA-DRB1, CD74
HLA-DMA, HLA-DPB1, HLA-DQA2, CD37, HLA-DRB5, HLA-DMB, HLA-DPA1, FCRLA, HVCN1, LTB
BLNK, P2RX5, IGLL5, IRF8, SWAP70, ARHGAP24, FCGR2B, SMIM14, PPP1R14A, C16orf74
Negative: NKG7, PRF1, CST7, GZMB, GZMA, FGFBP2, CTSW, GNLY, B2M, SPON2
CCL4, GZMH, FCGR3A, CCL5, CD247, XCL2, CLIC3, AKR1C3, SRGN, HOPX
TTC38, APMAP, CTSC, S100A4, IGFBP7, ANXA1, ID2, IL32, XCL1, RHOC
PC_ 3
Positive: HLA-DQA1, CD79A, CD79B, HLA-DQB1, HLA-DPB1, HLA-DPA1, CD74, MS4A1, HLA-DRB1, HLA-DRA
HLA-DRB5, HLA-DQA2, TCL1A, LINC00926, HLA-DMB, HLA-DMA, CD37, HVCN1, FCRLA, IRF8
PLAC8, BLNK, MALAT1, SMIM14, PLD4, LAT2, IGLL5, P2RX5, SWAP70, FCGR2B
Negative: PPBP, PF4, SDPR, SPARC, GNG11, NRGN, GP9, RGS18, TUBB1, CLU
HIST1H2AC, AP001189.4, ITGA2B, CD9, TMEM40, PTCRA, CA2, ACRBP, MMD, TREML1
NGFRAP1, F13A1, SEPT5, RUFY1, TSC22D1, MPP1, CMTM5, RP11-367G6.3, MYL9, GP1BA
PC_ 4
Positive: HLA-DQA1, CD79B, CD79A, MS4A1, HLA-DQB1, CD74, HLA-DPB1, HIST1H2AC, PF4, TCL1A
SDPR, HLA-DPA1, HLA-DRB1, HLA-DQA2, HLA-DRA, PPBP, LINC00926, GNG11, HLA-DRB5, SPARC
GP9, AP001189.4, CA2, PTCRA, CD9, NRGN, RGS18, GZMB, CLU, TUBB1
Negative: VIM, IL7R, S100A6, IL32, S100A8, S100A4, GIMAP7, S100A10, S100A9, MAL
AQP3, CD2, CD14, FYB, LGALS2, GIMAP4, ANXA1, CD27, FCN1, RBP7
LYZ, S100A11, GIMAP5, MS4A6A, S100A12, FOLR3, TRABD2A, AIF1, IL8, IFI6
PC_ 5
Positive: GZMB, NKG7, S100A8, FGFBP2, GNLY, CCL4, CST7, PRF1, GZMA, SPON2
GZMH, S100A9, LGALS2, CCL3, CTSW, XCL2, CD14, CLIC3, S100A12, CCL5
RBP7, MS4A6A, GSTP1, FOLR3, IGFBP7, TYROBP, TTC38, AKR1C3, XCL1, HOPX
Negative: LTB, IL7R, CKB, VIM, MS4A7, AQP3, CYTIP, RP11-290F20.3, SIGLEC10, HMOX1
PTGES3, LILRB2, MAL, CD27, HN1, CD2, GDI2, ANXA5, CORO1B, TUBA1B
FAM110A, ATP1A1, TRADD, PPA1, CCDC109B, ABRACL, CTD-2006K23.1, WARS, VMO1, FYB
> pbmc <- FindNeighbors(pbmc, graph.name ="test", dims = 1:10)
Computing nearest neighbor graph
Computing SNN
> pbmc <- FindClusters(pbmc, graph.name = "test", algorithm = 4, resolution = 0.5)
> pbmc <- RunUMAP(pbmc, dims = 1:10)
15:37:13 UMAP embedding parameters a = 0.9922 b = 1.112
15:37:13 Read 2638 rows and found 10 numeric columns
15:37:13 Using Annoy for neighbor search, n_neighbors = 30
15:37:13 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
15:37:14 Writing NN index file to temp file /tmp/Rtmp66O0pD/filee2d579073b5
15:37:14 Searching Annoy index using 1 thread, search_k = 3000
15:37:15 Annoy recall = 100%
15:37:15 Commencing smooth kNN distance calibration using 1 thread
15:37:16 Initializing from normalized Laplacian + noise
15:37:16 Commencing optimization for 500 epochs, with 105124 positive edges
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
15:37:21 Optimization finished
> DimPlot(pbmc, reduction = "umap", label = TRUE, label.size = 6)
Here is the full workflow **without** `graph.name` argument:
> pbmc <- SeuratData::LoadData("pbmc3k")
> # Initialize the Seurat object with the raw (non-normalized data).
> pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
> pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
> pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
Performing log-normalization
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
Calculating gene variances
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> all.genes <- rownames(pbmc)
> pbmc <- ScaleData(pbmc, features = all.genes)
Centering and scaling data matrix
|============================================================================================| 100%
> pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
PC_ 1
Positive: CST3, TYROBP, LST1, AIF1, FTL, FTH1, LYZ, FCN1, S100A9, TYMP
FCER1G, CFD, LGALS1, S100A8, CTSS, LGALS2, SERPINA1, IFITM3, SPI1, CFP
PSAP, IFI30, SAT1, COTL1, S100A11, NPC2, GRN, LGALS3, GSTP1, PYCARD
Negative: MALAT1, LTB, IL32, IL7R, CD2, B2M, ACAP1, CD27, STK17A, CTSW
CD247, GIMAP5, AQP3, CCL5, SELL, TRAF3IP3, GZMA, MAL, CST7, ITM2A
MYC, GIMAP7, HOPX, BEX2, LDLRAP1, GZMK, ETS1, ZAP70, TNFAIP8, RIC3
PC_ 2
Positive: CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DQB1, HLA-DRA, LINC00926, CD79B, HLA-DRB1, CD74
HLA-DMA, HLA-DPB1, HLA-DQA2, CD37, HLA-DRB5, HLA-DMB, HLA-DPA1, FCRLA, HVCN1, LTB
BLNK, P2RX5, IGLL5, IRF8, SWAP70, ARHGAP24, FCGR2B, SMIM14, PPP1R14A, C16orf74
Negative: NKG7, PRF1, CST7, GZMB, GZMA, FGFBP2, CTSW, GNLY, B2M, SPON2
CCL4, GZMH, FCGR3A, CCL5, CD247, XCL2, CLIC3, AKR1C3, SRGN, HOPX
TTC38, APMAP, CTSC, S100A4, IGFBP7, ANXA1, ID2, IL32, XCL1, RHOC
PC_ 3
Positive: HLA-DQA1, CD79A, CD79B, HLA-DQB1, HLA-DPB1, HLA-DPA1, CD74, MS4A1, HLA-DRB1, HLA-DRA
HLA-DRB5, HLA-DQA2, TCL1A, LINC00926, HLA-DMB, HLA-DMA, CD37, HVCN1, FCRLA, IRF8
PLAC8, BLNK, MALAT1, SMIM14, PLD4, LAT2, IGLL5, P2RX5, SWAP70, FCGR2B
Negative: PPBP, PF4, SDPR, SPARC, GNG11, NRGN, GP9, RGS18, TUBB1, CLU
HIST1H2AC, AP001189.4, ITGA2B, CD9, TMEM40, PTCRA, CA2, ACRBP, MMD, TREML1
NGFRAP1, F13A1, SEPT5, RUFY1, TSC22D1, MPP1, CMTM5, RP11-367G6.3, MYL9, GP1BA
PC_ 4
Positive: HLA-DQA1, CD79B, CD79A, MS4A1, HLA-DQB1, CD74, HLA-DPB1, HIST1H2AC, PF4, TCL1A
SDPR, HLA-DPA1, HLA-DRB1, HLA-DQA2, HLA-DRA, PPBP, LINC00926, GNG11, HLA-DRB5, SPARC
GP9, AP001189.4, CA2, PTCRA, CD9, NRGN, RGS18, GZMB, CLU, TUBB1
Negative: VIM, IL7R, S100A6, IL32, S100A8, S100A4, GIMAP7, S100A10, S100A9, MAL
AQP3, CD2, CD14, FYB, LGALS2, GIMAP4, ANXA1, CD27, FCN1, RBP7
LYZ, S100A11, GIMAP5, MS4A6A, S100A12, FOLR3, TRABD2A, AIF1, IL8, IFI6
PC_ 5
Positive: GZMB, NKG7, S100A8, FGFBP2, GNLY, CCL4, CST7, PRF1, GZMA, SPON2
GZMH, S100A9, LGALS2, CCL3, CTSW, XCL2, CD14, CLIC3, S100A12, CCL5
RBP7, MS4A6A, GSTP1, FOLR3, IGFBP7, TYROBP, TTC38, AKR1C3, XCL1, HOPX
Negative: LTB, IL7R, CKB, VIM, MS4A7, AQP3, CYTIP, RP11-290F20.3, SIGLEC10, HMOX1
PTGES3, LILRB2, MAL, CD27, HN1, CD2, GDI2, ANXA5, CORO1B, TUBA1B
FAM110A, ATP1A1, TRADD, PPA1, CCDC109B, ABRACL, CTD-2006K23.1, WARS, VMO1, FYB
> #Without graph.name argument
> #pbmc <- FindNeighbors(pbmc, graph.name ="test", dims = 1:10)
> #pbmc <- FindClusters(pbmc, graph.name = "test", algorithm = 4, resolution = 0.5)
> pbmc <- FindNeighbors(pbmc, dims = 1:10)
Computing nearest neighbor graph
Computing SNN
> pbmc <- FindClusters(pbmc, algorithm = 4, resolution = 0.5)
> pbmc <- RunUMAP(pbmc, dims = 1:10)
15:38:52 UMAP embedding parameters a = 0.9922 b = 1.112
15:38:52 Read 2638 rows and found 10 numeric columns
15:38:52 Using Annoy for neighbor search, n_neighbors = 30
15:38:52 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
15:38:53 Writing NN index file to temp file /tmp/Rtmp66O0pD/filee2d55ca00c5
15:38:53 Searching Annoy index using 1 thread, search_k = 3000
15:38:54 Annoy recall = 100%
15:38:54 Commencing smooth kNN distance calibration using 1 thread
15:38:55 Initializing from normalized Laplacian + noise
15:38:55 Commencing optimization for 500 epochs, with 105124 positive edges
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
15:39:00 Optimization finished
> DimPlot(pbmc, reduction = "umap", label = TRUE, label.size = 6)
> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] pbmc3k.SeuratData_3.1.4 patchwork_1.1.1 SeuratObject_4.0.0 Seurat_4.0.0
[5] dplyr_1.0.4
loaded via a namespace (and not attached):
[1] nlme_3.1-152 matrixStats_0.58.0 RcppAnnoy_0.0.18 RColorBrewer_1.1-2
[5] httr_1.4.2 sctransform_0.3.2 tools_4.0.4 utf8_1.1.4
[9] R6_2.5.0 irlba_2.3.3 rpart_4.1-15 KernSmooth_2.23-18
[13] uwot_0.1.10 mgcv_1.8-33 DBI_1.1.1 lazyeval_0.2.2
[17] colorspace_2.0-0 withr_2.4.1 tidyselect_1.1.0 gridExtra_2.3
[21] compiler_4.0.4 cli_2.3.1 plotly_4.9.3 labeling_0.4.2
[25] scales_1.1.1 lmtest_0.9-38 spatstat.data_2.0-0 ggridges_0.5.3
[29] pbapply_1.4-3 rappdirs_0.3.3 spatstat_1.64-1 goftest_1.2-2
[33] stringr_1.4.0 digest_0.6.27 spatstat.utils_2.0-0 pkgconfig_2.0.3
[37] htmltools_0.5.1.1 parallelly_1.23.0 fastmap_1.1.0 htmlwidgets_1.5.3
[41] rlang_0.4.10 rstudioapi_0.13 shiny_1.6.0 farver_2.0.3
[45] generics_0.1.0 zoo_1.8-8 jsonlite_1.7.2 ica_1.0-2
[49] magrittr_2.0.1 Matrix_1.3-2 Rcpp_1.0.6 munsell_0.5.0
[53] fansi_0.4.2 abind_1.4-5 reticulate_1.18 lifecycle_1.0.0
[57] stringi_1.5.3 MASS_7.3-53.1 Rtsne_0.15 plyr_1.8.6
[61] grid_4.0.4 parallel_4.0.4 listenv_0.8.0 promises_1.2.0.1
[65] ggrepel_0.9.1 crayon_1.4.1 miniUI_0.1.1.1 deldir_0.2-10
[69] lattice_0.20-41 cowplot_1.1.1 splines_4.0.4 tensor_1.5
[73] pillar_1.5.0 igraph_1.2.6 future.apply_1.7.0 reshape2_1.4.4
[77] codetools_0.2-18 leiden_0.3.7 glue_1.4.2 data.table_1.14.0
[81] BiocManager_1.30.10 vctrs_0.3.6 png_0.1-7 httpuv_1.5.5
[85] gtable_0.3.0 RANN_2.6.1 purrr_0.3.4 polyclip_1.10-0
[89] tidyr_1.1.2 SeuratData_0.2.1 scattermore_0.7 future_1.21.0
[93] assertthat_0.2.1 ggplot2_3.3.3 xfun_0.21 mime_0.10
[97] xtable_1.8-4 RSpectra_0.16-0 later_1.1.0.1 survival_3.2-7
[101] viridisLite_0.3.0 tibble_3.1.0 tinytex_0.29 cluster_2.1.1
[105] globals_0.14.0 fitdistrplus_1.1-3 ellipsis_0.3.1 ROCR_1.0-11
Metadata
Metadata
Assignees
Labels
documentationError in documentationError in documentation