Using the graph.name argument generates clustering that is different when not using the graph.name argument?

Thank you Satija Lab for the incredible work that has been done with Seurat! You guys are miracle workers!

Here is the bug:

I am using the pbm3k dataset from SeuratData as the reproducible example.

I think there is a bug when using the `graph.name` argument. Please correct me if I'm wrong.

### Simply using the `graph.name` argument generates clustering that is different when not using the `graph.name` argument. (either more or less clusters, more or less cells in clusters) 

Similar clustering differences happen when using the default algorithm for `FindClusters` as well. Here I used algorithm 4.

I thought specifying a `graph.name` should keep everything consistent. I actually prefer the clustering generated with the `graph.name` argument included in my dataset. I'm wondering what is happening. 

Here is what was done:

```
library(dplyr)
library(Seurat)
library(patchwork)

# Load the PBMC dataset
SeuratData::InstallData("pbmc3k")
pbmc <- SeuratData::LoadData("pbmc3k")
# Initialize the Seurat object with the raw (non-normalized data).
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
#Without graph.name argument
#pbmc <- FindNeighbors(pbmc, graph.name ="test", dims = 1:10)
#pbmc <- FindClusters(pbmc, graph.name = "test", algorithm = 4, resolution = 0.5)
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, algorithm = 4, resolution = 0.5)
pbmc <- RunUMAP(pbmc, dims = 1:10)
DimPlot(pbmc, reduction = "umap", label = TRUE, label.size = 6)
```
Here is the full workflow for **with** the `graph.name` argument:
```
> pbmc <- SeuratData::LoadData("pbmc3k")
> # Initialize the Seurat object with the raw (non-normalized data).
> pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
> pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
> pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
Calculating gene variances
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> all.genes <- rownames(pbmc)
> pbmc <- ScaleData(pbmc, features = all.genes)
Centering and scaling data matrix
  |============================================================================================| 100%
> pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
PC_ 1 
Positive:  CST3, TYROBP, LST1, AIF1, FTL, FTH1, LYZ, FCN1, S100A9, TYMP 
	   FCER1G, CFD, LGALS1, S100A8, CTSS, LGALS2, SERPINA1, IFITM3, SPI1, CFP 
	   PSAP, IFI30, SAT1, COTL1, S100A11, NPC2, GRN, LGALS3, GSTP1, PYCARD 
Negative:  MALAT1, LTB, IL32, IL7R, CD2, B2M, ACAP1, CD27, STK17A, CTSW 
	   CD247, GIMAP5, AQP3, CCL5, SELL, TRAF3IP3, GZMA, MAL, CST7, ITM2A 
	   MYC, GIMAP7, HOPX, BEX2, LDLRAP1, GZMK, ETS1, ZAP70, TNFAIP8, RIC3 
PC_ 2 
Positive:  CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DQB1, HLA-DRA, LINC00926, CD79B, HLA-DRB1, CD74 
	   HLA-DMA, HLA-DPB1, HLA-DQA2, CD37, HLA-DRB5, HLA-DMB, HLA-DPA1, FCRLA, HVCN1, LTB 
	   BLNK, P2RX5, IGLL5, IRF8, SWAP70, ARHGAP24, FCGR2B, SMIM14, PPP1R14A, C16orf74 
Negative:  NKG7, PRF1, CST7, GZMB, GZMA, FGFBP2, CTSW, GNLY, B2M, SPON2 
	   CCL4, GZMH, FCGR3A, CCL5, CD247, XCL2, CLIC3, AKR1C3, SRGN, HOPX 
	   TTC38, APMAP, CTSC, S100A4, IGFBP7, ANXA1, ID2, IL32, XCL1, RHOC 
PC_ 3 
Positive:  HLA-DQA1, CD79A, CD79B, HLA-DQB1, HLA-DPB1, HLA-DPA1, CD74, MS4A1, HLA-DRB1, HLA-DRA 
	   HLA-DRB5, HLA-DQA2, TCL1A, LINC00926, HLA-DMB, HLA-DMA, CD37, HVCN1, FCRLA, IRF8 
	   PLAC8, BLNK, MALAT1, SMIM14, PLD4, LAT2, IGLL5, P2RX5, SWAP70, FCGR2B 
Negative:  PPBP, PF4, SDPR, SPARC, GNG11, NRGN, GP9, RGS18, TUBB1, CLU 
	   HIST1H2AC, AP001189.4, ITGA2B, CD9, TMEM40, PTCRA, CA2, ACRBP, MMD, TREML1 
	   NGFRAP1, F13A1, SEPT5, RUFY1, TSC22D1, MPP1, CMTM5, RP11-367G6.3, MYL9, GP1BA 
PC_ 4 
Positive:  HLA-DQA1, CD79B, CD79A, MS4A1, HLA-DQB1, CD74, HLA-DPB1, HIST1H2AC, PF4, TCL1A 
	   SDPR, HLA-DPA1, HLA-DRB1, HLA-DQA2, HLA-DRA, PPBP, LINC00926, GNG11, HLA-DRB5, SPARC 
	   GP9, AP001189.4, CA2, PTCRA, CD9, NRGN, RGS18, GZMB, CLU, TUBB1 
Negative:  VIM, IL7R, S100A6, IL32, S100A8, S100A4, GIMAP7, S100A10, S100A9, MAL 
	   AQP3, CD2, CD14, FYB, LGALS2, GIMAP4, ANXA1, CD27, FCN1, RBP7 
	   LYZ, S100A11, GIMAP5, MS4A6A, S100A12, FOLR3, TRABD2A, AIF1, IL8, IFI6 
PC_ 5 
Positive:  GZMB, NKG7, S100A8, FGFBP2, GNLY, CCL4, CST7, PRF1, GZMA, SPON2 
	   GZMH, S100A9, LGALS2, CCL3, CTSW, XCL2, CD14, CLIC3, S100A12, CCL5 
	   RBP7, MS4A6A, GSTP1, FOLR3, IGFBP7, TYROBP, TTC38, AKR1C3, XCL1, HOPX 
Negative:  LTB, IL7R, CKB, VIM, MS4A7, AQP3, CYTIP, RP11-290F20.3, SIGLEC10, HMOX1 
	   PTGES3, LILRB2, MAL, CD27, HN1, CD2, GDI2, ANXA5, CORO1B, TUBA1B 
	   FAM110A, ATP1A1, TRADD, PPA1, CCDC109B, ABRACL, CTD-2006K23.1, WARS, VMO1, FYB 
> pbmc <- FindNeighbors(pbmc, graph.name ="test", dims = 1:10)
Computing nearest neighbor graph
Computing SNN
> pbmc <- FindClusters(pbmc, graph.name = "test", algorithm = 4, resolution = 0.5)
> pbmc <- RunUMAP(pbmc, dims = 1:10)
15:37:13 UMAP embedding parameters a = 0.9922 b = 1.112
15:37:13 Read 2638 rows and found 10 numeric columns
15:37:13 Using Annoy for neighbor search, n_neighbors = 30
15:37:13 Building Annoy index with metric = cosine, n_trees = 50
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
15:37:14 Writing NN index file to temp file /tmp/Rtmp66O0pD/filee2d579073b5
15:37:14 Searching Annoy index using 1 thread, search_k = 3000
15:37:15 Annoy recall = 100%
15:37:15 Commencing smooth kNN distance calibration using 1 thread
15:37:16 Initializing from normalized Laplacian + noise
15:37:16 Commencing optimization for 500 epochs, with 105124 positive edges
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
15:37:21 Optimization finished
> DimPlot(pbmc, reduction = "umap", label = TRUE, label.size = 6)
```
![with-graph-name](https://user-images.githubusercontent.com/71743538/109353611-9b19bb80-784a-11eb-84c7-08c6b17be284.png)
```
Here is the full workflow **without** `graph.name` argument:
> pbmc <- SeuratData::LoadData("pbmc3k")
> # Initialize the Seurat object with the raw (non-normalized data).
> pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
> pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
> pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
Calculating gene variances
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> all.genes <- rownames(pbmc)
> pbmc <- ScaleData(pbmc, features = all.genes)
Centering and scaling data matrix
  |============================================================================================| 100%
> pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
PC_ 1 
Positive:  CST3, TYROBP, LST1, AIF1, FTL, FTH1, LYZ, FCN1, S100A9, TYMP 
	   FCER1G, CFD, LGALS1, S100A8, CTSS, LGALS2, SERPINA1, IFITM3, SPI1, CFP 
	   PSAP, IFI30, SAT1, COTL1, S100A11, NPC2, GRN, LGALS3, GSTP1, PYCARD 
Negative:  MALAT1, LTB, IL32, IL7R, CD2, B2M, ACAP1, CD27, STK17A, CTSW 
	   CD247, GIMAP5, AQP3, CCL5, SELL, TRAF3IP3, GZMA, MAL, CST7, ITM2A 
	   MYC, GIMAP7, HOPX, BEX2, LDLRAP1, GZMK, ETS1, ZAP70, TNFAIP8, RIC3 
PC_ 2 
Positive:  CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DQB1, HLA-DRA, LINC00926, CD79B, HLA-DRB1, CD74 
	   HLA-DMA, HLA-DPB1, HLA-DQA2, CD37, HLA-DRB5, HLA-DMB, HLA-DPA1, FCRLA, HVCN1, LTB 
	   BLNK, P2RX5, IGLL5, IRF8, SWAP70, ARHGAP24, FCGR2B, SMIM14, PPP1R14A, C16orf74 
Negative:  NKG7, PRF1, CST7, GZMB, GZMA, FGFBP2, CTSW, GNLY, B2M, SPON2 
	   CCL4, GZMH, FCGR3A, CCL5, CD247, XCL2, CLIC3, AKR1C3, SRGN, HOPX 
	   TTC38, APMAP, CTSC, S100A4, IGFBP7, ANXA1, ID2, IL32, XCL1, RHOC 
PC_ 3 
Positive:  HLA-DQA1, CD79A, CD79B, HLA-DQB1, HLA-DPB1, HLA-DPA1, CD74, MS4A1, HLA-DRB1, HLA-DRA 
	   HLA-DRB5, HLA-DQA2, TCL1A, LINC00926, HLA-DMB, HLA-DMA, CD37, HVCN1, FCRLA, IRF8 
	   PLAC8, BLNK, MALAT1, SMIM14, PLD4, LAT2, IGLL5, P2RX5, SWAP70, FCGR2B 
Negative:  PPBP, PF4, SDPR, SPARC, GNG11, NRGN, GP9, RGS18, TUBB1, CLU 
	   HIST1H2AC, AP001189.4, ITGA2B, CD9, TMEM40, PTCRA, CA2, ACRBP, MMD, TREML1 
	   NGFRAP1, F13A1, SEPT5, RUFY1, TSC22D1, MPP1, CMTM5, RP11-367G6.3, MYL9, GP1BA 
PC_ 4 
Positive:  HLA-DQA1, CD79B, CD79A, MS4A1, HLA-DQB1, CD74, HLA-DPB1, HIST1H2AC, PF4, TCL1A 
	   SDPR, HLA-DPA1, HLA-DRB1, HLA-DQA2, HLA-DRA, PPBP, LINC00926, GNG11, HLA-DRB5, SPARC 
	   GP9, AP001189.4, CA2, PTCRA, CD9, NRGN, RGS18, GZMB, CLU, TUBB1 
Negative:  VIM, IL7R, S100A6, IL32, S100A8, S100A4, GIMAP7, S100A10, S100A9, MAL 
	   AQP3, CD2, CD14, FYB, LGALS2, GIMAP4, ANXA1, CD27, FCN1, RBP7 
	   LYZ, S100A11, GIMAP5, MS4A6A, S100A12, FOLR3, TRABD2A, AIF1, IL8, IFI6 
PC_ 5 
Positive:  GZMB, NKG7, S100A8, FGFBP2, GNLY, CCL4, CST7, PRF1, GZMA, SPON2 
	   GZMH, S100A9, LGALS2, CCL3, CTSW, XCL2, CD14, CLIC3, S100A12, CCL5 
	   RBP7, MS4A6A, GSTP1, FOLR3, IGFBP7, TYROBP, TTC38, AKR1C3, XCL1, HOPX 
Negative:  LTB, IL7R, CKB, VIM, MS4A7, AQP3, CYTIP, RP11-290F20.3, SIGLEC10, HMOX1 
	   PTGES3, LILRB2, MAL, CD27, HN1, CD2, GDI2, ANXA5, CORO1B, TUBA1B 
	   FAM110A, ATP1A1, TRADD, PPA1, CCDC109B, ABRACL, CTD-2006K23.1, WARS, VMO1, FYB 
> #Without graph.name argument
> #pbmc <- FindNeighbors(pbmc, graph.name ="test", dims = 1:10)
> #pbmc <- FindClusters(pbmc, graph.name = "test", algorithm = 4, resolution = 0.5)
> pbmc <- FindNeighbors(pbmc, dims = 1:10)
Computing nearest neighbor graph
Computing SNN
> pbmc <- FindClusters(pbmc, algorithm = 4, resolution = 0.5)
> pbmc <- RunUMAP(pbmc, dims = 1:10)
15:38:52 UMAP embedding parameters a = 0.9922 b = 1.112
15:38:52 Read 2638 rows and found 10 numeric columns
15:38:52 Using Annoy for neighbor search, n_neighbors = 30
15:38:52 Building Annoy index with metric = cosine, n_trees = 50
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
15:38:53 Writing NN index file to temp file /tmp/Rtmp66O0pD/filee2d55ca00c5
15:38:53 Searching Annoy index using 1 thread, search_k = 3000
15:38:54 Annoy recall = 100%
15:38:54 Commencing smooth kNN distance calibration using 1 thread
15:38:55 Initializing from normalized Laplacian + noise
15:38:55 Commencing optimization for 500 epochs, with 105124 positive edges
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
15:39:00 Optimization finished
> DimPlot(pbmc, reduction = "umap", label = TRUE, label.size = 6)
```
![without-graph-name](https://user-images.githubusercontent.com/71743538/109353624-a0770600-784a-11eb-8f7e-50684d4af207.png)

```
> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pbmc3k.SeuratData_3.1.4 patchwork_1.1.1         SeuratObject_4.0.0      Seurat_4.0.0           
[5] dplyr_1.0.4            

loaded via a namespace (and not attached):
  [1] nlme_3.1-152         matrixStats_0.58.0   RcppAnnoy_0.0.18     RColorBrewer_1.1-2  
  [5] httr_1.4.2           sctransform_0.3.2    tools_4.0.4          utf8_1.1.4          
  [9] R6_2.5.0             irlba_2.3.3          rpart_4.1-15         KernSmooth_2.23-18  
 [13] uwot_0.1.10          mgcv_1.8-33          DBI_1.1.1            lazyeval_0.2.2      
 [17] colorspace_2.0-0     withr_2.4.1          tidyselect_1.1.0     gridExtra_2.3       
 [21] compiler_4.0.4       cli_2.3.1            plotly_4.9.3         labeling_0.4.2      
 [25] scales_1.1.1         lmtest_0.9-38        spatstat.data_2.0-0  ggridges_0.5.3      
 [29] pbapply_1.4-3        rappdirs_0.3.3       spatstat_1.64-1      goftest_1.2-2       
 [33] stringr_1.4.0        digest_0.6.27        spatstat.utils_2.0-0 pkgconfig_2.0.3     
 [37] htmltools_0.5.1.1    parallelly_1.23.0    fastmap_1.1.0        htmlwidgets_1.5.3   
 [41] rlang_0.4.10         rstudioapi_0.13      shiny_1.6.0          farver_2.0.3        
 [45] generics_0.1.0       zoo_1.8-8            jsonlite_1.7.2       ica_1.0-2           
 [49] magrittr_2.0.1       Matrix_1.3-2         Rcpp_1.0.6           munsell_0.5.0       
 [53] fansi_0.4.2          abind_1.4-5          reticulate_1.18      lifecycle_1.0.0     
 [57] stringi_1.5.3        MASS_7.3-53.1        Rtsne_0.15           plyr_1.8.6          
 [61] grid_4.0.4           parallel_4.0.4       listenv_0.8.0        promises_1.2.0.1    
 [65] ggrepel_0.9.1        crayon_1.4.1         miniUI_0.1.1.1       deldir_0.2-10       
 [69] lattice_0.20-41      cowplot_1.1.1        splines_4.0.4        tensor_1.5          
 [73] pillar_1.5.0         igraph_1.2.6         future.apply_1.7.0   reshape2_1.4.4      
 [77] codetools_0.2-18     leiden_0.3.7         glue_1.4.2           data.table_1.14.0   
 [81] BiocManager_1.30.10  vctrs_0.3.6          png_0.1-7            httpuv_1.5.5        
 [85] gtable_0.3.0         RANN_2.6.1           purrr_0.3.4          polyclip_1.10-0     
 [89] tidyr_1.1.2          SeuratData_0.2.1     scattermore_0.7      future_1.21.0       
 [93] assertthat_0.2.1     ggplot2_3.3.3        xfun_0.21            mime_0.10           
 [97] xtable_1.8-4         RSpectra_0.16-0      later_1.1.0.1        survival_3.2-7      
[101] viridisLite_0.3.0    tibble_3.1.0         tinytex_0.29         cluster_2.1.1       
[105] globals_0.14.0       fitdistrplus_1.1-3   ellipsis_0.3.1       ROCR_1.0-11   
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Using the graph.name argument generates clustering that is different when not using the graph.name argument? #4155

Simply using the `graph.name` argument generates clustering that is different when not using the `graph.name` argument. (either more or less clusters, more or less cells in clusters)

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Using the graph.name argument generates clustering that is different when not using the graph.name argument? #4155

Description

Simply using the graph.name argument generates clustering that is different when not using the graph.name argument. (either more or less clusters, more or less cells in clusters)

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Simply using the `graph.name` argument generates clustering that is different when not using the `graph.name` argument. (either more or less clusters, more or less cells in clusters)