WWXkenmo/LSI_for_batch_effect_correction

Latent semantic indexing for single cell batch effects correction

This repo provides batch effect correction through LSI. It is a pure R implementation of RPCI [1].

Running example

We use example data downloaded from the following links:

(a) the matrix file ("bcc_scRNA_counts.txt"): https://ftp.ncbi.nlm.nih.gov/geo/series/GSE123nnn/GSE123813/suppl/GSE123813%5Fbcc%5FscRNA%5Fcounts%2Etxt%
(b) the cell metadata file ("bcc_all_metadata.txt"): https://ftp.ncbi.nlm.nih.gov/geo/series/GSE123nnn/GSE123813/suppl/GSE123813%5Fbcc%5Fall%5Fmetadata%2Etxt%2E

Users can first process the data with RPCI_example_code and then inspect the cell embedding vectors generated by RPCI:

head(data0@DimReduction$cell.pls)

Our implementation can be run with the following code:

embed <- LSI(list(dat6, dat1, dat2, dat3, dat4, dat5, dat7, dat8),var0,18,50,center=TRUE)

To compare the two embeddings:

cor(data0@DimReduction$cell.pls[,dims],embed[,dims])
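The comparison above returns a full correlation matrix between the two sets of dimensions. Since singular vectors are only defined up to sign, it can help to look at the absolute correlation of matching dimensions only. The snippet below is a minimal sketch on toy data: `ref` and `alt` are hypothetical stand-ins for `data0@DimReduction$cell.pls` and `embed`, and `dims` is assumed to be an integer vector of column indices (it is not defined in the snippet above).

```r
set.seed(42)
n_cells <- 100
n_dims  <- 5

# Hypothetical stand-ins for the RPCI embedding and the LSI embedding.
ref   <- matrix(rnorm(n_cells * n_dims), n_cells, n_dims)
signs <- c(1, -1, 1, -1, 1)
alt   <- sweep(ref, 2, signs, `*`)   # same embedding with some axes flipped

# Assumed form of `dims`: the columns to compare.
dims <- 1:n_dims

# Keep only the diagonal (dimension i vs. dimension i);
# abs() ignores the arbitrary sign of each singular vector.
matched <- abs(diag(cor(ref[, dims], alt[, dims])))
print(round(matched, 3))
```

Values close to 1 for every dimension indicate that the two embeddings agree up to sign.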

Note that for dimensions beyond 18 the two embeddings will differ. In the second round of SVD, the inner-product matrix is built from the truncated singular vectors of the first round, so its rank, and hence the latent dimensionality, is bounded by the number of components kept in the first round (18 in our case). If the second SVD requests more dimensions than that (here 50), the 19th and later singular vectors correspond to numerically zero singular values and are therefore essentially arbitrary.
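The rank argument can be checked on toy data. The snippet below is only an illustration under stated assumptions, not the repository's `LSI` function: the random matrix `X`, its sizes, and the exact way the inner-product matrix `G` is formed are all hypothetical; only the values 18 and 50 come from the example above.

```r
set.seed(1)
n_cells <- 200
n_genes <- 100
k       <- 18   # components kept in the first SVD round

# Toy data matrix (cells x genes); hypothetical, not the BCC data.
X <- matrix(rnorm(n_cells * n_genes), n_cells, n_genes)

# First round: keep only the top k right singular vectors.
V_k <- svd(X, nu = 0, nv = k)$v   # n_genes x k

# Project cells onto those k vectors, then form the inner-product matrix.
Z <- X %*% V_k                    # n_cells x k
G <- Z %*% t(Z)                   # n_cells x n_cells, rank at most k

# Second round: request 50 components.
d <- svd(G, nu = 50, nv = 0)$d

# Count singular values that are not numerically zero.
n_nonzero <- sum(d > max(d) * 1e-8)
print(n_nonzero)
```

Only `k` singular values of `G` are meaningfully non-zero, so any singular vectors requested beyond the `k`-th span a null space and carry no stable information.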

[1] Liu Y, Wang T, Zhou B, et al. Robust integration of multiple single-cell RNA sequencing datasets using a single reference space. Nature Biotechnology, 2021: 1-8.
