This repo provides batch-effect correction through LSI, a pure R implementation of RPCI [1].
We used example data downloaded from the following links:
(a) the matrix file ("bcc_scRNA_counts.txt"): https://ftp.ncbi.nlm.nih.gov/geo/series/GSE123nnn/GSE123813/suppl/GSE123813%5Fbcc%5FscRNA%5Fcounts%2Etxt%
(b) the cell metadata file ("bcc_all_metadata.txt"): https://ftp.ncbi.nlm.nih.gov/geo/series/GSE123nnn/GSE123813/suppl/GSE123813%5Fbcc%5Fall%5Fmetadata%2Etxt%2E
Users can first process the data with RPCI_example_code and then inspect the cell embedding generated by RPCI:
head(data0@DimReduction$cell.pls)
Our implementation can be run with the following code:
embed <- LSI(list(dat6, dat1, dat2, dat3, dat4, dat5, dat7, dat8),var0,18,50,center=TRUE)
To compare the two embeddings:
cor(data0@DimReduction$cell.pls[,dims],embed[,dims])
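When reading the output of cor() on two embeddings, it helps to remember that singular vectors are only defined up to sign, so matching components can show a correlation near -1 as well as +1. The following is a minimal sketch on synthetic data, not this repo's code; emb_a, emb_b and signs are hypothetical stand-ins for the two embeddings.

```r
set.seed(42)

# Hypothetical stand-ins for two embeddings (cells x components).
emb_a <- matrix(rnorm(500 * 5), nrow = 500)

# Second embedding: the same components with some signs flipped plus small noise,
# mimicking two SVD implementations that agree up to sign.
signs <- c(1, -1, 1, -1, 1)
emb_b <- sweep(emb_a, 2, signs, `*`) + matrix(rnorm(500 * 5, sd = 0.01), nrow = 500)

# cor() returns the full cross-correlation matrix between columns;
# the diagonal holds the correlation between matching components.
per_dim <- abs(diag(cor(emb_a, emb_b)))
print(round(per_dim, 3))
```

Taking the absolute value of the diagonal gives one agreement score per component, which is easier to scan than the full matrix.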
Note that if dims includes components beyond 18, the two embeddings will differ. The reason is that the second-round SVD operates on an inner-product matrix built from the truncated singular vectors of the first round, so its latent dimensionality is bounded by the first-round SVD (18 in our case). If the second SVD requests more components than that (here 50), the 19th and later singular vectors are effectively random.
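The rank constraint described above can be illustrated with base R alone. This is a simplified sketch on a random toy matrix, not the LSI() function from this repo: X, k1, k2 and the projection step are assumptions chosen only to mimic "truncate to 18 components, then run a second SVD asking for 50".

```r
set.seed(1)

# Toy matrix standing in for an expression matrix (hypothetical data).
X <- matrix(rnorm(200 * 100), nrow = 200, ncol = 100)

k1 <- 18  # components kept in the first SVD
k2 <- 50  # components requested in the second SVD

# Round 1: keep only the first k1 left singular vectors.
s1 <- svd(X, nu = k1, nv = 0)
U1 <- s1$u

# Project the data onto the truncated basis; the result has rank at most k1.
P <- U1 %*% crossprod(U1, X)

# Round 2: SVD of the projected matrix, asking for k2 right singular vectors.
s2 <- svd(P, nu = 0, nv = k2)

# Count singular values that are not numerically zero; this should equal k1.
n_nonzero <- sum(s2$d > 1e-8 * s2$d[1])
print(n_nonzero)
```

Because every singular value after the k1-th is numerically zero, the corresponding vectors in s2$v are just an arbitrary orthonormal basis of the null space, which is why components 19 to 50 cannot be expected to match between implementations.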
[1] Liu Y, Wang T, Zhou B, et al. Robust integration of multiple single-cell RNA sequencing datasets using a single reference space[J]. Nature biotechnology, 2021: 1-8.