Using the big data and as.matrix #5

Open
fenghuijian opened this issue Aug 6, 2021 · 1 comment

Comments

@fenghuijian

Hi,
Your software is great, but I'm having problems running it on large datasets. When I called `readscdata`, I got this error:

Error in asMethod(object): Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102

I then checked your source code and found what looks like a bug in `readscdata`:
```r
function (count, cell, gene, is.filter = TRUE)
{
    if (exists("count") & exists("cell") & exists("gene")) {
        if (all(colnames(count) == rownames(cell)) &
            all(rownames(count) == rownames(gene))) {
            # Densifies the input; for a large sparse dgCMatrix this is the call that fails
            run.data0 = as.matrix(count)
            mito.gene = grep(pattern = "^mt-", x = rownames(run.data0),
                             ignore.case = TRUE, value = TRUE)
            run.cell = data.frame(scBarcode = rownames(cell),
                                  scUMI = Matrix::colSums(run.data0),
                                  ngene = Matrix::colSums(run.data0 > 0),
                                  row.names = rownames(cell),
                                  stringsAsFactors = FALSE)
            run.cell$mito = Matrix::colSums(run.data0[rownames(run.data0) %in% mito.gene, ]) /
                run.cell$scUMI
            run.cell = cbind.data.frame(run.cell, cell)
            run.gene = data.frame(Symbol = rownames(run.data0),
                                  RNA = "Gene Expression",
                                  row.names = rownames(run.data0),
                                  stringsAsFactors = FALSE)
            run.gene$nCell = Matrix::rowSums(run.data0 > 0)
            if (is.filter) {
                run.gene = run.gene[run.gene$nCell > 0, ]
            }
            else {
                run.gene = run.gene
            }
            run.data0 = run.data0[rownames(run.data0) %in% run.gene$Symbol, ]
            SingleCellData(assay = list(count = as(run.data0, "dgCMatrix")),
                           rowdata = data.frame(run.gene, stringsAsFactors = FALSE),
                           coldata = data.frame(run.cell, stringsAsFactors = FALSE))
        }
        else {
            stop("Matrix colnames or rownames are not equal to cell name or gene name")
        }
    }
    else {
        stop("No matrix, cell or gene is found here")
    }
}
```
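I think `run.data0 = as.matrix(count)` is redundant: the function converts the input to a dense matrix, only to convert it back to a sparse matrix (`as(run.data0, "dgCMatrix")`) when it builds the object, and the `Matrix::colSums` / `Matrix::rowSums` calls in between work on sparse matrices as well.
However, if the original data is a large sparse matrix (dgCMatrix), the dense copy made by `as.matrix(count)` needs nrow × ncol doubles. That either exhausts memory or exceeds what CHOLMOD is willing to allocate, which is what produces the error above.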
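A rough way to check whether a dataset will hit this before calling `as.matrix()` is to compute the size of the dense copy from the dimensions alone. `dense_cost()` below is a hypothetical helper, not part of the package, and the threshold of `.Machine$integer.max` (2^31 - 1) entries is an assumption about the CHOLMOD dense conversion in the Matrix version that raised this error; newer Matrix releases may behave differently.

```r
library(Matrix)

# Hypothetical helper (not part of the package): estimate what as.matrix()
# would have to allocate for a count matrix, without densifying it.
dense_cost <- function(x) {
  # Multiply as doubles so nrow * ncol cannot overflow R's 32-bit integers
  n_entries <- as.numeric(nrow(x)) * as.numeric(ncol(x))
  list(
    entries    = n_entries,
    gigabytes  = n_entries * 8 / 1024^3,            # 8 bytes per double
    over_limit = n_entries > .Machine$integer.max   # assumed CHOLMOD limit, 2^31 - 1
  )
}

# Illustrative dimensions only: 30,000 genes x 100,000 cells,
# stored sparsely with a single non-zero entry.
counts <- sparseMatrix(i = 1, j = 1, x = 1, dims = c(30000, 100000))
cost <- dense_cost(counts)
cost$over_limit  # TRUE: 3e9 entries, about 22 GB as a dense matrix
```

The sparse object itself is tiny here; only the dense copy would be huge, which is why the conversion step is the one that fails.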
So I suggest removing `run.data0 = as.matrix(count)` if it isn't needed for anything else.
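A minimal sketch of the same QC columns computed without the dense copy. `sparse_qc()` is a hypothetical helper, not the package's code: it assumes `count` has gene symbols as rownames and cell barcodes as colnames (which `readscdata` already checks against `gene` and `cell`), returns a plain list instead of calling `SingleCellData()`, and leaves out the `cbind` with the cell metadata. It also subsets the mitochondrial rows with `drop = FALSE`, since with exactly one matching gene the original `run.data0[... , ]` collapses to a vector.

```r
library(Matrix)

# Hypothetical helper (not part of the package): the per-cell and per-gene
# QC columns that readscdata builds, computed on a sparse matrix.
# Assumes rownames(count) are gene symbols and colnames(count) are barcodes.
sparse_qc <- function(count, is.filter = TRUE) {
  count <- as(count, "CsparseMatrix")   # no-op for a dgCMatrix; never densifies
  is_mito <- grepl("^mt-", rownames(count), ignore.case = TRUE)
  umi <- Matrix::colSums(count)

  coldata <- data.frame(
    scBarcode = colnames(count),
    scUMI     = umi,
    ngene     = Matrix::colSums(count > 0),   # count > 0 stays sparse (logical)
    # drop = FALSE keeps a one-gene subset as a matrix, so colSums() still works
    mito      = Matrix::colSums(count[is_mito, , drop = FALSE]) / umi,
    row.names = colnames(count),
    stringsAsFactors = FALSE
  )

  n_cell <- Matrix::rowSums(count > 0)
  keep <- if (is.filter) n_cell > 0 else rep(TRUE, nrow(count))
  rowdata <- data.frame(
    Symbol    = rownames(count)[keep],
    RNA       = "Gene Expression",
    nCell     = n_cell[keep],
    row.names = rownames(count)[keep],
    stringsAsFactors = FALSE
  )

  list(count = count[keep, , drop = FALSE], rowdata = rowdata, coldata = coldata)
}

# Toy example: 4 genes x 3 cells, one mitochondrial gene, one all-zero gene
m <- sparseMatrix(
  i = c(1, 2, 3, 3), j = c(1, 1, 2, 3), x = c(5, 1, 2, 4), dims = c(4, 3),
  dimnames = list(c("Gapdh", "mt-Co1", "Actb", "Zero"), c("c1", "c2", "c3"))
)
res <- sparse_qc(m)
class(res$count)   # still a dgCMatrix; the all-zero gene "Zero" is filtered out
```

Every step (`colSums`, `rowSums`, `> 0`, row subsetting) has a sparse method in Matrix, so memory use stays proportional to the number of non-zero counts rather than to genes × cells.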

@bioinfoDZ
Owner

We are working on a new release that will handle large datasets better. In the meantime, you may want to look into MetaCell as a potential solution.
