Using the big data and as.matrix #5

fenghuijian · 2021-08-06T14:53:13Z

Hi,
Your software is great, but I'm having problems running it on big data. When I called readscdata, I got the error: Error in asMethod(object): Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102. I then checked your source code and found what looks like a bug in readscdata:
function (count, cell, gene, is.filter = TRUE)
{
    if (exists("count") & exists("cell") & exists("gene")) {
        if (all(colnames(count) == rownames(cell)) & all(rownames(count) ==
            rownames(gene))) {
            run.data0 = as.matrix(count)
            mito.gene = grep(pattern = "^mt-", x = rownames(run.data0),
                ignore.case = TRUE, value = TRUE)
            run.cell = data.frame(scBarcode = rownames(cell),
                scUMI = Matrix::colSums(run.data0), ngene = Matrix::colSums(run.data0 >
                    0), row.names = rownames(cell), stringsAsFactors = FALSE)
            run.cell$mito = Matrix::colSums(run.data0[rownames(run.data0) %in%
                mito.gene, ])/run.cell$scUMI
            run.cell = cbind.data.frame(run.cell, cell)
            run.gene = data.frame(Symbol = rownames(run.data0),
                RNA = "Gene Expression", row.names = rownames(run.data0),
                stringsAsFactors = FALSE)
            run.gene$nCell = Matrix::rowSums(run.data0 > 0)
            if (is.filter) {
                run.gene = run.gene[run.gene$nCell > 0, ]
            }
            else {
                run.gene = run.gene
            }
            run.data0 = run.data0[rownames(run.data0) %in% run.gene$Symbol,
                ]
            SingleCellData(assay = list(count = as(run.data0,
                "dgCMatrix")), rowdata = data.frame(run.gene,
                stringsAsFactors = FALSE), coldata = data.frame(run.cell,
                stringsAsFactors = FALSE))
        }
        else {
            stop("Matrix colnames or rownames are not equal to cell name or gene name")
        }
    }
    else {
        stop("No matrix, cell or gene is found here")
    }
}
I think the line run.data0 = as.matrix(count) is redundant: the function first converts the input to a dense matrix, but when it builds the object at the end it converts the data back to a sparse matrix (dgCMatrix) anyway.
When the original data is large and already a sparse matrix (dgCMatrix), run.data0 = as.matrix(count) tries to allocate the full dense matrix, memory runs out, and the error above is raised.
So I suggest removing run.data0 = as.matrix(count) if it serves no other purpose.
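
For illustration, here is a minimal sketch of how the same per-cell and per-gene summaries could be computed while the counts stay sparse. It is not the package's code: sparse_summaries is a hypothetical helper name, and the sketch assumes the Matrix package is installed and that count is either a base matrix or a Matrix sparse matrix with row and column names.

```r
library(Matrix)

# Hypothetical helper (not part of the package): computes the summaries that
# readscdata derives from the counts, without calling as.matrix().
sparse_summaries <- function(count) {
  # Only coerce when the input is not already sparse; a dgCMatrix is kept as-is.
  if (!is(count, "sparseMatrix")) {
    count <- Matrix::Matrix(count, sparse = TRUE)
  }

  # Rows whose names start with "mt-" (case-insensitive), as in readscdata.
  is_mito <- grepl("^mt-", rownames(count), ignore.case = TRUE)

  # `count > 0` on a sparse matrix yields a sparse logical matrix,
  # so none of these steps allocates the dense genes-by-cells matrix.
  umi <- Matrix::colSums(count)
  cell <- data.frame(
    scBarcode = colnames(count),
    scUMI = umi,
    ngene = Matrix::colSums(count > 0),
    # NaN for cells with zero total counts, same as the original expression.
    mito = Matrix::colSums(count[is_mito, , drop = FALSE]) / umi,
    row.names = colnames(count),
    stringsAsFactors = FALSE
  )
  gene <- data.frame(
    Symbol = rownames(count),
    nCell = Matrix::rowSums(count > 0),
    row.names = rownames(count),
    stringsAsFactors = FALSE
  )
  list(count = count, cell = cell, gene = gene)
}
```

With this approach the filtering step (keeping genes with nCell > 0) can also be done by subsetting the sparse matrix directly, so the final as(run.data0, "dgCMatrix") conversion would no longer be needed for sparse input.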

bioinfoDZ · 2021-08-10T14:22:51Z

We are working on a new release that will handle large datasets better. In the meantime, you may want to look into MetaCell as a potential solution.
