Thank you very much for glmGamPoi! I've been trying to get it to work on a dataset where I have 9 clusters for 2 genotypes, with 3 replicates for each genotype. I have tried to run:
```r
fit <- glm_gp(bm.mat_subset,
              design = ~ genotype + seurat_clusters + genotype:seurat_clusters - 1,
              col_data = metadat,
              subsample = TRUE,
              on_disk = FALSE,
              reference_level = "WT")
c0 <- test_de(fit,
              contrast = genotypeKO - genotypeWT,
              pseudobulk_by = mouse,
              sort_by = pval, decreasing = FALSE)
```
```
Error in handle_design_parameter(design, data, col_data, reference_level) :
The model_matrix has more columns (18) than the there are samples in the data matrix (6 columns).
Too few replicates / too many coefficients to fit model.
The head of the design matrix:
genotypeWT genotypeKO seurat_clusters1 seurat_clusters2 seurat_clusters3 seurat_clusters4 seurat_clusters5 seurat_clusters6 seurat_clusters7 seurat_clusters8
```
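To see where the 18 in this error comes from, here is a small base-R check (a sketch, not part of glmGamPoi). It builds a hypothetical table with one row per mouse and cluster for 2 genotypes x 9 clusters x 3 mice per genotype, counts the design-matrix columns with `model.matrix()`, and compares that with the number of pseudobulk samples you get from pooling per mouse versus per mouse and cluster. The names `toy`, `n_mice`, and so on are illustrative only.

```r
# Hypothetical metadata: one row per mouse x cluster combination
# (the real metadat has one row per cell).
genotypes <- c("WT", "KO")
clusters  <- as.character(0:8)
toy <- expand.grid(
  mouse            = paste0("m", 1:3),
  genotype         = genotypes,
  seurat_clusters  = clusters,
  stringsAsFactors = FALSE
)
# Make mouse IDs unique across genotypes: 3 WT + 3 KO = 6 mice
toy$mouse <- paste(toy$genotype, toy$mouse, sep = "_")
toy$genotype <- factor(toy$genotype, levels = genotypes)  # WT first = reference
toy$seurat_clusters <- factor(toy$seurat_clusters, levels = clusters)

# Same formula as in the glm_gp() call above
mm <- model.matrix(~ genotype + seurat_clusters + genotype:seurat_clusters - 1,
                   data = toy)

n_coef <- ncol(mm)                       # 2 + 8 + 8 = 18 coefficients
n_mice <- length(unique(toy$mouse))      # 6 samples if pooled per mouse
n_mouse_cluster <- nrow(unique(toy[, c("mouse", "seurat_clusters")]))  # 54
cat("coefficients:", n_coef,
    "| samples per mouse:", n_mice,
    "| samples per mouse x cluster:", n_mouse_cluster, "\n")
```

A fully crossed genotype-by-cluster design has (number of genotypes) x (number of clusters) coefficients, so it needs more pseudobulk samples than that, for example one per mouse and cluster (54 here) rather than one per mouse (6).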
I've looked at your example, and I get the same error there unless I pre-filter the data to a few clusters (NK cells, B cells and T cells), as is done in the example. Without that filter, the resulting fit has 16 coefficients and the data has 16 pseudobulk samples (ind + stim), so it produces the same "too few replicates / too many coefficients" error. Is this a bug, or do we always have to pre-filter the data so that the number of coefficients is less than the number of samples we "pseudobulk_by"?
Would you say that the following is a good way to overcome the issues above?
```r
sce_subset <- sce[rowSums(counts(sce)) > 100,
                  sample(which(!is.na(sce$cell)), 1000)]
counts(sce_subset) <- as.matrix(counts(sce_subset))
sce_subset$cell <- droplevels(sce_subset$cell)
fit <- glm_gp(sce_subset, design = ~ cell + stim + stim:cell - 1,
              reference_level = "NK cells")
fit
```

```
glmGamPoiFit object:
The data had 9727 rows and 1000 columns.
A model with 16 coefficient was fitted.
```
```r
de_res <- test_de(fit, contrast = `stimstim` + `cellCD4 T cells:stimstim`,
                  pseudobulk_by = paste0(stim, "-", ind))
```

```
Error in handle_design_parameter(design, data, col_data, reference_level) :
The model_matrix has more columns (16) than the there are samples in the data matrix (16 columns).
Too few replicates / too many coefficients to fit model.
The head of the design matrix:
cellNK cells cellB cells cellCD14+ Monocytes cellCD4 T cells cellCD8 T cells cellDendritic cells cellFCGR3A+ Monocytes cellMegakaryocytes stimstim cellB cells:stimstim cellCD14+ Monocytes:stimstim cellCD4 T cells:stimstim cellCD8 T cells:stimstim cellDendritic cells:stimstim cellFCGR3A+ Monocytes:stimstim cellMegakaryocytes:stimstim
ctrl-101 0.05714286 0.11428571 0.4000000 0.2285714 0.14285714 0.00000000 0.02857143 0.02857143 0 0 0 0 0 0 0 0
ctrl-1015 0.04761905 0.20000000 0.2761905 0.3333333 0.03809524 0.01904762
```
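The eight cell-type values in the `ctrl-101` row above sum to 1, which suggests that the per-cell design matrix was averaged within each `paste0(stim, "-", ind)` group. The toy base-R sketch below (hypothetical cells and sample names, not glmGamPoi code) reproduces that effect: averaging 0/1 cell-type indicator columns within a pseudobulk sample turns them into cell-type proportions. Each pseudobulk sample then contributes a single row that mixes all cell types, and with 16 such rows and 16 coefficients there are 16 - 16 = 0 residual degrees of freedom left.

```r
# Hypothetical per-cell metadata: 6 cells in two pseudobulk samples
cells <- data.frame(
  cell   = factor(c("NK cells", "B cells", "B cells",
                    "NK cells", "NK cells", "B cells"),
                  levels = c("NK cells", "B cells")),
  sample = c("ctrl-A", "ctrl-A", "ctrl-A", "ctrl-B", "ctrl-B", "ctrl-B")
)

# Per-cell design: one 0/1 indicator column per cell type
X <- model.matrix(~ cell - 1, data = cells)

# Average rows within each sample: rowsum() sums rows per group (sorted by
# group name), and dividing each row by its group size turns sums into means
X_pb <- rowsum(X, cells$sample) / as.vector(table(cells$sample))
print(X_pb)

# Each pseudobulk row now holds cell-type proportions that sum to 1
print(rowSums(X_pb))
```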
Your question actually prompted me to take another look at the documentation and implementation of pseudobulks in glmGamPoi. I decided to deprecate the pseudobulk_by argument of test_de in the upcoming release, not because it produces wrong results, but simply because it is wasteful to first fit a model on thousands of cells, throw that model away, and then fit a new model inside test_de on the aggregated data.
In the next release, the first step is to form a SummarizedExperiment object with the pseudobulk samples and then call glm_gp and test_de on the aggregated data.
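The aggregation step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not glmGamPoi's own API: the helper `make_pseudobulk()`, the toy matrix `toy_counts`, and the metadata columns `mouse`, `genotype` and `cluster` are all hypothetical. It sums the counts of all cells that share the same mouse and cluster and keeps one metadata row per pseudobulk sample.

```r
# Hypothetical toy data: 4 genes x 8 cells plus per-cell metadata
set.seed(1)
toy_counts <- matrix(rpois(4 * 8, lambda = 5), nrow = 4,
                     dimnames = list(paste0("gene", 1:4), paste0("cell", 1:8)))
cell_meta <- data.frame(
  mouse    = rep(c("WT_1", "KO_1"), each = 4),
  genotype = rep(c("WT", "KO"), each = 4),
  cluster  = rep(c("0", "1"), times = 4)
)

# Hypothetical helper: sum counts over cells sharing the same values in `by`
make_pseudobulk <- function(counts, meta, by) {
  key <- do.call(paste, c(meta[by], sep = "|"))   # e.g. "WT_1|0"
  # rowsum() sums rows per group, so transpose to sum cells (columns)
  pb_counts <- t(rowsum(t(counts), key))
  # Keep one metadata row per pseudobulk sample, in the same column order
  pb_meta <- meta[!duplicated(key), , drop = FALSE]
  rownames(pb_meta) <- key[!duplicated(key)]
  pb_meta <- pb_meta[colnames(pb_counts), , drop = FALSE]
  list(counts = pb_counts, col_data = pb_meta)
}

pb <- make_pseudobulk(toy_counts, cell_meta, by = c("mouse", "cluster"))
dim(pb$counts)   # 4 genes x 4 pseudobulk samples
pb$col_data
```

With the real data (6 mice x 9 clusters) this kind of aggregation would yield up to 54 pseudobulk samples, which could then be passed to glm_gp (for example with `col_data = pb$col_data`) and test_de as the reply describes. Newer glmGamPoi releases may provide their own aggregation helper; check the package documentation before writing your own.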
For your use case the code would be something like this:
Hi Constantin,
Thank you very much for your help!
Miriam