Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in estimateSizeFactorsForMatrix(counts.byGenes) #50

Open
HeenaBioinfo opened this issue Jul 29, 2020 · 2 comments
Open

Error in estimateSizeFactorsForMatrix(counts.byGenes) #50

HeenaBioinfo opened this issue Jul 29, 2020 · 2 comments

Comments

@HeenaBioinfo
Copy link

jscs.noNovel <- runJunctionSeqAnalyses(sample.files = countFiles.noNovel,

  •                                    sample.names = sample_small$sample,
    
  •                                    condition=factor(sample_small$clinGroup_PILOT),
    
  •                                    flat.gff.file = gff.file,
    
  •                                    nCores = 1,
    
  •                                    analysis.type = "junctionsAndExons")
    

STARTING runJunctionSeqAnalyses (v1.14.0) (Wed Jul 29 16:13:26 2020)
rJSA: sample.files: /space/Junction_data/rawCts/S000084.l.r.m.c.lib.g.k2.a/QC.spliceJunctionAndExonCounts.forJunctionSeq.txt.gz, /space/Junction_data/rawCts/S000556.l.r.m3.c.lib.g.k2.a/QC.spliceJunctionAndExonCounts.forJunctionSeq.txt.gz, /space/Junction_data/rawCts/S000556.l.r.m4.c.lib.g.k2.a/QC.spliceJunctionAndExonCounts.forJunctionSeq.txt.gz, /space/Junction_data/rawCts/S000556.l.r.m8.c.lib.g.k2.a/QC.spliceJunctionAndExonCounts.forJunctionSeq.txt.gz
rJSA: sample.names: S000084.l.r.m.c.lib.g.k2.a, S000556.l.r.m3.c.lib.g.k2.a, S000556.l.r.m4.c.lib.g.k2.a, S000556.l.r.m8.c.lib.g.k2.a
rJSA: condition: ERpHER2p, ERpHER2nLNn, ERpHER2nLNn, ERpHER2nLNn
rJSA: analysis.type: junctionsAndExons
rJSA: use.junctions: TRUE
rJSA: use.novel.junctions: TRUE
rJSA: use.exons: TRUE
rJSA: nCores: 1
rJSA: use.covars:
rJSA: test.formula0: ~ sample + countbin
rJSA: test.formula1: ~ sample + countbin + condition:countbin
rJSA: use.multigene.aggregates: FALSE
rJSA: Reading Count files... Wed Jul 29 16:13:26 2020.
-> STARTING readJunctionSeqCounts (Wed Jul 29 16:13:26 2020)
---> RJSC; (v1.14.0)
---> RJSC: samplenames: S000084.l.r.m.c.lib.g.k2.a,S000556.l.r.m3.c.lib.g.k2.a,S000556.l.r.m4.c.lib.g.k2.a,S000556.l.r.m8.c.lib.g.k2.a
---> RJSC: flat.gff.file: /space/Junction_data/flattened.gtf.gz
---> RJSC: use.exons:TRUE
---> RJSC: use.junctions:TRUE
---> RJSC: use.novel.junctions:TRUE
---> File read complete.
---> Extracted counts. Found 669130 features so far.
---> Extracted gene-level counts. Found: 47257 genes and aggregate-genes.
---> Removed gene features. Found: 621873 features to be included so far.
---> Note: 524335 counting bins from overlapping genes
---> There are 16295 multigene aggregates.
---> There are 73216 genes that are part of an aggregate.
---> Removed multigene-aggregate features. Found: 97538 features to be included so far.
---> Final feature count: 97538 features to be included in the analysis.
---> Extracted feature counts.
---> counts complete.
-----> reading annotation...
-----> formatting annotation...
-----> initial generation...
-----> creating jscs...
-----> generating count vectors... (Wed Jul 29 16:14:13 2020)
Using single-core execution.
getAllJunctionSeqCountVectors: dim(counts) = 97538,4 (Wed Jul 29 16:14:13 2020)
getAllJunctionSeqCountVectors: dim(gct) = 47257,4
getAllJunctionSeqCountVectors: out generated. dim = 97538,8 (Wed Jul 29 16:14:16 2020)
-----> count vectors generated (Wed Jul 29 16:14:16 2020)
-----> generating DESeqDataSet... (Wed Jul 29 16:14:16 2020)
-----> DESeqDataSet generated (Wed Jul 29 16:14:16 2020)
rJSA: Count files read. Wed Jul 29 16:14:16 2020.
rJSA: Estimating Size Factors... Wed Jul 29 16:14:16 2020.
Error in estimateSizeFactorsForMatrix(counts.byGenes) :
every gene contains at least one zero, cannot compute log geometric means

Hi Can you please guide me through this error?

@HeenaBioinfo
Copy link
Author

I am not able to generate the size factors, please guide..

@hartleys
Copy link
Owner

hartleys commented Aug 5, 2020

This occurs commonly in large samples and/or single-cell data. It can also occur if the samples are extremely variable and very different from one another (like in single-cell data). The geometric size factor calculation used by DESeq simply fails in this situation.

I would recommend skipping the size factor calculation step and replacing it with your own. For large samples / single sample data the Upper Quartile method seems to be gaining traction as the industry standard.

I would use the step-by-step analysis pipeline from the manual, but replace the estimateJunctionSeqSizeFactors step with the following:

#replace the estimation function with a version that takes an external size factor. 
# I Should have added this long ago...
estimateJunctionSeqSizeFactors.forced <- function( jscs , sizeFactors, verbose = FALSE){
   stopifnot( is( jscs, "JunctionSeqCountSet") )
   sizeFactors(jscs) <- sizeFactors
   sizeFactors(jscs@DESeqDataSet) <- rep(sizeFactors(jscs),2)
   fData(jscs)$baseMean <- rowMeans(counts(jscs, normalized= T))
   fData(jscs)$baseVar <- rowVars(counts(jscs, normalized=T))
   jscs@altSizeFactors <- data.frame()
   return(jscs);                         
}

#Calculate size factors using your favorite method.
#I recommend the upper quartile method, shown below:
SF <- sapply(1:ncol(jscs@geneCountData),function(i){
   quantile(jscs@geneCountData[,i],p=0.75);
})
#Now copy in those size factors:
jscs <- estimateJunctionSeqSizeFactors.forced(jscs,sizeFactors=SF)

For the future I'll fold this into the codebase with the next update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants