correct way to use DESeq after salmon quantification #581
Comments
Thanks @tamuanand for the (as always) detailed and clear question! Since this directly involves |
Thanks @rob-p, and thanks in advance @mikelove.

The original question pertained to using salmon with, say, Illumina RNA-Seq followed by DGE with DESeq2.

@rob-p - I will also use this opportunity to ask a related question: how to use salmon with QuantSeq and then DESeq2 downstream. I have asked many QuantSeq-related questions on this GH forum, and I have yet to find the correct recipe for using salmon with QuantSeq and downstream DGE.
@rob-p @mikelove - Here is my thought process (for salmon-QuantSeq-DESeq):
Let me know if you would approach the salmon-QuantSeq-DESeq puzzle differently. Thanks in advance. |
Just to check: do you have length bias in your data (are counts roughly proportional to effective transcript length)? |
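One rough way to look at the question above is to correlate log counts with log effective length for a single sample. The following is a minimal sketch in plain Python, not code from this thread; the function names and the toy numbers are hypothetical, and it assumes you have already pulled the EffectiveLength and NumReads columns out of a quant.sf file. It is only a crude diagnostic, since true abundance differences between transcripts also affect the counts.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def length_count_correlation(eff_lengths, num_reads):
    """Correlate log(effective length) with log(estimated count).

    Transcripts with zero counts or non-positive length are skipped,
    because the logarithm is undefined for them.
    """
    pairs = [(math.log(l), math.log(n))
             for l, n in zip(eff_lengths, num_reads)
             if l > 0 and n > 0]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)

# Toy data (hypothetical): four transcripts of equal abundance.
eff_lengths = [500.0, 1000.0, 2000.0, 4000.0]
full_length_like = [50.0, 100.0, 200.0, 400.0]  # counts grow with length
tag_like = [120.0, 90.0, 110.0, 100.0]          # counts ignore length

r_full = length_count_correlation(eff_lengths, full_length_like)
r_tag = length_count_correlation(eff_lengths, tag_like)
```

In the toy data, counts that scale with length give a correlation close to 1, while counts that ignore length give a weak one. Real data will be much noisier than this.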
@mikelove - was this question with reference to salmon quant on QuantSeq data?
@rob-p Is there a way to get the answer to Mike's question from the meta_info.json files? Also, aren't the counts in the quant.sf file provided after taking into account length bias and effective transcript length? This is the salmon quant command line being used for RNA-Seq quantification. I have still not figured out the right command-line combination for QuantSeq data.
The original question in the post is "what are the correct steps with tximport for running DESEQ after salmon quant" |
The standard is the code chunk in the vignette:
Or even better, you can use tximeta:
If you have a special protocol which does not involve fragmentation of a full length transcript, then you do something else. But if you are fragmenting molecules and sequencing from along the entire transcript, use those code chunks from the vignette. |
Thanks @mikelove. I believe tximeta can be used only for human/mouse? In my case, it is not human/mouse.

@rob-p and @mikelove - Based on my reading of the salmon documentation, aren't NumReads, TPM, etc. made available after length correction? Extending this, NumReads in quant.sf corresponds to the estimated count for each transcript and is correlated with effective length. My idea is therefore to use countsFromAbundance = "lengthScaledTPM" to compute counts that are on the same scale as the original counts and not correlated with transcript length across samples. Given this, is the below also valid (after salmon quant)?
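To make the scaling idea in this comment concrete, here is a minimal sketch in plain Python, not the R code from the thread. It follows my understanding of what the "lengthScaledTPM" option does: multiply each abundance by the feature's length averaged over samples, then rescale each sample so its total matches the original estimated counts. The function name and the toy matrices are hypothetical, and the real tximport implementation should be treated as the reference.

```python
def length_scaled_tpm(counts, abundance, lengths):
    """Sketch of length-scaled TPM counts.

    Each argument is a list of rows (features) with one value per sample.
    Assumed steps:
      1. average each feature's length across samples,
      2. multiply abundance (TPM) by that average length,
      3. rescale every sample so its column sum equals the original
         column sum of the estimated counts.
    """
    n_features = len(counts)
    n_samples = len(counts[0])
    mean_length = [sum(row) / n_samples for row in lengths]
    scaled = [[abundance[i][j] * mean_length[i] for j in range(n_samples)]
              for i in range(n_features)]
    result = [[0.0] * n_samples for _ in range(n_features)]
    for j in range(n_samples):
        original_sum = sum(counts[i][j] for i in range(n_features))
        scaled_sum = sum(scaled[i][j] for i in range(n_features))
        for i in range(n_features):
            result[i][j] = scaled[i][j] * original_sum / scaled_sum
    return result

# Toy example (hypothetical numbers): 2 features x 2 samples.
counts = [[100.0, 200.0], [300.0, 100.0]]
abundance = [[4e5, 8e5], [6e5, 2e5]]
lengths = [[1000.0, 1000.0], [2000.0, 2000.0]]
new_counts = length_scaled_tpm(counts, abundance, lengths)
```

The point of the rescaling step is that each sample keeps its original library size, while a single average length per feature replaces the per-sample lengths.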
To keep the code simpler:
DESeq2 will do the right thing based on the value of (You can still use tximeta with organisms other than human, mouse, or fly, you just have to run |
Using your code snippet, what's the subtle difference between using
Basis of this particular question of mine (with use of DESeqDataSetFromMatrix in the latter code block above):
No difference. I only prefer people use |
Thanks @mikelove
Based on the above, I assume that doing something like this is wrong, because DESeqDataSetFromMatrix is being used after countsFromAbundance = "no"
@rob-p and @mikelove -- While on this topic, how would you use salmon quant and DESeq2 for QuantSeq data (which would be 3' tagged RNA-seq)? Would you use
It seems this is unambiguous from the documentation, right? Is there any question left about what to do? I can’t think how to explain it in sentences that are different from the above. |
@mikelove @rob-p My specific question (probably to @rob-p ) was on how to use salmon before I use DESeq2
This might be unrelated to the topic, but I am a bit confused as to which column values tximport uses from quant.sf to generate the counts matrix (TPM or NumReads). I also found that some genes have 'NA' as values and some are '0'. Could someone explain this to me, please?
TPM is an abundance measure and becomes the abundance matrix. NumReads is an estimated count and becomes the counts matrix. Each software has different names for these; you can browse them in the code here: https://github.com/mikelove/tximport/blob/master/R/tximport.R#L355-L358
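As an illustration of that column mapping, here is a minimal sketch in plain Python (not tximport itself) that reads a quant.sf-style table and files each column under the role described above. The transcript names and values are made up, and `read_quant_sf` is a hypothetical helper. The EffectiveLength-to-length mapping is my assumption about the third matrix tximport returns.

```python
import csv
import io

# Role of each salmon quant.sf column, as described in the reply above.
# The "length" entry is an assumption added for completeness.
SALMON_COLUMNS = {
    "abundance": "TPM",
    "counts": "NumReads",
    "length": "EffectiveLength",
}

def read_quant_sf(text):
    """Parse tab-separated quant.sf text into {role: {transcript: value}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    result = {role: {} for role in SALMON_COLUMNS}
    for row in reader:
        for role, column in SALMON_COLUMNS.items():
            result[role][row["Name"]] = float(row[column])
    return result

# Hypothetical two-transcript file using the quant.sf header.
example = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "txA\t1500\t1320.5\t12.5\t40.0\n"
    "txB\t900\t720.0\t0.0\t0.0\n"
)
parsed = read_quant_sf(example)
```

Here `parsed["counts"]` holds the NumReads values and `parsed["abundance"]` holds the TPM values, so a transcript with no assigned reads shows up as 0 in both.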
thank you |
Hi @rob-p
I have a question on the "right" way of tximport/DESeq2 after salmon quant.
Why I ask "right way" - is because I am a bit confused with the tximport vignette
1 - https://bioconductor.org/packages/devel/bioc/vignettes/tximport/inst/doc/tximport.html#Downstream_DGE_in_Bioconductor
2 - https://bioconductor.org/packages/devel/bioc/vignettes/tximport/inst/doc/tximport.html#Salmon
Why the confusion - https://bioconductor.org/packages/devel/bioc/vignettes/tximport/inst/doc/tximport.html#Downstream_DGE_in_Bioconductor - states
Now, if I were to use the 2nd bullet as guide, shouldn't txi be generated this way for use with DESeq -- see the addition of
countsFromAbundance = "lengthScaledTPM"
to the tximport line?
And then use tx2gene_NumReads.csv with DESeqDataSetFromMatrix, where the countData comes from reading in tx2gene_NumReads.csv upstream in the code. Note: I am using DESeqDataSetFromMatrix here and not DESeqDataSetFromTximport, as I have used tximport with countsFromAbundance=lengthScaledTPM
I also saw these 2 links - https://hbctraining.github.io/DGE_workshop_salmon/lessons/07_DGE_summarizing_workflow.html and https://hbctraining.github.io/DGE_workshop_salmon/lessons/01_DGE_setup_and_overview.html
Thanks in advance,