This repository has been archived by the owner on Sep 8, 2018. It is now read-only.
This tool looks excellent, but because it uses RPKM it will be mathematically wrong and less effective than it could be. You should be operating on library-size-normalised effective counts, or on TPM.
RPKMs are not comparable between samples. This follows directly from the definition of RPKM: the sum of RPKM values in a sample depends on the mean length of the expressed transcripts, which changes from sample to sample. This was first pointed out explicitly in the RSEM paper (section 1.1.1). It was then restated (without attribution!) more explicitly in Wagner et al. It has also been demonstrated empirically to make a difference.
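To make the length dependence concrete, here is a minimal Python sketch comparing RPKM and TPM on made-up data. The function names `rpkm` and `tpm`, the transcript lengths, and the molecule counts are all hypothetical. For simplicity it uses raw transcript lengths rather than effective lengths, and it assumes idealised read counts proportional to (molecules × length). Transcript T1 makes up 50% of the molecules in both samples, yet its RPKM differs between them while its TPM does not.

```python
from typing import Sequence


def rpkm(counts: Sequence[float], lengths_bp: Sequence[float]) -> list[float]:
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts)
    # count / (length / 1e3) / (total / 1e6), written to avoid extra rounding
    return [c * 1e9 / (l * total) for c, l in zip(counts, lengths_bp)]


def tpm(counts: Sequence[float], lengths_bp: Sequence[float]) -> list[float]:
    """Transcripts per million: length-normalise first, then rescale to sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


# Hypothetical transcripts T1, T2, T3 (lengths in bp, not effective lengths).
lengths = [1000, 2000, 4000]
# Hypothetical molecule counts; T1 is 50% of molecules in both samples.
molecules_a = [50, 25, 25]
molecules_b = [50, 45, 5]
# Idealised read counts: proportional to molecules x length.
counts_a = [n * l for n, l in zip(molecules_a, lengths)]
counts_b = [n * l for n, l in zip(molecules_b, lengths)]

print(rpkm(counts_a, lengths)[0], rpkm(counts_b, lengths)[0])  # 250000.0 312500.0
print(tpm(counts_a, lengths)[0], tpm(counts_b, lengths)[0])    # 500000.0 500000.0
```

In this toy example the RPKM totals are 500,000 for sample A and 625,000 for sample B, i.e. 1e9 divided by the molecule-weighted mean transcript length (2,000 bp versus 1,600 bp), whereas TPM always sums to 1e6 by construction.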
For further context, see Lior Pachter's keynote, in which he apologises for making Cufflinks use RPKM, which has led to it being widely misused: http://www.youtube.com/watch?v=5NiFibnbE8o.
Any reviewer who has been paying attention to the literature will pick this up. Admittedly, there aren't many such reviewers, but it's also important to maximise the correctness and utility of your software.
It is important to note that SwitchSeq does not attempt to make any claims about significance based on RPKM/FPKM values; for that, the user is pointed to alternative tools such as DEXSeq and MMDIFF. Given the initial matrix of expression values, SwitchSeq identifies the most abundant transcript within each gene for each sample, and reports cases where the identity of this transcript differs across conditions. Hence, the use of RPKMs/FPKMs is limited to the visualisation of expression levels, and those should be interpreted in the context of the results provided by the tools mentioned above.
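The switch-detection idea described above can be sketched in a few lines of Python. This is not SwitchSeq's code: the names `dominant_transcripts` and `find_switches`, the dictionary input layout, and the rule that all samples within a condition must agree on the dominant transcript are assumptions made for illustration. Ties are resolved in favour of the first transcript listed.

```python
from collections import defaultdict


def dominant_transcripts(expr, gene_of):
    """Hypothetical helper.
    expr: {transcript_id: [expression value per sample]}
    gene_of: {transcript_id: gene_id}
    Returns {gene_id: [most abundant transcript in each sample]}."""
    by_gene = defaultdict(list)
    for tx, gene in gene_of.items():
        by_gene[gene].append(tx)
    n_samples = len(next(iter(expr.values())))
    return {
        gene: [max(txs, key=lambda t: expr[t][s]) for s in range(n_samples)]
        for gene, txs in by_gene.items()
    }


def find_switches(expr, gene_of, conditions):
    """Hypothetical helper. conditions: one condition label per sample.
    Assumed rule: every sample in a condition must share the same dominant
    transcript, and that transcript must differ between conditions."""
    switches = {}
    for gene, per_sample in dominant_transcripts(expr, gene_of).items():
        by_cond = defaultdict(set)
        for cond, tx in zip(conditions, per_sample):
            by_cond[cond].add(tx)
        if all(len(txs) == 1 for txs in by_cond.values()):
            majors = {cond: next(iter(txs)) for cond, txs in by_cond.items()}
            if len(set(majors.values())) > 1:
                switches[gene] = majors
    return switches


# Toy data: two genes, two transcripts each, samples 1-2 in A and 3-4 in B.
expr = {
    "g1.t1": [10, 12, 2, 3],
    "g1.t2": [4, 5, 9, 11],
    "g2.t1": [8, 7, 9, 6],
    "g2.t2": [1, 2, 1, 2],
}
gene_of = {"g1.t1": "g1", "g1.t2": "g1", "g2.t1": "g2", "g2.t2": "g2"}
conditions = ["A", "A", "B", "B"]
print(find_switches(expr, gene_of, conditions))  # {'g1': {'A': 'g1.t1', 'B': 'g1.t2'}}
```

Because only the within-sample, within-gene maximum is used, multiplying every value in a sample by the same constant leaves the calls unchanged. TPM equals RPKM times a per-sample constant, so under this sketch the choice between the two does not alter which transcript is called dominant.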
Furthermore, both SwitchSeq and tviz can be used with any normalisation method, as long as the input is provided in the required matrix format. Typically, I encourage people to work with the normalisation method provided within the DESeq2 package, and then divide the normalised counts by the feature length if the goal is to compare across genes. This approach is fully compatible with SwitchSeq.
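The suggested workflow (library-size normalisation, then division by feature length) is sketched below in plain Python. It re-implements the median-of-ratios idea behind DESeq2's default size factors rather than calling DESeq2 itself. The function names `size_factors` and `normalise_per_kb`, the toy counts, and the feature lengths are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (sketch of the DESeq approach).
    counts: one row per gene, each row a list of raw counts per sample."""
    n_samples = len(counts[0])
    # Log geometric mean per gene; genes with a zero in any sample are skipped,
    # since their geometric mean would be zero.
    ref = [
        (row, sum(math.log(c) for c in row) / n_samples)
        for row in counts
        if all(c > 0 for c in row)
    ]
    # Size factor = exp(median over genes of log(count / geometric mean)).
    return [
        math.exp(median(math.log(row[j]) - log_geo for row, log_geo in ref))
        for j in range(n_samples)
    ]


def normalise_per_kb(counts, lengths_bp):
    """Divide by the sample's size factor, then by feature length in kb."""
    sf = size_factors(counts)
    return [
        [c / s / (length / 1e3) for c, s in zip(row, sf)]
        for row, length in zip(counts, lengths_bp)
    ]


# Toy data: sample 2 was simply sequenced twice as deeply as sample 1.
counts = [[100, 200], [50, 100], [30, 60]]
lengths = [1000, 2000, 500]  # hypothetical feature lengths in bp
for row in normalise_per_kb(counts, lengths):
    print([round(v, 2) for v in row])
# [141.42, 141.42]
# [35.36, 35.36]
# [84.85, 84.85]
```

The size factors remove the sequencing-depth difference between samples, and the length division only rescales each feature by its own fixed constant, so values stay comparable across samples while also becoming comparable across genes.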
I've now updated the documentation for both SwitchSeq and tviz so that it no longer refers exclusively to the RPKM method, and I've clarified this in the tutorial. In addition, I have renamed the 'rpkm' slot in the TranscriptExpressionSet object to 'texp'.
Thanks for the rapid response, and for your clarification of the actual use of the values within SwitchSeq. It's now clear that SwitchSeq wouldn't have been making incorrect calls on that basis. The updated tutorial encourages good practice, as does the slot name change.
I agree with your suggested normalisation strategy in general.