Don't encourage use of RPKM #1

blahah · 2014-02-26T08:54:32Z

This tool looks excellent, but by using RPKM it will be mathematically wrong and less effective than it potentially could be. You should be operating on library-size normalised effective counts, or on TPM.

RPKMs are not comparable between samples. This follows directly from the definition of RPKM, which depends on the mean expressed transcript length. This was first pointed out explicitly in the RSEM paper (section 1.1.1). It was then restated (without attribution!) more explicitly in Wagner et al. It has also been demonstrated empirically to make a difference.
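To make the point concrete, here is a minimal sketch in Python using the standard definitions of RPKM and TPM. The counts and lengths are invented toy numbers, and the function names are just for illustration. The two samples have the same library size but differ in whether reads fall mostly on long or short transcripts.

```python
def rpkm(counts, lengths):
    """Reads per kilobase per million mapped reads (standard definition)."""
    total_reads = sum(counts)
    return [1e9 * c / (l * total_reads) for c, l in zip(counts, lengths)]


def tpm(counts, lengths):
    """Transcripts per million: length-normalise first, then rescale to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths)]
    total_rate = sum(rates)
    return [1e6 * r / total_rate for r in rates]


# Toy data (made up): three transcripts, two samples with equal library size.
lengths = [1000, 2000, 4000]
sample_a = [100, 100, 800]  # reads mostly on the long transcript
sample_b = [800, 100, 100]  # reads mostly on the short transcript

rpkm_a, rpkm_b = rpkm(sample_a, lengths), rpkm(sample_b, lengths)
tpm_a, tpm_b = tpm(sample_a, lengths), tpm(sample_b, lengths)

# The RPKM totals differ between samples (3.5e5 vs 8.75e5 here), so each
# sample's RPKM values live on a different scale.
print(sum(rpkm_a), sum(rpkm_b))

# The TPM totals are 1e6 in both samples (up to floating-point error).
print(sum(tpm_a), sum(tpm_b))
```

Because the RPKM total is proportional to the sum of count/length over transcripts, it shifts whenever the length distribution of expressed transcripts shifts, even at identical sequencing depth.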

For further context, see Lior Pachter's keynote, in which he apologises for making Cufflinks use RPKM, which has led to it being widely misused: http://www.youtube.com/watch?v=5NiFibnbE8o.

Any reviewer who has been paying attention to the literature will pick this up. Admittedly, there aren't many such reviewers, but it's also important to maximise the correctness and utility of your software.

mgonzalezporta · 2014-02-26T10:33:27Z

Thanks a lot for your comments.

It is important to note that SwitchSeq does not attempt to make any claims about significance based on RPKM/FPKM values, and that the user is pointed to alternative tools like DEXSeq and MMDIFF for that. Given the initial matrix of expression values, SwitchSeq identifies the most abundant transcript within each gene for each given sample, and reports cases where the identity of this transcript differs across conditions. Hence, the use of RPKMs/FPKMs is limited to the visualisation of expression levels, and those should be interpreted in the context of the results provided by the tools mentioned above.
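As a rough illustration of the idea described above (not SwitchSeq's actual code), the switch detection could be sketched in Python as follows. The data layout, the function names and the majority vote across replicates are all assumptions made for this example.

```python
from collections import Counter


def dominant_transcript(transcripts, sample_idx):
    """Return the transcript with the highest expression in one sample."""
    return max(transcripts, key=lambda t: transcripts[t][sample_idx])


def find_switches(expression, conditions):
    """Report genes whose dominant transcript differs between conditions.

    expression: {gene: {transcript: [value per sample]}}   (hypothetical layout)
    conditions: {condition_name: [sample indices]}
    """
    switches = {}
    for gene, transcripts in expression.items():
        major = {}
        for cond, sample_idxs in conditions.items():
            # Assumption: the dominant transcript of a condition is the one
            # that is dominant in most of its replicates.
            votes = Counter(dominant_transcript(transcripts, i) for i in sample_idxs)
            major[cond] = votes.most_common(1)[0][0]
        if len(set(major.values())) > 1:
            switches[gene] = major
    return switches


# Toy data (made up): geneA switches from tx1 to tx2, geneB does not switch.
expression = {
    "geneA": {"tx1": [10, 12, 2, 1], "tx2": [3, 4, 9, 11]},
    "geneB": {"tx3": [5, 6, 7, 8], "tx4": [1, 1, 2, 2]},
}
conditions = {"control": [0, 1], "treated": [2, 3]}

switches = find_switches(expression, conditions)
print(switches)
```

Note that the comparison in `dominant_transcript` only ever ranks values within a single sample, so rescaling a whole sample by a constant does not change which transcript is called dominant.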

Furthermore, both SwitchSeq and tviz can be used with any normalisation method, as long as the input is provided in the required matrix format. Typically, I encourage people to work with the normalisation method provided within the DESeq2 package, and then divide the normalised counts by the feature length if the goal is to compare across genes. This approach is fully compatible with SwitchSeq.
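For readers unfamiliar with that approach, here is a small Python sketch of the median-of-ratios idea behind DESeq-style size factors, followed by division by feature length. It is a simplified re-implementation for illustration only and does not call DESeq2; the function names, the toy counts and the choice of kilobases as the length unit are assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of rows (one per feature), each a list of per-sample counts.
    Only features with non-zero counts in every sample are used.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the geometric mean of each usable feature across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalise(counts, factors):
    """Divide each count by the size factor of its sample."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


def per_kilobase(values, lengths):
    """Divide each feature's values by its length in kb, for cross-feature comparison."""
    return [[v / (length / 1000) for v in row] for row, length in zip(values, lengths)]


# Toy data (made up): sample 2 was sequenced exactly twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [30, 60]]
lengths = [1000, 2000, 500]

factors = size_factors(counts)
norm = normalise(counts, factors)
length_scaled = per_kilobase(norm, lengths)
print(factors, norm, length_scaled)
```

In this toy case the normalised counts come out equal in both samples, because the only difference between them is sequencing depth.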

I've now updated the documentation for both SwitchSeq and tviz to not refer exclusively to the RPKM method, and I've clarified this in the tutorial. In addition, I have changed the name of the slot 'rpkm' from the TranscriptExpressionSet object to 'texp'.

I hope this addresses your concerns.

blahah · 2014-02-26T11:09:16Z

Thanks for the rapid response, and for your clarification of the actual use of the values within SwitchSeq. It's now clear that SwitchSeq wouldn't have been making incorrect calls on that basis. The updated tutorial encourages good practice, as does the slot name change.

I agree with your suggested normalisation strategy in general.

Concerns addressed in full!

blahah closed this as completed Feb 26, 2014
