Unable to obtain raw data for reproduction #1

d-cameron · 2020-04-19T12:31:39Z

Hello.

I just saw your preprint go up and wanted to generate ROC curves from the callers' QUAL scores to compare with your single-point results, but it appears the raw data is not available. Would you be able to update your repo and README with:

- A link to the Google Drive location of the BAMs and/or FASTQs.
  - Some aligners have significantly elevated FP rates when using `bwa mem -a`, and I am curious what effect your choice of aligner settings has on your results.
- The raw VCF files as output by each caller.
  - https://github.com/Mangul-Lab-USC/benchmarking-sv-callers-paper/tree/master/Data/raw_data/mouse only includes your processed results, not the raw VCFs. I am unable to generate ROC curves because the essential information (i.e., QUAL and FILTER) was stripped when those subset files were generated.
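For reference, QUAL and FILTER are the 6th and 7th tab-separated columns of every VCF data line, so keeping them only requires not dropping those columns when building the subset files. A minimal sketch of a parser that retains them, assuming standard VCF 4.x data lines (the `SvCall` and `parse_vcf_line` names are illustrative, not part of the repository):

```python
from typing import Dict, NamedTuple, Optional, Union


class SvCall(NamedTuple):
    chrom: str
    pos: int
    qual: Optional[float]   # None when the caller writes "." (missing)
    filter_status: str      # "PASS", "." or semicolon-separated failed filters
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> Optional[SvCall]:
    """Parse one VCF data line, keeping QUAL and FILTER; return None for headers."""
    if line.startswith("#"):
        return None  # meta-information (##) or column header (#CHROM) line
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, _ref, _alt, qual, filter_status, info = fields[:8]
    info_dict: Dict[str, Union[str, bool]] = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        # INFO flags (e.g. IMPRECISE) carry no value; store them as True.
        info_dict[key] = value if sep else True
    return SvCall(
        chrom=chrom,
        pos=int(pos),
        qual=None if qual == "." else float(qual),
        filter_status=filter_status,
        info=info_dict,
    )
```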


Addicted-to-coding · 2020-04-21T13:57:23Z

Hello,
We have uploaded the raw VCF files for GRIDSS here, and we are working on uploading the files for the other tools. We will also upload the original FASTQ and BAM files to SRA/ENA.

We weren't able to find any clear documentation on how to use QUAL and FILTER, so those fields were ignored. We are happy to incorporate them if you could provide some instructions on how to use them.

d-cameron · 2020-04-21T23:58:12Z

I use QUAL to generate ROC curves, which are more informative than single points. For callers that do not report a QUAL score, I use the number of supporting reads as a proxy, since that is typically what is used as a cut-off. In your case, Figures 1e/f and 3e/f would benefit from QUAL-based curves. Similarly, splitting each caller's results into a FILTER=PASS subset and an all-calls subset is valuable.
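A minimal sketch of the curve construction described above, intended to be run per caller and assuming each call has already been labelled TP or FP against the truth set. The `ScoredCall` fields are hypothetical placeholders; in particular, `supporting_reads` stands in for whatever read-support field a given caller reports (its INFO key differs between callers). The score is QUAL when present, otherwise read support, and the threshold is swept from high to low, emitting one (threshold, TP, FP, recall) point per distinct score:

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional, Tuple


@dataclass
class ScoredCall:
    qual: Optional[float]  # VCF QUAL; None if the caller writes "."
    supporting_reads: int  # hypothetical: read support, INFO key varies by caller
    passed: bool           # True when FILTER == "PASS"
    is_tp: bool            # True when the call matches a truth-set variant


def score(call: ScoredCall) -> float:
    # Use QUAL when available, otherwise fall back to read support.
    return call.qual if call.qual is not None else float(call.supporting_reads)


def roc_points(calls: List[ScoredCall], n_truth: int) -> List[Tuple[float, int, int, float]]:
    """Sweep the threshold from the highest score down, one point per distinct score.

    Each point is (threshold, TP count, FP count, recall) for calls with score >= threshold."""
    ordered = sorted(calls, key=score, reverse=True)
    points = []
    tp = fp = 0
    for threshold, group in groupby(ordered, key=score):
        for call in group:
            if call.is_tp:
                tp += 1
            else:
                fp += 1
        points.append((threshold, tp, fp, tp / n_truth))
    return points


def pass_only(calls: List[ScoredCall]) -> List[ScoredCall]:
    """The FILTER=PASS subset; the all-calls curve uses the full list."""
    return [c for c in calls if c.passed]
```

Calling `roc_points(pass_only(calls), n_truth)` and `roc_points(calls, n_truth)` gives the PASS-only and all-calls curves; the single-point result of the current figures corresponds to just one threshold on one of them.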

Your Fig. 2 is also a bit strange. I would have expected Fig. 2 to plot only TPs, with an x-axis of 'length of called variant - true length of variant'. As it is, it appears to plot whether a caller makes more small or more large deletion calls, not how accurately it reports the length of the variants that it does call.
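A sketch of the proposed Fig. 2 quantity, assuming the TP calls have already been paired with their truth deletions as (called SVLEN, true SVLEN). Absolute values are compared because in VCF 4.2 deletions carry a negative SVLEN; the function names and the 10 bp bin width are illustrative choices, not taken from the paper:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def length_errors(matched: Iterable[Tuple[int, int]]) -> List[int]:
    """Return 'length of called variant - true length' for each matched TP.

    Each pair is (called SVLEN, true SVLEN); absolute values are used because
    deletions carry a negative SVLEN in VCF 4.2."""
    return [abs(called) - abs(true) for called, true in matched]


def histogram(errors: Iterable[int], bin_width: int = 10) -> Dict[int, int]:
    """Count errors per bin, keyed by each bin's lower edge (floor division)."""
    return dict(sorted(Counter((e // bin_width) * bin_width for e in errors).items()))
```

A caller that reports lengths accurately concentrates its mass near 0, whereas the current figure mixes that signal with the overall size distribution of the calls.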

Happy to unofficially review your preprint if you're interested in more comprehensive feedback.

Cheers
Daniel Cameron

smangul1 · 2020-12-02T21:06:40Z

Thanks, Daniel, for your feedback!

We intentionally plotted all SVs, not just TPs, to see how the distribution of inferred SVs differs from that of the true ones.
But we are happy to make a plot with just TPs; this will help us assess how accurately the tools can estimate the length of correctly detected SVs (probably with a 1000 bp threshold).
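A minimal sketch of TP selection for such a plot, assuming the 1000 bp threshold is applied independently to the start and end coordinates of each deletion and that each truth deletion may be matched at most once (greedy, closest candidate per call). The `match_deletions` name and the (chrom, start, end) representation are hypothetical, and the matching criterion actually used in the paper may differ:

```python
from typing import List, Optional, Tuple

# Hypothetical representation of a deletion: (chromosome, start, end).
Deletion = Tuple[str, int, int]


def match_deletions(
    calls: List[Deletion], truth: List[Deletion], max_dist: int = 1000
) -> List[Tuple[Deletion, Deletion]]:
    """Greedily pair each call with the closest unused truth deletion.

    A pair is accepted only if both the start and the end breakpoints lie
    within max_dist bp; each truth deletion is used at most once."""
    used = set()
    pairs = []
    for call in calls:
        best_index: Optional[int] = None
        best_dist = 0
        for i, true_del in enumerate(truth):
            if i in used or true_del[0] != call[0]:
                continue  # already matched, or on a different chromosome
            d_start = abs(call[1] - true_del[1])
            d_end = abs(call[2] - true_del[2])
            if d_start > max_dist or d_end > max_dist:
                continue  # outside the breakpoint tolerance
            if best_index is None or d_start + d_end < best_dist:
                best_index, best_dist = i, d_start + d_end
        if best_index is not None:
            used.add(best_index)
            pairs.append((call, truth[best_index]))
    return pairs
```

The returned pairs are the TPs; calls left unpaired are FPs, and unpaired truth deletions are FNs.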

Varuni, can you please generate this plot?

In terms of QUAL, we will be happy to incorporate it in the future.

Serghei

smangul1 assigned Addicted-to-coding Dec 2, 2020
