Make the way sample names are derived consistent across CNV tools #2910

droazen · 2017-06-05T18:02:08Z

@asmirnov239 commented on Wed Oct 19 2016

Things we discussed with @samuelklee that would help with this:

- I think that all files we generate for individual case samples ("ReadCountCollection" files for coverage profiles, "AllelicCountCollection" files for het pulldowns, and segment files) should contain the sample name as metadata in a header comment with a common tag (e.g., #sampleName = ...). Currently, these sample names are stored in column headers, in the fields of a SAMPLE column, or not at all, depending on the type of file. This would drastically simplify the SampleNameFinder class, which would then need only a single method to parse this header comment and return the name.

- CLIs that generate a file from an input BAM (CalculateTargetCoverage, GetHetCoverage, etc.) should take the sample name from that BAM by default. Since these are the first steps in our workflows, we could also optionally allow the user to specify a sample name different from that in the BAM.

- Subsequent CLIs should then take the sample name from the header comment.

- CLIs that take multiple non-BAM input files should check that the sample names are consistent as part of the argument validation step.

- CLIs that output the sample name in plots should derive it from the header comment.

- For files that contain data from multiple samples (e.g., the output of CombineReadCounts), we can probably leave the sample names in the column headers, but it would be nice to also record the type of data stored (e.g., PCOV or RAW) in a header comment. At some point I think we should restrict to RAW output only; see broadinstitute/gatk-protected#615.

- Entity names specified by the input file for the WDLs can be separate from the BAM sample names by default. However, if we do allow the user to optionally specify sample names as described in the second bullet point, we can set up the WDL to pass the entity names.
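
To make the proposal concrete, here is a rough Python sketch of the header-comment convention; it is not GATK code. The function names (`resolve_sample_name`, `read_sample_name`, `check_consistent_sample_names`) are hypothetical, and the exact `#sampleName = ...` layout and the rule that header comments precede all data lines are assumptions based on the example tag above.

```python
from typing import Dict, Iterable, Optional

# Tag suggested in the issue; the exact "key = value" layout is an assumption.
SAMPLE_NAME_TAG = "#sampleName"


def resolve_sample_name(bam_sample_name: str, user_override: Optional[str] = None) -> str:
    """First-step tools: default to the BAM sample name, allow an optional override."""
    return user_override if user_override else bam_sample_name


def read_sample_name(lines: Iterable[str]) -> Optional[str]:
    """Downstream tools: return the sample name from the header comments, if present."""
    for line in lines:
        if not line.startswith("#"):
            break  # assume header comments end at the first non-comment line
        key, sep, value = line.partition("=")
        if sep and key.strip() == SAMPLE_NAME_TAG:
            return value.strip()
    return None


def check_consistent_sample_names(names_by_file: Dict[str, Optional[str]]) -> str:
    """Argument validation: every non-BAM input must name the same sample."""
    missing = [path for path, name in names_by_file.items() if name is None]
    if missing:
        raise ValueError(f"No sample name in header of: {missing}")
    distinct = set(names_by_file.values())
    if len(distinct) != 1:
        raise ValueError(f"Inconsistent sample names across inputs: {names_by_file}")
    return distinct.pop()


# Hypothetical file contents, for illustration only.
coverage_lines = ["#sampleName = tumor_1", "contig\tstart\tstop\tcount", "1\t100\t200\t35"]
hets_lines = ["#sampleName = tumor_1", "contig\tposition\tref\talt", "1\t150\t12\t9"]

sample = check_consistent_sample_names({
    "coverage.tsv": read_sample_name(coverage_lines),
    "hets.tsv": read_sample_name(hets_lines),
})
```

With this convention, a plotting tool would call `read_sample_name` on its input instead of inspecting column headers or a SAMPLE column, and a tool that takes several inputs would fail fast during validation when the names disagree.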

samuelklee · 2018-01-10T18:24:47Z

Closed in #3914.

droazen mentioned this issue Jun 5, 2017

Make the way sample names are derived consistent across CNV tools broadinstitute/gatk-protected#751

Closed

droazen assigned asmirnov239 Jun 5, 2017

droazen added the Copy Number tools label Jun 5, 2017

samuelklee mentioned this issue Oct 10, 2017

GATK CNV on WGS. #2858

Closed

samuelklee closed this as completed Jan 10, 2018
