Things that we discussed with @samuelklee that can be done to aid it:
- I think that all files we generate for individual case samples ("ReadCountCollection" files for coverage profiles, "AllelicCountCollection" files for het pulldowns, and segment files) should contain the sample name as metadata in a header comment with a common tag (e.g., #sampleName = ...). Currently, these sample names are stored in column headers, in the fields of a SAMPLE column, or not at all, depending on the type of file. Adding the header comment would drastically simplify the use of the SampleNameFinder class, which would basically only contain a single method to parse this header comment and return the name.
- CLIs that generate a file from an input BAM (CalculateTargetCoverage, GetHetCoverage, etc.) should take the sample name from that BAM by default. Since these are the first steps in our workflows, we could also optionally allow the user to specify a sample name different from that in the BAM.
- Subsequent CLIs should then take the sample name from the header comment.
- CLIs that take multiple non-BAM input files should check for consistency of the sample names as part of the argument validation step.
- CLIs that output the sample name in plots should derive it from the header comment.
- For files that contain data from multiple samples (e.g., the output of CombineReadCounts), we can probably leave the sample names in the column headers, but it would be nice to also record the type of data stored (e.g., PCOV or RAW) in a header comment. At some point I think we should restrict to RAW output only; see broadinstitute/gatk-protected#615.
- Entity names specified by the input file for the WDLs can be separate from the BAM sample names by default. However, if we do allow the user to optionally specify sample names as described in the second bullet point, we can set up the WDL to pass the entity names.
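
To make the header-comment idea concrete, here is a rough sketch of what the single parsing method and the consistency check could look like. Everything in it is an assumption for illustration: the class name `SampleNameHeader`, its method names, and the exact tag syntax (`#sampleName = <name>` with optional whitespace around `=`) are hypothetical and are not the existing SampleNameFinder API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, for illustration only; not the actual SampleNameFinder. */
final class SampleNameHeader {
    // Assumed tag syntax: "#sampleName = <name>", whitespace around '=' optional.
    private static final Pattern TAG =
            Pattern.compile("^#\\s*sampleName\\s*=\\s*(.+?)\\s*$");

    private SampleNameHeader() {}

    /**
     * Scans only the leading '#' comment lines and returns the sample name
     * from the first line that matches the tag, if there is one.
     */
    static Optional<String> parse(final BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null && line.startsWith("#")) {
            final Matcher matcher = TAG.matcher(line);
            if (matcher.matches()) {
                return Optional.of(matcher.group(1));
            }
        }
        return Optional.empty();
    }

    /** Convenience overload for content that is already in memory. */
    static Optional<String> parseString(final String contents) {
        try (BufferedReader reader = new BufferedReader(new StringReader(contents))) {
            return parse(reader);
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Consistency check for CLIs that take several non-BAM inputs: returns the
     * one shared sample name, or throws if the list is empty or the names differ.
     */
    static String requireConsistent(final List<String> sampleNames) {
        final Set<String> distinct = new LinkedHashSet<>(sampleNames);
        if (distinct.size() != 1) {
            throw new IllegalArgumentException(
                    "Expected exactly one sample name across inputs but found: " + distinct);
        }
        return distinct.iterator().next();
    }
}
```

A CLI could call something like `requireConsistent` during argument validation on the names parsed from each input, and fall back to a user-supplied name only when the header comment is missing.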
@asmirnov239 commented on Wed Oct 19 2016