parse strange contig names correctly in the commandline -L argument #1438

Closed
yfarjoun opened this issue Jan 16, 2016 · 4 comments · Fixed by #4093

@yfarjoun
Contributor

Charlotte recently stepped on a bug in GATK: it interprets a -L argument that ends in a match for the regex ':[0-9]*' as indicating a single site on the contig that precedes it, and then barfs if it cannot find that contig in the sequence dictionary. In hg38 we have contig names like 'HLA:01:01:01', and when one of these is used on the command line (as in CreateRealignerTargets) it barfs, as in the following workflow: https://picard.broadinstitute.org/pipeline/workflows/viewWorkflow/8536444

Given that the SAM spec allows any printable character from ! through ~ after the first character of a contig name (yikes!!), see samtools/hts-specs#124, it seems that some more "smarts" need to be put into the parsing of this argument.
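To make the failure concrete, here is a minimal Python sketch of the kind of parsing described above. It is not GATK's actual code: the function name `naive_parse` and the exact regex are assumptions made for illustration. It treats a trailing `:<digits>` as a single site and so splits a colon-containing contig name in the wrong place.

```python
import re

# Hypothetical naive rule (not GATK's real parser): anything that ends in
# ':<digits>' is read as "<contig>:<single site>".
_SINGLE_SITE = re.compile(r"^(?P<contig>.+):(?P<pos>[0-9]+)$")


def naive_parse(interval):
    """Return (contig, start, end); start == end for a single site."""
    m = _SINGLE_SITE.match(interval)
    if m:
        pos = int(m.group("pos"))
        return m.group("contig"), pos, pos
    # No trailing ':<digits>', so the whole string is taken as a contig name.
    return interval, None, None


print(naive_parse("chr1:100"))      # ('chr1', 100, 100)
print(naive_parse("HLA:01:01:01"))  # ('HLA:01:01', 1, 1)
```

The second call shows the problem: the contig 'HLA:01:01:01' is read as site 1 on a contig called 'HLA:01:01', which would not be found in a dictionary that contains only the full name.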

@droazen
Collaborator

droazen commented Jan 19, 2016

This will be tricky, as GATK4, unlike GATK3, does not require that a sequence dictionary be present (e.g., tools can take just a VCF). It is a bit crazy that these problematic characters are forbidden in the VCF spec but not the SAM spec...

@droazen droazen added this to the beta milestone Jan 19, 2016
@droazen droazen added the bogus label Jan 19, 2016
@droazen droazen removed this from the beta milestone Mar 28, 2016
@droazen droazen added this to the 4.0 release milestone Mar 22, 2017
@droazen
Collaborator

droazen commented Oct 16, 2017

Re-assigning to @cmnbroad for the 4.0 milestone.

@cmnbroad
Collaborator

#4093 detects ambiguities, but throws when it finds them. Reopening this to keep the history, since we should probably still invent some kind of quoting mechanism to allow the user to resolve ambiguities.
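The thread does not show how #4093 implements this, so the following Python sketch only illustrates the general "detect, then throw" idea. It assumes a sequence dictionary is available (which, as noted earlier in the thread, GATK4 does not guarantee), represented here as a plain dict of contig name to length. The names `interpretations` and `resolve` are hypothetical.

```python
import re

# A position or a start-end range, e.g. "100" or "100-200".
_RANGE = re.compile(r"^([0-9]+)(?:-([0-9]+))?$")


def interpretations(query, contig_lengths):
    """List every (contig, start, end) reading of query that names a known contig."""
    found = []
    # Reading 1: the whole string is a contig name, meaning the full contig.
    if query in contig_lengths:
        found.append((query, 1, contig_lengths[query]))
    # Other readings: split at each colon into "<contig>:<position or range>".
    for i, ch in enumerate(query):
        if ch != ":":
            continue
        contig, rest = query[:i], query[i + 1:]
        m = _RANGE.match(rest)
        if contig in contig_lengths and m:
            start = int(m.group(1))
            end = int(m.group(2)) if m.group(2) else start
            found.append((contig, start, end))
    return found


def resolve(query, contig_lengths):
    """Return the single valid reading, or raise if there is none or more than one."""
    found = interpretations(query, contig_lengths)
    if not found:
        raise ValueError(f"no contig in the dictionary matches {query!r}")
    if len(found) > 1:
        raise ValueError(f"ambiguous interval {query!r}: {found}")
    return found[0]
```

With a dictionary containing only 'chr1' and 'HLA:01:01:01', `resolve("HLA:01:01:01", ...)` returns the whole HLA contig. If the dictionary also contained a contig named 'HLA:01:01', the same query would have two valid readings and raise, which is the case a quoting mechanism would let the user settle.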

@cmnbroad cmnbroad reopened this Jan 10, 2018
@droazen
Collaborator

droazen commented Jan 16, 2018

This is done well enough for hg38 purposes -- we can open a separate ticket if we want to go further with a quoting mechanism.

@droazen droazen closed this as completed Jan 16, 2018