-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add WDL for iSNVs/v-phaser #828
Conversation
…aser output merge_to_vcf did not previously output correct alternate genotype number for some cases of multi-chr v-phaser output (line intrahost.py:820 of this commit). This corrects that and changes test cases to use the chr "-#" encoding to denote chromosome/segment number. This also changes merge_to_vcf so sample names are guessed from first column of v-phaser output if sample names are not passed in via --samples.
add dummy workflows to support newer cromwell that requires WDL files to have a workflow block; add single-sample isnv workflows
restore dummy workflows to WDL task files; this allows womtool to validate them
…romwell and dxWDL
Note: this is WDL draft-2. Changing to WDL 1.0 will require various changes, starting with a "version 1.0" statement as the first line in each WDL. Also, input {} blocks. See: https://github.com/openwdl/wdl/blob/master/versions/1.0/SPEC.md
add dx default file inputs for new isnv workflows (mostly taxon_filter DBs)
…lowing for the inverted case)
Okay this is a big one... I think I'll need some deconstructed explanation on each piece. So far, I see a few things in this PR:
A few questions and thoughts:
Anyway, breaking it down a bit will help navigate the actual diffs. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, went through the diffs in more detail, let me know what you think
intrahost.py
Outdated
"""samples, isnvs, and alignments must have the same number of elements | ||
(plus an extra reference record in the alignment). | ||
%s does not have the right number of sequences""" % fileName) | ||
# for fileName in alignments: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commented out section of intrahost.merge_to_vcf
appears to have been simply a pre-screening sanity check / safety check to make sure the inputs were self consistent. Is this check no longer necessary or is it just that it doesn't function for guessed sample names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll adapt it for the both cases, where sample names are provided, and when they are not.
intrahost.py
Outdated
@@ -580,7 +595,10 @@ def merge_to_vcf( | |||
alignmentFile) | |||
|
|||
for s in samplesToUse: | |||
for row in util.file.read_tabfile(samp_to_isnv[sampleIDMatch(s)]): | |||
isnv_filepath = samp_to_isnv[sampleIDMatch(s)] | |||
if os.path.getsize(isnv_filepath) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the explicit check for os.path.getsize() == 0
? Are there special scenarios where we pass in empty files that semantically mean something we should skip, or is this just a normal situation where vphaser detects no isnvs for that particular file (and thus produces empty output)? util.file.read_tabfile
should simply produce an empty iterator, so this check seems unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, this should not be necessary.
intrahost.py
Outdated
isnvs_to_use.append(isnvs_file) | ||
continue | ||
for row in util.file.read_tabfile(isnvs_file): | ||
guessed_sample_ID = sampleIDMatch(row[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I read this correctly, this code is unreachable, because anytime the isnvs_file
is non-empty, the above if
statement will continue
and skip this for
loop, but any time it's empty, the for
loop won't do anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that was sloppy of me. I'll fix this and add a unit test where sample names are guessed so this has some test coverage.
@@ -800,7 +817,7 @@ def merge_to_vcf( | |||
continue | |||
alleleMap = dict((a, i) for i, a in enumerate(alleles)) | |||
# GT col emitted below | |||
genos = [str(alleleMap.get(consAlleles.get(s), '.')) for s in samples] | |||
genos = [str(alleleMap.get(consAlleles.get(s), '.')) for s in samplesToUse] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, so this was a bug before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. :(
@@ -152,7 +152,7 @@ def test_single_strand_bias_hard_filter(self): | |||
self.assertEqual(output[0][7:], expected[7:]) | |||
|
|||
|
|||
@unittest.skipIf(tools.is_osx(), "vphaser2 osx binary from bioconda has issues") | |||
#@unittest.skipIf(tools.is_osx(), "vphaser2 osx binary from bioconda has issues") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did the osx version get fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be? The test passes on OSX and I can get help text when I call it in isolation. Do you recall what the issue was?
test/unit/test_intrahost.py
Outdated
@@ -224,8 +224,10 @@ def add_indel(self, sample, chrom, pos, acounts, libinfo=None): | |||
|
|||
def dump_isnv_tmp_file(self, sample): | |||
fn = util.file.mkstempfname('.txt') | |||
print("self.isnvs[sample]",sample,list(map(str,self.isnvs[sample]))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we sure we want to print
all this or was this leftover from a debugging stage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover, whoops!
add unit test for case where sample names are guessed for merge_to_vcf; rework the sectio where they are guessed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, let's make it so!
travis/install-wdl.sh
Outdated
cached_fetch_jar_from_github dnanexus dxWDL dxWDL 0.60.2 | ||
cached_fetch_jar_from_github broadinstitute cromwell womtool 32 | ||
cached_fetch_jar_from_github broadinstitute cromwell cromwell 32 | ||
cached_fetch_jar_from_github dnanexus dxWDL dxWDL 0.66.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it doesn't cause much trouble, since you're updating from dxWDL 0.60.2 -> 0.66.2 anyway, want to just try going all the way to 0.68.1 (since some time has elapsed in the course of the development of this branch)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see what happens. I'll try updating Cromwell too.
Actually it might be weird if you update Cromwell because then dxWDL and Cromwell will be out of sync (dxWDL 0.68.1 is based on Cromwell 32, I think). |
…l-ngs into ct-vphaser-wdl
specify project for dxWDL compile; this appears to be required, though it is not mentioned in the docs? https://github.com/dnanexus/dxWDL/blob/0.68.1/src/main/scala/dxWDL/Main.scala#L142
This PR adds WDLs for running v-phaser on one sample to output iSNVs. It probably makes sense to wait to do multi-sample via scatter-gather until we update everything to WDL 1.0 which has a
Pair
datatype for gathering multiple files (we're using WDL draft-2 currently).Also, this should probably be merged via rebase due to all of the dx debugging...