-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvements to Mutect2's Permutect training data mode #8663
Conversation
…nt M3 dataset modes -- Illumina and Ultima
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor comments.
public Mutect3DatasetMode mutect3DatasetMode = Mutect3DatasetMode.ILLUMINA; | ||
|
||
public enum Mutect3DatasetMode { | ||
ILLUMINA(11), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get a little nervous when you hardcode the sequencing technology, especially when given the same value. Why not just make this a default? What is the justification for this? (We do not need to block this PR over this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Ultima uses unpaired reads it can't be encoded in the same way as Illumina data. It's only a coincidence that the dimension of read tensors is 11 in both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And note that this is an Advanced
argument)
final Allele longestRef = allRefAlleles.stream().sorted(Comparator.comparingInt(Allele::length).reversed()).findFirst().get(); | ||
|
||
final List<Allele> remappedAltAlleles = ReferenceConfidenceVariantContextMerger.remapAlleles(vc, longestRef).stream() | ||
.skip(1).toList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Is skip(1)
just because we need to skip the reference allele? If so, just add a comment that you are making this assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@LeeTL1220 Back to you. Note one extra commit setting the default ratio of non-artifact training examples to artifact examples to 1 instead of 20. I checked and this has no effect on the quality of the trained model while vastly decreasing training time. |
@LeeTL1220 here are the changes I was talking about. The first commit contains a small change I have been using for a year, the second is the recent multiallelic indel representation bug fix.