Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactoring some stuff for clarity #8598

Merged
merged 4 commits into from
Dec 7, 2023

Conversation

lbergelson
Copy link
Member

I removed an unused type parameter and refactored some functionality into a class MultiPloidyGenotyper which handles dealing with multiple ploidies in a somewhat unified way. It could probably be renamed and expanded slightly.

@gatk-bot
Copy link

gatk-bot commented Dec 4, 2023

Github actions tests reported job failures from actions build 7089943546
Failures in the following jobs:

Test Type JDK Job ID Logs
variantcalling 17.0.6+10 7089943546.2 logs

Copy link
Contributor

@rickymagner rickymagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me overall. Thanks for refactoring this stuff! This will definitely help keep some of the ploidy-specific logic encapsulated away from the rest of the HC code. I just had a few quick questions in my review to clarify.

I think it looks good to me to merge. Not sure if e.g. @jamesemery wants to take a look, since some other HC code was tweaked a little here as well.

}

public T getDefaultGenotypingEngine(){
return ploidyToGenotyperMap.get(DEFAULT_PLOIDY_SENTINEL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason to keep this method separate from the other generic getGenotypingEngine, since the getPloidyToUseAtThisSite covers the default sentinel value case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included it because there was something that needed the "default" genotyper btt didn't necessarily know what the default ploidy was in that location I think. I would prefer to remove the `DEFAULT_PLOIDY_SENTINEL" entirely and just have it be based on the ploidy number, maybe save the default ploidy somewhere but I didn't do it yet.

Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulling the logic to its own class seems reasonable. Otherwise I question the value of many of these refactors as they don't seem to provide much clarity. A few specific refactors that are not worth making here.

*/
private static class OutputAlleleSubset {
private final List<Allele> alleles;
private final boolean siteIsMonomorphic;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know this is wrong and some ancient detritus... don't open this can of worms here... its not worth the trouble and we have to do this quick

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a completely automated refactor so it just saves a few lines of code and does the same thing, but I can revert it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion I've kept this change.

@@ -145,7 +145,7 @@ private void tearRamps() {
throw new RuntimeException(e);
}
}
private class CallRegionContext {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it had a squiggle in IntelliJ? Always static all the time.

import java.util.Set;
import java.util.function.IntFunction;

public class MultiPloidyGenotyper<T extends GenotypingEngine> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also consider naming this to be a little more clear. This is not a genotyper this is a ploidy switch for genotypers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was hoping someone else would name it :)


activeRegionGenotypingEngines = new MultiPloidyGenotyper<>(ploidy -> {
// Seems that at least with some test data we can lose genuine haploid variation if we use ploidy == 1
final int adjustedPloidy = Math.max(MINIMUM_PUTATIVE_PLOIDY_FOR_ACTIVE_REGION_DISCOVERY, ploidy);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe consider pulling this lambda out to be named and defined somewhere.

@@ -357,6 +334,13 @@ private void initialize(boolean createBamOutIndex, final boolean createBamOutMD5
: null);
}

private HaplotypeCallerGenotypingEngine makeGenotypingEngine(int ploidy) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled this out like you I was asked, I didn't pull out the one for the minimal genotyper because it relied on state from the function around it and it seemed messier to do so. I'm hoping these can be simplified when you put in the copy methods.

* It is lazily initialized so GenotypingEngines are only created on as needed.
* @param <T> the GenotypingEngine implementation this provides
*/
public final class MultiPloidyGenotyperCache<T extends GenotypingEngine> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesemery Could you re-review this? I rewrote it to be less shitty and added documentation / renamed it.

{"1:10-30", 2},
{"1:70-85", 3},
{"1:80-93", 3},
// {"1:89-92", 2}, TODO is this a bug?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think we have a bug. Maybe it's not in the original code but I'm thinking it probably is. If you have a region which contains both default ploidy 2 and a special ploidy 1 region then you end up calling that at ploidy 1 because it's picking the highest of the ALTERNATE ploidies, not the highest of any overlapping ploidy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm you mean the bed file input has a region of ploidy 1 and an overlapping region explicitly listed as ploidy 2 (which happens to be the global default)? And then in that case the ploidy actually becomes 1 in practice?

I'm not sure what the "right" thing I'd want to be here, and not sure it matters so much anyway (since I think we agreed that what happens on the boundary is bound to be a little wacky and semi-unpredictable anyway). I'd be OK rewording the documentation to say "highest of the non-default alternate ploidies," or to try reworking the logic so in this case the ploidy 2 explicitly provided takes precedence.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we poll the regions we are generally polling them on a per-variant basis no? That was my understanding from the baranch. If I have a deletion that overlaps a region that I called ployidy 1 and also overlaps something i haven't changed at all I would very much think the best behavior in that case is to call it as ployidy 1? Am i missing something? This does not seem like a bug to me. We are doing the most specific thing the user has specified in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the case Louis was referring to was a hypothetical user-provided bed file that looked like:

chr1   1000   1700   1
chr1   1500   2000   2

Then maybe the current behavior would be to still use ploidy 1 in the 1500-1700 region even though the "2" overlaps there, because the 2 is the built-in default? Is this correct Louis?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

   private int getPloidyToUseAtThisSite(Locatable region) {
       return ploidyRegions.getOverlaps(region)
               .stream()
               .mapToInt(SimpleCount::getCount)
               .max()
               .orElse(defaultPloidy);

Am I wrong in interpreting that in that case this would fall into the max() command? There is a simple-count object for both chr1 1500 2000 2 and chr1 1000 1700 1 and it would fall into 2. The only time default ever matters is if there is nothing specified that covers the site?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ricky and I chatted and it seems like it's working as intended. Used the explicitly specified ploidy if a region overlaps one even if it includes default region which is higher.

@gatk-bot
Copy link

gatk-bot commented Dec 6, 2023

Github actions tests reported job failures from actions build 7120721554
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud 17.0.6+10 7120721554.10 logs
variantcalling 17.0.6+10 7120721554.2 logs

@gatk-bot
Copy link

gatk-bot commented Dec 6, 2023

Github actions tests reported job failures from actions build 7120763686
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud 17.0.6+10 7120763686.10 logs
integration 17.0.6+10 7120763686.11 logs
variantcalling 17.0.6+10 7120763686.2 logs
integration 17.0.6+10 7120763686.0 logs

Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache class looks better now. This looks good to me

@lbergelson lbergelson merged commit 4cf1052 into tkay_PARploidy Dec 7, 2023
15 of 20 checks passed
@lbergelson lbergelson deleted the lb_refactor_ricky_stuff branch December 7, 2023 16:38
lbergelson added a commit that referenced this pull request Dec 8, 2023
* Introduced a new class MulitPloidyGenotyperCache which keeps track of multiple GenotypingEngines that have different ploidy values.  
* minor refactoring in related classes
* removed an unused type parameter from GenotypingEngine
lbergelson added a commit that referenced this pull request Dec 8, 2023
* Introduced a new class MulitPloidyGenotyperCache which keeps track of multiple GenotypingEngines that have different ploidy values.  
* minor refactoring in related classes
* removed an unused type parameter from GenotypingEngine
lbergelson added a commit that referenced this pull request Dec 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants