-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
array cohort extract #6666
array cohort extract #6666
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yay - a test!
looks good to me. once it's merged into my branch we can refactor common code to a common place. let me know if you have strong opinions - or go ahead if you get to it first.
|
||
static { | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this static block a place holder for something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope just junk
static { | ||
} | ||
|
||
private static final Logger logger = LogManager.getLogger(ExtractCohortEngine.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to to match class name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch!
* This value is used to construct the genotype information of those missing samples | ||
* when they are merged together into a {@link VariantContext} object | ||
*/ | ||
public static int MISSING_CONF_THRESHOLD = 60; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we can remove this
|
||
final String contig = probeInfo.contig; | ||
final long position = probeInfo.position; | ||
final Allele refAllele = Allele.create(refSource.queryAndPrefetch(contig, position, position).getBaseString(), true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if it would be easier to get the ref allele from the probe_info table - since we already use it. and then we might not need the reference as an input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to think about, although we still need the reference when we initialize the vcf writer to get the sequence dictionary for the header
private void finalizeCurrentVariant(final List<VariantContext> unmergedCalls, final Set<String> currentVariantSamplesSeen, final String contig, final long start, final Allele refAllele) { | ||
|
||
// TODO: this is where we infer missing data points... once we know what we want to drop | ||
// final Set<String> samplesNotEncountered = Sets.difference(sampleNames, currentVariantSamplesSeen); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think we are going to be dropping something we aren't already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe... we need to dig into more data, and when we get to imputed data I think we will
lrr = data.lrr; | ||
baf = data.baf; | ||
|
||
// Genotype -- what about no-call? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is no-call the same as missing? there is a enum value for MISSING.
and if you mean how do you represent a missing allele in the genotype, there is a constant for that. i think it's Genotype.NO_CALL
} | ||
} else { | ||
// TODO: constantize | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you can use the constants from the RawArrayFieldEnum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think these might all go away once we're storing compressed data so I didn't think too much about it
// Get the query string: | ||
final String sampleListQueryString = | ||
"SELECT probeId, Name, Chr, Position, Ref, AlleleA, AlleleB" + | ||
" FROM `" + fqProbeTableName + "`"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could add a constant in ProbeInfoSchema for the list of field to get for extract. still need to figure out a package structure that makes sense since right now that class is in ingest
this.alleleA = alleleA; | ||
this.alleleB = alleleB; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can merge the ProbeInfo class under ingest with this one
// Order is critical here, the ordinal is the int encoding | ||
AA,AB, BB, NO_CALL | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, i think we should merge this with the enum in the ingest code (i can do this after the PR is merged)
* array cohort extract * roundtripping with binary compression * PR comments
* array cohort extract * roundtripping with binary compression * PR comments
* array cohort extract * roundtripping with binary compression * PR comments
* array cohort extract * roundtripping with binary compression * PR comments
* array cohort extract * roundtripping with binary compression * PR comments
* array cohort extract * roundtripping with binary compression * PR comments
* array cohort extract * roundtripping with binary compression * PR comments
* array cohort extract * roundtripping with binary compression * PR comments
* array cohort extract * roundtripping with binary compression * PR comments
* array cohort extract * roundtripping with binary compression * PR comments
* array cohort extract * roundtripping with binary compression * PR comments
* array cohort extract * roundtripping with binary compression * PR comments
* array cohort extract * roundtripping with binary compression * PR comments
* array cohort extract * roundtripping with binary compression * PR comments
* array cohort extract * roundtripping with binary compression * PR comments
No description provided.