-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avro test #7192
Avro test #7192
Conversation
src/main/java/org/broadinstitute/hellbender/tools/variantdb/nextgen/ExtractCohort.java
Show resolved
Hide resolved
@@ -0,0 +1,100 @@ | |||
30,HG00561 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably could use just a few samples, and use a subsetted interval list, so that the test files can be small. there's a small interval list in the CreateVariantIngestFiles integration test resources, if that helps!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please! Just take a few ( < 5 ) samples and just a few sites (100s)
public class ExtractCohortEngineTest extends CommandLineProgramTest { | ||
|
||
private static final String cohortAvroFileName = // is it okay to just grab one? | ||
"src/test/java/org/broadinstitute/hellbender/tools/variantdb/nextgen/AnVIL_WGS_100_exported_cohort/000000000000"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can use getToolTestDataDir()
to get the path
897682f
to
988930a
Compare
One concern I have is the maintainability of the test (having been burned by this in other places myself). When we add a new output field, etc we need a very easy way to update/generate these results. At the very least some instructions would be helpful (and imagine someone to follow those as part of a PR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Some small comments, and I think we shouldn't have a 23 MB test file, but other than that 👍
// if there is a avro file, the BQ specific parameters are unnecessary, | ||
// but they all are required if there is no avro file | ||
if (cohortAvroFileName == null && (projectID == null || cohortTable == null)) { | ||
throw new UserException("a project id and cohort table are required if no avro file is provided"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super small, but for ease of the user, could you provide the argument flags corresponding to project id and cohort table? e.g. Project id (--project-id) and cohort table (--cohort-extract-table) are required if no avro file is provided.
|
||
class ExtractCohortTest extends CommandLineProgramTest { | ||
private final String prefix = getToolTestDataDir(); | ||
private final String cohortAvroFileName = prefix +"000000000000.avro"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file is 22.8 MB - can we make it smaller?
) | ||
private String cohortTable = null; | ||
|
||
@Argument( | ||
fullName = "cohort-avro-file-name", | ||
doc = "Path of the cohort avro file", | ||
mutex={"cohort-extract-table"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm so sorry to be this person but can you put spaces before and after the =
on all these mutex sets so it's consistent?
src/main/java/org/broadinstitute/hellbender/tools/variantdb/nextgen/ExtractCohort.java
Show resolved
Hide resolved
// but they all are required if there is no avro file | ||
if (cohortAvroFileName == null && (projectID == null || cohortTable == null)) { | ||
throw new UserException("Project id (--project-id) " + | ||
"and cohort table (--cohort-extract-table) are required if no avro file is provided."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is awesome -- could you also add a similar check for for the sample-file, sample-table parameters? I think it's the same logic (if you don't give a sample file, you need both a sample-table and a project-id)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great Rori! Just make sure that after removing output.vcf from your commit/PR that the test still pass.
// but they all are required if there is no avro file | ||
if (cohortAvroFileName == null && (projectID == null || cohortTable == null)) { | ||
throw new UserException("Project id (--project-id) " + | ||
"and cohort table (--cohort-extract-table) are required if no avro file is provided."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is awesome -- could you also add a similar check for for the sample-file, sample-table parameters? I think it's the same logic (if you don't give a sample file, you need both a sample-table and a project-id)
|
||
@Test | ||
public void testFinalVCFfromAvro() throws Exception { | ||
// To create the expected output file--create a temp table in BQ with the folowing query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome
@@ -0,0 +1,3 @@ | |||
2,HG00408 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can this be sorted?
runCommandLine(args); | ||
|
||
final File expectedVCF = getTestFile("expected.vcf"); | ||
final File outputVCF = getTestFile("output.vcf"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to get the output.vcf
that you checked in as part of this commit? It should be the output of the run above (which I think won't live in the same place as getTestFile(). You can tell for sure by removing src/test/resources/org/broadinstitute/hellbender/tools/variantdb/nextgen/ExtractCohort/output.vcf
from your PR (which I think shouldn't be commited anyway).
.add("sample-file", sampleFile); | ||
|
||
runCommandLine(args); | ||
IntegrationTestSpec.assertEqualTextFiles(outputVCF, expectedVCF); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you also check the index file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this feels like it would be outside the scope of the test no? a GATK issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, totally - i just wasn't sure why you were checking in the expected index file in that case
No description provided.