Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Factor out a GATKBaseTest for separate test resources from test utilities in BaseTest #3475

Merged

Conversation

magicDGS
Copy link
Contributor

@magicDGS magicDGS commented Aug 22, 2017

Changes to the testing framework to remove references to the test resources, keeping them into the src/test package. This changes include:

  • Factor out a GATKBaseTest for separate test resources from test utilities in BaseTest
  • Remove duplicated CleanSamIntegrationTest
  • Repackage CommandLineProgramTest to be in the test sources, and use it's interface in testers
  • Move some testers to the src/test package because they are tool-specific (added TODO to other ones that aren't that clear)
  • Refactor TargetsToolsTestUtils to use a provided reference

Closes #3029
Closes #2125

@magicDGS
Copy link
Contributor Author

Here is a new PR for the issues related with the test code. Back to you @droazen.

@droazen droazen self-assigned this Aug 22, 2017
@droazen droazen self-requested a review August 22, 2017 15:45
@magicDGS
Copy link
Contributor Author

@droazen, I updated the PR message to describe the changes, because I tried to get all the code referring to test resources packaged in src/test. Now it should pass the tests too.

@codecov-io
Copy link

codecov-io commented Aug 23, 2017

Codecov Report

Merging #3475 into master will decrease coverage by 0.54%.
The diff coverage is 85.714%.

@@              Coverage Diff               @@
##              master     #3475      +/-   ##
==============================================
- Coverage     79.554%   79.014%   -0.54%     
+ Complexity     17738     17588     -150     
==============================================
  Files           1154      1151       -3     
  Lines          64092     63666     -426     
  Branches        9757      9748       -9     
==============================================
- Hits           50988     50305     -683     
- Misses          9214      9486     +272     
+ Partials        3890      3875      -15
Impacted Files Coverage Δ Complexity Δ
...broadinstitute/hellbender/utils/test/BaseTest.java 70.707% <ø> (-13.116%) 27 <0> (-9)
...e/hellbender/utils/test/testers/SamFileTester.java 90.217% <ø> (ø) 30 <0> (ø) ⬇️
.../hellbender/utils/test/testers/CleanSamTester.java 0% <0%> (ø) 0 <0> (ø) ⬇️
...llbender/utils/test/SimpleIntervalTestFactory.java 100% <100%> (ø) 5 <2> (?)
...ute/hellbender/utils/test/GenomicsDBTestUtils.java 91.667% <100%> (-0.641%) 3 <0> (ø)
...ils/test/testers/AbstractMarkDuplicatesTester.java 79.487% <100%> (ø) 17 <1> (ø) ⬇️
...s/spark/ParallelCopyGCSDirectoryIntoHDFSSpark.java 0% <0%> (-75.51%) 0% <0%> (-17%)
...nder/tools/spark/pipelines/PrintVariantsSpark.java 0% <0%> (-66.667%) 0% <0%> (-2%)
...institute/hellbender/utils/gcs/GATKGCSOptions.java 0% <0%> (-66.667%) 0% <0%> (ø)
...lbender/engine/datasources/ReferenceAPISource.java 22.013% <0%> (-62.264%) 8% <0%> (-26%)
... and 16 more

@magicDGS magicDGS force-pushed the dgs_refactor_basetest branch from 984696e to ee1d7f2 Compare September 15, 2017 13:09
@droazen droazen requested review from jamesemery and removed request for droazen October 23, 2017 15:25
@droazen droazen assigned jamesemery and unassigned droazen Oct 23, 2017
@jamesemery
Copy link
Collaborator

@magicDGS Could you rebase this and update newer references to baseTest that have been added since you created this branch?

@magicDGS magicDGS force-pushed the dgs_refactor_basetest branch from ee1d7f2 to bede8ef Compare October 24, 2017 09:57
@magicDGS
Copy link
Contributor Author

@jamesemery - I think that the rebase is done. I'd like to have this in as soon as it can be, to avoid the extra-work of rebasing due to new tests or refactoring of them.... Thank you in advance!

Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple things you need to change here, and before this gets merged we should make sure to let people know that this will probably cause merge conflicts in their test methods.

locs.add(hg19GenomeLocParser.parseGenomeLoc(interval));
return Collections.unmodifiableList(locs);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For purity sake I think getToolTestDataDir() should be left abstract in BaseTest and instantiated in GATKBaseTest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be overridden, but I think that it is a good practice to have a standard folder structure for test data directories. In addition, this is not project specific, because the java source structure (src/test/resources) is more or less an standard.

No action for this comment.

public static File getTestDataDir(){
return new File(CommandLineProgramTest.getTestDataDir(),"exome");
/** Initialize the reference file and dictionary to use for creating intervals. */
public TargetsToolsTestUtils(final File referenceFile){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not be the case that you are ever instantiating a utils class.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you are trying to avoid calling = new File(getTestDataDir(),"test_reference.fasta"), create a getter which farms out to GATKBaseTest getTestDataDir() to get these files and leave it to external tool authors to avoid using these files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More specifically, refactor the other methods in this class to take the reference dictionary as an argument and have the REFERENCE_FILE just be a string pointer to the file. Utils classes should not have state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was before an utility class to create SimpleIntervals with the reference in the exome test source. It is true that it is not a utils class, but a factory/builder class.

I think that I prefer in this case to hold the dictionary and the refererence to be sure that it correspond to the same one. I renamed to SimpleIntervalTestFactory to be clear that it keeps state and it is a factory.

Copy link
Collaborator

@jamesemery jamesemery Oct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@magicDGS Thinking about it, this still doesn't seem right, the methods contiained in this class fall under the purview of what IntervalUtils is doing. Its probably better to factor this class out entirely and move its methods over IntervalUtils as static and enforce that they take a referenceDictionary as an argument. Then for all the tests where you instantiate this class just instantiate a dictionary instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we left this to a different PR? The main problem with this issue is the BaseTest and GATKBaseTest, and removing this class by adding new methods to IntervalUtils will delay even more the reviewing process...

I open an issue to remove it (#3771) and I promise to prepare a PR for it once this is in...

@@ -77,7 +72,8 @@ public static SimpleInterval createOverEntireContig(final String contig) {
* @return never {@code null}.
* @throws UserException if there was some problem when creating the location.
*/
public static SimpleInterval createInterval(final String contig, final int start) {
public SimpleInterval createInterval(final String contig, final int start) {
// TODO: should this really be createInterval(contig, start, start) instead of using the constructor supplied here?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you, this method is kindof pointless, it seems to only be called in one place and furthermore it seems to be used in one place, and given the the other overload it seems confusing to just use SimpleInterval

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the TODO and call the createInterval(contig, start, start) instead, to be sure that it is using the sequence dictionary.

/**
* This class is the extension of the SamFileTester to test CleanSam with SAM files generated on the fly.
*/
public final class CleanSamIntegrationTest extends SamFileTester {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch on the duplicated code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@@ -50,7 +53,7 @@ public File getMetricsFile() {
}

@Override
public String getTestedClassName() { return getProgram().getClass().getSimpleName(); }
public final String getTestedToolName() { return getProgram().getClass().getSimpleName(); }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you make this change in a few places?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This classes were implementing CommandLineProgramTest, which is the CommandLineProgramTester implementation for GATK (extending also GATKBaseTest).

Because this classes are in the testing framework, and they are only tester classes, they do not inherit from BaseTest anymore. The method for returning the tool name is called getTestedToolName in CommandLineProgramTester and thus here it should be the one overridden.

Thus, this is one of the results of finally separating the gatk-testing framework (for re-use with downstream projects) and the test from GATK.

@@ -19,6 +19,9 @@
* This class is an extension of SamFileTester used to test AbstractMarkDuplicatesCommandLineProgram's with SAM files generated on the fly.
* This performs the underlying tests defined by classes such as AbstractMarkDuplicatesCommandLineProgramTest.
*/
// TODO: is this really necessary for the packaged testing framework?
// TODO: it looks like that this is a SamFileTester exclusive for MarkDuplicates with the parameter in GATK/Picard
// TODO: and thus, it should live in the test resources
public abstract class AbstractMarkDuplicatesTester extends SamFileTester {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, though I would leave it for a separate pull request to make that change for the testers more broadly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the TODO to a clear procedure and open a new ticket for move it to the test sources (#3762)

@magicDGS
Copy link
Contributor Author

@jamesemery - we should get this merge as soon as possible to avoid conflicts that pop up in every round of comments. Once this is in, I can go to the open PRs to point out the conflicts and the new structure (e.g., change the new tests to extend GATKBaseTest).

I added a new commit addressing the issues and I will rebase to resolve conflicts again.

@magicDGS magicDGS force-pushed the dgs_refactor_basetest branch from 6ba1f25 to 7a02606 Compare October 31, 2017 14:49
@jamesemery
Copy link
Collaborator

@magicDGS You need to resolve the conflicts yet again and respond to the comments I made about the SimpleIntervalTestFactory then this could probably be merged

@magicDGS
Copy link
Contributor Author

magicDGS commented Nov 1, 2017

@jamesemery - I rebased to solve the conflict and I open an issue regarding the SimpleIntervalTestFactory to do not block this work any longer, because this is already a big change in the testing framework.

Because the changes are big and we are working in different timezones, conflicts pop up everyday when another PR is accepted before this if they modify any of the test files (which is often the case). If we continue delaying this, it would never be possible to merge...

@jamesemery
Copy link
Collaborator

@magicDGS It looks like you have triggered a few new compiler errors in the last branch, namely in the following places:

/gatk/src/test/java/org/broadinstitute/hellbender/tools/spark/sv/discovery/SVDiscoveryTestDataProvider.java:33: error: cannot find symbol
            BaseTest.b38_reference_20_21, ReferenceWindowFunctions.IDENTITY_FUNCTION);
            ^
  symbol:   variable BaseTest
  location: class SVDiscoveryTestDataProvider
/gatk/src/test/java/org/broadinstitute/hellbender/tools/copynumber/formats/SampleLocatableCollectionUnitTest.java:30: error: cannot find symbol
    private static final String TEST_SUB_DIR = toolsTestDir + "copynumber/formats";
                                               ^
  symbol:   variable toolsTestDir
  location: class SampleLocatableCollectionUnitTest
/gatk/src/test/java/org/broadinstitute/hellbender/tools/copynumber/utils/annotatedregion/SimpleAnnotatedGenomicRegionUnitTest.java:18: error: cannot find symbol
    private static final String TEST_FILE = publicTestDir + "org/broadinstitute/hellbender/tools/copynumber/utils/combine-segment-breakpoints-with-legacy-header-learning-combined-copy-number.tsv";

@jamesemery
Copy link
Collaborator

@magicDGS Once these conflicts are resolved (which appear to be quick fixes) I will merge this branch.

@magicDGS
Copy link
Contributor Author

magicDGS commented Nov 1, 2017

Thanks @jamesemery - that's the complication of this big PR. I hope that the tests pass after my last commit and that we can get this in before another PR gets in.

Thanks a lot for reviewing!

@lbergelson lbergelson dismissed jamesemery’s stale review November 1, 2017 19:38

Conflicts are resolved, tests are passing.

@lbergelson lbergelson merged commit 9283269 into broadinstitute:master Nov 1, 2017
@lbergelson
Copy link
Member

@magicDGS merged! Thank you for your this.

@magicDGS magicDGS deleted the dgs_refactor_basetest branch November 1, 2017 20:28
@magicDGS magicDGS mentioned this pull request Nov 2, 2017
@magicDGS
Copy link
Contributor Author

magicDGS commented Nov 2, 2017

Thanks to all of you!

@magicDGS
Copy link
Contributor Author

magicDGS commented Nov 2, 2017

I tried to warn in every affected open PR as promised @jamesemery

jonn-smith pushed a commit that referenced this pull request Nov 27, 2017
… (#3475)

Projects depending on GATK have had trouble using BaseTest because it tries to load specific files from the gatk test resources.  If the project didn't include these files, the tests would crash.  This separates out a new GATK only GATKBaseTest which contains all of the file references but leaves the useful utilities in BaseTest for downstream use.

It makes similar changes in other test classes that had related issues.
fixes #2125
fixes #3029
Introduces #3771 but due to the pain of rebasing the entire test suite repeatedly, this will be addressed in a follow up PR.

* Factor out a GATKBaseTest for separate test resources from test utilities in BaseTest
* Repackage CommandLineProgramTest due to project-specific paths
* Refactor TargetsToolsTestUtils to allow downstream projects to use with custom references
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants