Factor out a GATKBaseTest for separate test resources from test utilities in BaseTest #3475
Here is a new PR for the issues related to the test code. Back to you @droazen.
@droazen, I updated the PR message to describe the changes, because I tried to get all the code referring to test resources packaged in src/test. Now it should pass the tests too.
Codecov Report
@@ Coverage Diff @@
## master #3475 +/- ##
==============================================
- Coverage 79.554% 79.014% -0.54%
+ Complexity 17738 17588 -150
==============================================
Files 1154 1151 -3
Lines 64092 63666 -426
Branches 9757 9748 -9
==============================================
- Hits 50988 50305 -683
- Misses 9214 9486 +272
+ Partials 3890 3875 -15
|
984696e to ee1d7f2
@magicDGS Could you rebase this and update newer references to baseTest that have been added since you created this branch? |
ee1d7f2 to bede8ef
@jamesemery - I think that the rebase is done. I'd like to have this in as soon as possible, to avoid the extra work of rebasing due to new tests or refactoring of them... Thank you in advance!
There are a couple things you need to change here, and before this gets merged we should make sure to let people know that this will probably cause merge conflicts in their test methods.
locs.add(hg19GenomeLocParser.parseGenomeLoc(interval));
return Collections.unmodifiableList(locs);
}
For purity's sake, I think getToolTestDataDir() should be left abstract in BaseTest and implemented in GATKBaseTest.
This can be overridden, but I think it is good practice to have a standard folder structure for test data directories. In addition, this is not project specific, because the Java source structure (src/test/resources) is more or less a standard.
No action for this comment.
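A minimal sketch of the two options discussed in this thread: a default data directory that subclasses may override, versus an abstract method that every subclass must implement. All class names below are hypothetical stand-ins, and the path layout (src/test/resources plus a folder derived from the class name) is an assumption for illustration, not the actual BaseTest implementation.

```java
/** Hypothetical stand-in for a shared base test class; not the real BaseTest. */
abstract class BaseTestSketch {
    /**
     * Default convention (an assumption for this sketch): test data lives under
     * src/test/resources/, in a folder derived from the fully qualified class name.
     */
    public String getToolTestDataDir() {
        return "src/test/resources/" + getClass().getName().replace('.', '/') + "/";
    }
}

/** Relies on the conventional layout, so there is nothing to implement. */
class ConventionalToolTestSketch extends BaseTestSketch {
}

/** A downstream project with a different layout overrides the default. */
class CustomLayoutToolTestSketch extends BaseTestSketch {
    @Override
    public String getToolTestDataDir() {
        return "testdata/custom/";
    }
}
```

If getToolTestDataDir() were abstract instead, ConventionalToolTestSketch would not compile until it supplied its own implementation, which is the trade-off the two comments above weigh.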
public static File getTestDataDir(){
return new File(CommandLineProgramTest.getTestDataDir(),"exome");
/** Initialize the reference file and dictionary to use for creating intervals. */
public TargetsToolsTestUtils(final File referenceFile){
It should not be the case that you are ever instantiating a utils class.
I think if you are trying to avoid calling = new File(getTestDataDir(),"test_reference.fasta"), create a getter which farms out to GATKBaseTest getTestDataDir() to get these files and leave it to external tool authors to avoid using these files.
More specifically, refactor the other methods in this class to take the reference dictionary as an argument and have the REFERENCE_FILE just be a string pointer to the file. Utils classes should not have state.
Previously, this was a utility class for creating SimpleIntervals with the reference in the exome test sources. It is true that it is not a utils class but a factory/builder class.
In this case I prefer to hold both the dictionary and the reference, to be sure that they correspond to each other. I renamed it to SimpleIntervalTestFactory to make clear that it keeps state and is a factory.
@magicDGS Thinking about it, this still doesn't seem right: the methods contained in this class fall under the purview of what IntervalUtils is doing. It's probably better to factor this class out entirely, move its methods over to IntervalUtils as static methods, and enforce that they take a referenceDictionary as an argument. Then, in all the tests where you instantiate this class, just instantiate a dictionary instead.
Can we leave this to a different PR? The main problem in this issue is BaseTest and GATKBaseTest, and removing this class by adding new methods to IntervalUtils will delay the review process even more...
I opened an issue to remove it (#3771) and I promise to prepare a PR for it once this is in...
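The two designs debated in this thread can be sketched side by side: a stateless utility whose static methods take the dictionary as an argument, and a factory that holds one dictionary as state. This is only an illustration under stated assumptions: SketchInterval, SketchIntervalUtils and SketchIntervalFactory are hypothetical names, and a plain map from contig name to length stands in for a real sequence dictionary. None of this is the GATK code.

```java
import java.util.Map;

/** Hypothetical stand-in for an interval type; not the GATK SimpleInterval class. */
final class SketchInterval {
    final String contig;
    final int start;
    final int end;

    SketchInterval(final String contig, final int start, final int end) {
        this.contig = contig;
        this.start = start;
        this.end = end;
    }
}

/** Stateless utility: every method receives the dictionary (contig name to length) explicitly. */
final class SketchIntervalUtils {
    private SketchIntervalUtils() {} // static-only, never instantiated

    static SketchInterval createInterval(final Map<String, Integer> dictionary,
                                         final String contig, final int start, final int end) {
        final Integer length = dictionary.get(contig);
        if (length == null) {
            throw new IllegalArgumentException("Unknown contig: " + contig);
        }
        if (start < 1 || end < start || end > length) {
            throw new IllegalArgumentException("Interval out of bounds for contig " + contig);
        }
        return new SketchInterval(contig, start, end);
    }

    static SketchInterval createOverEntireContig(final Map<String, Integer> dictionary,
                                                 final String contig) {
        final Integer length = dictionary.get(contig);
        if (length == null) {
            throw new IllegalArgumentException("Unknown contig: " + contig);
        }
        return createInterval(dictionary, contig, 1, length);
    }
}

/** Stateful alternative: a factory bound to one dictionary for its whole lifetime. */
final class SketchIntervalFactory {
    private final Map<String, Integer> dictionary;

    SketchIntervalFactory(final Map<String, Integer> dictionary) {
        this.dictionary = dictionary;
    }

    SketchInterval createInterval(final String contig, final int start, final int end) {
        return SketchIntervalUtils.createInterval(dictionary, contig, start, end);
    }
}
```

With the static form, a test only needs to build a dictionary; with the factory form, the dictionary is fixed at construction, which is the consistency guarantee argued for earlier in the thread.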
@@ -77,7 +72,8 @@ public static SimpleInterval createOverEntireContig(final String contig) {
* @return never {@code null}.
* @throws UserException if there was some problem when creating the location.
*/
public static SimpleInterval createInterval(final String contig, final int start) {
public SimpleInterval createInterval(final String contig, final int start) {
// TODO: should this really be createInterval(contig, start, start) instead of using the constructor supplied here?
I agree with you, this method is kind of pointless: it seems to be called in only one place, and given the other overload it seems confusing to just use the SimpleInterval constructor.
I removed the TODO and now call createInterval(contig, start, start) instead, to be sure that it uses the sequence dictionary.
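As a rough illustration of why the delegation matters: when the single-position overload forwards to the three-argument version, it inherits the same check against the dictionary instead of bypassing it. The class below is a hypothetical sketch, with a map of contig lengths standing in for the sequence dictionary and a string standing in for the interval.

```java
import java.util.Map;

/** Hypothetical factory; contig lengths stand in for a sequence dictionary. */
final class PointIntervalFactorySketch {
    private final Map<String, Integer> contigLengths;

    PointIntervalFactorySketch(final Map<String, Integer> contigLengths) {
        this.contigLengths = contigLengths;
    }

    /** Validates the coordinates against the dictionary before building the interval string. */
    String createInterval(final String contig, final int start, final int end) {
        final Integer length = contigLengths.get(contig);
        if (length == null || start < 1 || end < start || end > length) {
            throw new IllegalArgumentException("Invalid interval " + contig + ":" + start + "-" + end);
        }
        return contig + ":" + start + "-" + end;
    }

    /** Single-position overload: delegates, so it gets the same validation. */
    String createInterval(final String contig, final int start) {
        return createInterval(contig, start, start);
    }
}
```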
/**
* This class is the extension of the SamFileTester to test CleanSam with SAM files generated on the fly.
*/
public final class CleanSamIntegrationTest extends SamFileTester {
Nice catch on the duplicated code
Thank you!
@@ -50,7 +53,7 @@ public File getMetricsFile() {
}

@Override
public String getTestedClassName() { return getProgram().getClass().getSimpleName(); }
public final String getTestedToolName() { return getProgram().getClass().getSimpleName(); }
Why did you make this change in a few places?
These classes were implementing CommandLineProgramTest, which is the CommandLineProgramTester implementation for GATK (and also extends GATKBaseTest).
Because these classes are in the testing framework, and they are only tester classes, they do not inherit from BaseTest anymore. The method for returning the tool name is called getTestedToolName in CommandLineProgramTester, and thus that is the one that should be overridden here.
Thus, this is one of the results of finally separating the gatk-testing framework (for re-use in downstream projects) from the tests of GATK.
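The separation described above can be sketched roughly as follows. The names are hypothetical, simplified stand-ins (ToolTesterSketch for the tester interface, FrameworkTesterSketch for a packaged tester class, ProjectToolTestSketch for a project-side test), and the resource path is assumed; the real interfaces have more members than shown here.

```java
/** Hypothetical, simplified stand-in for the framework-level tester interface. */
interface ToolTesterSketch {
    /** Name of the tool under test. */
    String getTestedToolName();
}

/**
 * Framework-side tester: it only implements the interface and does not
 * inherit from any base test class, so it carries no resource paths.
 */
final class FrameworkTesterSketch implements ToolTesterSketch {
    private final Object program; // the tool instance being exercised

    FrameworkTesterSketch(final Object program) {
        this.program = program;
    }

    @Override
    public String getTestedToolName() {
        return program.getClass().getSimpleName();
    }
}

/** Hypothetical project-specific base class that knows where test resources live. */
abstract class ProjectBaseTestSketch {
    static final String TEST_RESOURCES_ROOT = "src/test/resources/"; // assumed path
}

/** Project-side test: extends the project base class and implements the same interface. */
class ProjectToolTestSketch extends ProjectBaseTestSketch implements ToolTesterSketch {
    @Override
    public String getTestedToolName() {
        return "ExampleTool"; // hypothetical tool name
    }
}
```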
@@ -19,6 +19,9 @@
* This class is an extension of SamFileTester used to test AbstractMarkDuplicatesCommandLineProgram's with SAM files generated on the fly.
* This performs the underlying tests defined by classes such as AbstractMarkDuplicatesCommandLineProgramTest.
*/
// TODO: is this really necessary for the packaged testing framework?
// TODO: it looks like that this is a SamFileTester exclusive for MarkDuplicates with the parameter in GATK/Picard
// TODO: and thus, it should live in the test resources
public abstract class AbstractMarkDuplicatesTester extends SamFileTester {
I agree, though I would leave it for a separate pull request to make that change for the testers more broadly.
Changed the TODO to a clear procedure and opened a new ticket to move it to the test sources (#3762)
@jamesemery - we should get this merged as soon as possible to avoid the conflicts that pop up in every round of comments. Once this is in, I can go to the open PRs to point out the conflicts and the new structure (e.g., change the new tests to extend I added a new commit addressing the issues and I will rebase to resolve conflicts again.
6ba1f25 to 7a02606
@magicDGS You need to resolve the conflicts yet again and respond to the comments I made about the |
…th custom references
7a02606 to dddffd0
@jamesemery - I rebased to solve the conflict and I opened an issue regarding the Because the changes are big and we are working in different timezones, conflicts pop up every day when another PR is accepted before this one if it modifies any of the test files (which is often the case). If we continue delaying this, it will never be possible to merge...
@magicDGS It looks like you have triggered a few new compiler errors in the last branch, namely in the following places:
|
@magicDGS Once these conflicts are resolved (which appear to be quick fixes) I will merge this branch. |
Thanks @jamesemery - that's the complication of this big PR. I hope that the tests pass after my last commit and that we can get this in before another PR gets in. Thanks a lot for reviewing! |
Conflicts are resolved, tests are passing.
@magicDGS merged! Thank you for this.
Thanks to all of you! |
I tried to warn in every affected open PR as promised @jamesemery |
… (#3475)

Projects depending on GATK have had trouble using BaseTest because it tries to load specific files from the gatk test resources. If the project didn't include these files, the tests would crash. This separates out a new GATK only GATKBaseTest which contains all of the file references but leaves the useful utilities in BaseTest for downstream use. It makes similar changes in other test classes that had related issues.

fixes #2125
fixes #3029

Introduces #3771 but due to the pain of rebasing the entire test suite repeatedly, this will be addressed in a follow up PR.

* Factor out a GATKBaseTest for separate test resources from test utilities in BaseTest
* Repackage CommandLineProgramTest due to project-specific paths
* Refactor TargetsToolsTestUtils to allow downstream projects to use with custom references
Changes to the testing framework to remove references to the test resources, keeping them in the src/test package. These changes include:
* GATKBaseTest for separate test resources from test utilities in BaseTest
* CleanSamIntegrationTest
* CommandLineProgramTest to be in the test sources, and use its interface in testers
* TargetsToolsTestUtils to use a provided reference
Closes #3029
Closes #2125
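A minimal sketch of the split this PR describes, assuming hypothetical class names and file paths: the utilities-only base class never references a resource file, so downstream projects can extend it without shipping those files, while the project-only subclass is the single place that names them.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical utilities-only base class: no static references to resource files. */
abstract class UtilityOnlyBaseSketch {
    /** Creates a temp file that is deleted when the JVM exits. */
    static File createTempFile(final String prefix, final String suffix) {
        try {
            final File file = File.createTempFile(prefix, suffix);
            file.deleteOnExit();
            return file;
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Hypothetical project-only subclass: the only place that names resource files. */
abstract class ProjectResourcesBaseSketch extends UtilityOnlyBaseSketch {
    static final String TEST_RESOURCES_ROOT = "src/test/resources/";               // assumed layout
    static final String EXAMPLE_REFERENCE = TEST_RESOURCES_ROOT + "example.fasta"; // hypothetical file
}
```

Because the constants are plain path strings and nothing is loaded in a static initializer, neither class fails to load when the files are absent.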