Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Created a FuncotatorDataSourceRetriever to download data sources. #5150

Merged
merged 10 commits into from
Sep 10, 2018

Conversation

jonn-smith
Copy link
Collaborator

Created NioFileCopier that copies files using nio paths (includes
optional progress indicator and integrity validation).
Updated an error message in funcotator to make it more descriptive.

Fixes #4549

Created NioFileCopier that copies files using nio paths (includes
    optional progress indicator and integrity validation).
Updated an error message in funcotator to make it more descriptive.

Fixes #4549
@codecov-io
Copy link

codecov-io commented Aug 30, 2018

Codecov Report

Merging #5150 into master will decrease coverage by 0.119%.
The diff coverage is 54.043%.

@@               Coverage Diff               @@
##              master     #5150       +/-   ##
===============================================
- Coverage     86.735%   86.617%   -0.119%     
- Complexity     29312     30109      +797     
===============================================
  Files           1810      1825       +15     
  Lines         135549    138973     +3424     
  Branches       15031     15625      +594     
===============================================
+ Hits          117569    120374     +2805     
- Misses         12566     13088      +522     
- Partials        5414      5511       +97
Impacted Files Coverage Δ Complexity Δ
...dataSources/gencode/GencodeFuncotationFactory.java 83.748% <0%> (-0.123%) 166 <0> (ø)
...ls/nio/NioFileCopierWithProgessMeterTestUtils.java 0% <0%> (ø) 0 <0> (?)
...er/tools/funcotator/FuncotatorIntegrationTest.java 83.473% <0%> (-1.241%) 79 <0> (ø)
...FuncotatorDataSourceDownloaderIntegrationTest.java 1.923% <1.923%> (ø) 1 <1> (?)
...institute/hellbender/exceptions/UserException.java 70.073% <100%> (+0.443%) 3 <0> (ø) ⬇️
...nder/utils/runtime/StreamingProcessController.java 73.575% <100%> (+2.283%) 35 <0> (ø) ⬇️
...nder/tools/funcotator/FuncotatorTestConstants.java 97.222% <100%> (+0.253%) 1 <1> (ø) ⬇️
...NioFileCopierWithProgressMeterResultsUnitTest.java 2.273% <2.273%> (ø) 1 <1> (?)
...ls/nio/NioFileCopierWithProgressMeterUnitTest.java 51.534% <51.534%> (ø) 20 <20> (?)
...institute/hellbender/utils/io/IOUtilsUnitTest.java 79.771% <64.835%> (-7.948%) 40 <8> (+8)
... and 44 more

Copy link
Contributor

@LeeTL1220 LeeTL1220 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonn-smith This looks really great as a general purpose tool. It seems like this is general purpose enough to handle bucket-to-bucket, bucket-to-local, local-to-bucket. Am I correct?

Regardless, if tests pass, feel free to merge.

*
* Created by jonn on 8/27/18.
*/
public class NioFileCopier {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that if you are doing this to copy from cloud-to-cloud it will be a lot less efficient than using the google storage APIs, as using those APIs doesn't make the data travel through the network to the machine doing the copy. Hopefully this is only used when copying up/down to the cloud.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benefit here is that you can see the progress of your file transfer.

The built-in API methods do not have progress callbacks, so when you want to transfer a 15Gb file, it takes a long time and the user gets very upset because it appears the program is doing nothing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When copying cloud to cloud, making every byte being copied go through your local network computer not only makes the operation take at least 2x longer, but it can also incur egress charges if you're not running the operation on a cloud VM.

This seems like a poor tradeoff to just get a progress display. If you're already streaming from the local computer to the cloud or vice versa then those issues are mitigated and the remaining cost is easier to justify.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pshapiro4broad That's true. What do you propose to mitigate this? My specific use case is for a tool that downloads files on the order of 15Gb from our bucket to a user's local running area (wherever that may be).

Are you suggesting that this class should detect when cloud-to-cloud transfers will occur based on the path URL and throw an exception?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was more about the current use case than about future possible uses, I wanted to make sure that the use case that this is being written for isn't cloud-to-cloud.

One fix would be to check the Paths FileSystemProviders. If they're the same, then use Files.copy(Path, Path), otherwise use your code. E.g. call Path.getFileSystem().provider(). Providers are singletons based on scheme so you can use == to compare them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the whole point of the class was to display progress, I'm not going to do additional checking.

Copying a file using NioFileCopierWithProgressMeter should always have the option of displaying a progress meter.

*/
private abstract class SimpleTimeFormatter {

long rawTime_ms;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fields can be final

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@jonn-smith
Copy link
Collaborator Author

@LeeTL1220 - correct. It does bucket/local transfers in any combination. Really, it should handle transfers between any two valid NIO endpoints (though right now only buckets and local are tested in the unit tests).

@droazen
Copy link
Contributor

droazen commented Aug 31, 2018

I've requested a review from @lbergelson on the NIOFileCopier piece.

Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a review pass on the tool itself -- back to @jonn-smith with some comments. I've commissioned @lbergelson to review the NIOFileCopier piece, which I did not look at.

//==================================================================================================================
// Private Static Members:

private static String GERMLINE_GCLOUD_DATASOURCES_BASEURL = "gs://broad-public-datasets/funcotator/funcotator_dataSources.v1.4.20180829g";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the correct base version of the data sources (1.4.20180829) be extracted to its own constant, and shared between the URL constants?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

}

@Override
protected void onStartup() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move onStartup() before doWork() to make the tool read more logically.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

protected Object doWork() {

// Get the correct data source:
if ( getSomaticDataSources ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bodies of these if statements for the somatic/germline cases are almost identical. Could you extract a common helper method, and call it in both cases?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

}
}

logger.info("IMPORTANT: You must unzip the downloaded data sources prior to using them with Funcotator.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this step for the user from the tool? (perhaps controllable via a command-line toggle)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I can add a flag for that.


// verify the hashes are the same:
if ( !downloadedDataSourcesAreValid ) {
throw new GATKException("ERROR: downloaded data sources are corrupt! Unexpected checksum: " + fileCopier.getLatestChecksum() + " != " + expectedSha256Sum);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a UserException -- GATKException is for internal GATK bugs. For things like failed downloads, we blame the user :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

}

private Path getDataSourceLocalPath(final Path dataSourcesPath) {
return IOUtils.getPath(dataSourcesPath.getFileName().toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to allow the user to configure the destination path via an argument?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - that seems reasonable.

}
while ( cleanString.contains("\t") ) {
cleanString = cleanString.substring(0, cleanString.indexOf("\t"));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you just call String.trim() here on expectedSha256SumString to get rid of leading/trailing whitespace, and eliminate this method?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - the format of the files can either be a raw checksum, or the following:

<CHECKSUM><WHITESPACE><FILENAME>

So I need to make sure only the checksum is used. I can clean this up a little though.

// Helper Methods:

//==================================================================================================================
// Data Providers:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get rid of these empty section headings...they just clutter the class.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@Test(enabled = doDebugTests,
groups = {"funcotatorValidation", "bucket"}
)
void testGermlineDownloadWithValidation() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add test for somatic download as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@droazen In practice will these tests ever be run? Frankly they take so long that I doubt anyone will enable them.

//==================================================================================================================
// Tests:

@Test(enabled = doDebugTests,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long does this test take to run?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About an hour. I disabled it because it's intractable to run it.

@droazen droazen assigned jonn-smith and unassigned droazen and LeeTL1220 Aug 31, 2018
Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonn-smith I have a bunch of suggestions / comments. Feel to pick the ones that make sense to you.

}
}

private void copyLoopWorker() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming makes this sound like it's going to be spun off and run on an executor service. Maybe just call this method copy?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

*
* Created by jonn on 8/27/18.
*/
public class NioFileCopier {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this to something like "NioFileCopierWithProgressMeter" since the focus of it is on the progress reporting. We don't necessarily want people to use this for general purpose small copying of things because Files.copy() is so much simpler if you don't need progress reporting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

private void logProgressSimple(final double progressValue, final long totalBytesRead, final double bytesPerMillisecond) {

// Get the remaining time estimate:
final long remainingFileSize_bytes = srcFileSize - totalBytesRead;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would factor out a getRemainingDuration method since this code is duplicated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

*
* Created by jonn on 8/27/18.
*/
public class NioFileCopier {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably be final unless you're designing it to be extensible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my plan. I don't see any reason someone shouldn't be able to extend off of it.

/**
* @return A copy of the {@link Path} used as the source for this {@link NioFileCopier}.
*/
public Path getSource() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Paths are immutable. I don't think you have to copy them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

public abstract String format();
}

private class AsWordsTimeFormatter extends SimpleTimeFormatter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class is unused.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

private long srcFileSize;
private int srcFileSizeNumDigits;

private Map<ChecksumCalculator, String> checksumMap = new HashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check sum map seems like an unnecessary complication. Is there a use case for running multiple checksums and running check sums repeatedly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not right now. I'll remove the map and make it a single checksum.

final double progressValue = PROGRESS_DISPLAY_PERCENT_INCREMENT * (Math.floor(Math.abs(rawProgressValuePercent / PROGRESS_DISPLAY_PERCENT_INCREMENT)));

// Output our progress if we're ready for it:
if ( progressValue != lastProgressValue ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this might either update to fast on a fast download or too little on a slow one. You might want to do something like the progress meter in gatktool does and only output every N seconds. (and ultimately we want to make the progress meters async so they update when nothing is happening as well...).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. I think I'd like to wait to refactor to timed output until it spins off threads to copy and to log progress.

As an approximation, I'll make it choose a delta % value for which to log based on the file size.

*
* Created by jonn on 8/27/18.
*/
public class NioFileCopier {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some general thoughts. This seems sane and useful. A good progress meter is the difference between people being patient and people being angry.

It seems like this class does 3 things, and maybe could be factored differently.

  1. It downloads a file.
  2. It is a progress meter with a lot of formatting code.
  3. It does some checksum validation.

It seems like maybe the formatting and progress reporting could could be pulled into it's own inner class that would keep track of that state.
The validation seems like it should be either baked into the download (i.e. as you download each byte[] update the validation and finalize it at the end). Or just removed from the class and handled externally. I'd vote for doing it in parallel with the download.

That will probably end up with more constructor parameters. It seems like currently this is sort of half a build for configuring this thing, and have a results object. Maybe they should be split into a downloader / validator factory and then a download result that you can query to get the validation results. (or have no return type and just throw if it doesn't validate and you asked for validation).

Copy link
Collaborator Author

@jonn-smith jonn-smith Sep 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... It does all those things.

I'll make a dummy results object that can be used to query results. That may make this a little more straight-forward.

The checksum validation has been baked into the download (if requested).

I would eventually like to refactor this into a threaded tool and pull the progress bar out into another class hierarchy that can include the existing ProgressMeter.

For now I'm going to leave the progress calculations in place.

* If no checksum has been calculated, will return {@code null}.
* @return The last {@link ChecksumCalculator} used to calculate the checksum of {@link #dest}.
*/
public ChecksumCalculator getLatestChecksumCalculator() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "latest" part of this seems strange to me. Maybe it's better to expose a query by digest so you can retrieve multiple checksums. Or just drop to "getChecksum" if multiples aren't supported.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I pulled out the latest checksum business.

Now only 1 checksum is stored.

)
@DocumentedFeature
@BetaFeature
public class FuncotatorDataSourceRetriever extends CommandLineProgram {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think FuncotatorDataSourceDownloader might be a slightly better name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@jonn-smith jonn-smith assigned droazen and unassigned jonn-smith Sep 4, 2018
Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like my first-round comments have all been addressed. I have one final request for test coverage on the zip extraction feature, otherwise the tool itself looks good now.

//==================================================================================================================
// Instance Methods:

private void extractDataSources(final Path localDataSourcesPath) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a good unit test for extractDataSource() / extractFilesFromArchiveStream()? You might also want to move extractFilesFromArchiveStream() to IOUtils rather than embed it in this tool.

Alternatively, if you can find a library method in our dependencies (apache, guava, etc.) that does the entire extraction process in a single method call, unit tests won't be necessary.

Copy link
Collaborator Author

@jonn-smith jonn-smith Sep 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move it to IOUtils.

The code I have there is what Apache recommends for doing extraction and leverages some of their classes. I based it on the information they have here (also in the comments): https://commons.apache.org/proper/commons-compress/examples.html

I'll write a unit test for it.

@droazen droazen assigned jonn-smith and unassigned droazen Sep 5, 2018
Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few new comments. Looks like positive changes 👍


protected void logProgress(final double progressValue, final long totalBytesRead, final double bytesPerMillisecond) {
if ( isSilent() ) {
logProgressVerbose(progressValue, totalBytesRead, bytesPerMillisecond);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems weird. If it's silent I would expect it to not log progress?

Copy link
Collaborator Author

@jonn-smith jonn-smith Sep 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - that was a typo. 📦

Fixed!

/**
* Output no logging messages whatsoever.
*/
SILENT,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the only verbosity level that is checked anywhere is SILENT I think you forgot some if statements somewhere. Or maybe you just need 2 or 3 levels instead of 4.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I never finished the piping.

I want to have 4 log levels - I've updated the code to respect them.

// Now copy from our source to our destination:
doCopy();

// Let the world know the glory that is a complete file copy:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I enjoy this comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad. :)

* Will throw an exception if files exist already.
* @param tarGzFilePath {@link Path} to a gzipped tar file for extraction.
*/
public static void extractTarGz(final Path tarGzFilePath) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems useful if you could specify the destination directory in these methods.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I wanted to avoid the extra piping, but it's worth putting in now.

Fixed.

* @param overwriteExisting If {@code true} will overwrite an existing file in the requested location for the FIFO file. If {@code false} will throw an exception if the file exists.
* @return The {@link File} object pointing to the created FIFO file.
*/
public static File createFifoFile(final String fifoFilePathString, final boolean overwriteExisting) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably makes sense for this input to be either a file or a path, we can't create fifo's on exotic uri types anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled this directly from the code in StreamingProcessController where this was sufficient.

I can update it to use Paths.

@jonn-smith jonn-smith assigned droazen and unassigned jonn-smith Sep 6, 2018
@jonn-smith
Copy link
Collaborator Author

@lbergelson @droazen

If you guys want to give it a last look-over, I think it's good to go now.

Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back to @jonn-smith with some more comments. Found a problem with the validation method in the zip extractor test, plus a few other minor issues.

// Prepare our output directory:
if ( destDir == null ) {
final File tmpDir = createTempDir("IOUtilsUnitTest_testExtractTarGz");
tmpDir.deleteOnExit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createTempDir() calls deleteRecursivelyOnExit() for you, so the manual deleteOnExit() calls are not needed (applies to the other test cases below as well).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, but for whatever reason I found that it did not delete the directories at least once when I was testing. This seemed to fix it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can take it out and try again.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

private void assertContentsTheSame(final Path baseActualPath, final Path baseExpectedPath) {
try {
// Check that the files and directories are the same:
Files.find(baseActualPath, Integer.MAX_VALUE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem: if baseActualPath is missing one or more files that are present in baseExpectedPath (or if baseActualPath is completely empty), this method won't throw an error, since the code in the lambda will never get invoked for those missing files. You need either a counter for actual->expected matches, or a symmetric check.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!


// Get the base file path for the output:
final Path baseActualOutputDataSourcesPath = outputDataSourcesPath.getParent().resolve(
baseExpectedOutputDataSourcesPath.getFileName()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't outputDataSourcesPath.getParent().resolve(baseExpectedOutputDataSourcesPath.getFileName()) the same as just outputDataSourcesPath?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - outputDataSourcesPath is the path to the tar.gz file whereas baseActualOutputDataSourcesPath is the path to the extracted folder.

}

@Test(expectedExceptions = UserException.class)
public void testExtractTarGzThrowsException() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give this test case a more specific name like testExtractTarGzThrowsExceptionOnExistingOutput

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


// Create a FIFO file:
final Path fifoFilePath = tmpDirPath.resolve(IOUtils.getPath("FIFOFILE"));
IOUtils.createFifoFile( tmpDirPath.resolve(fifoFilePath) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fifoFilePath was already resolved against tmpDirPath at this point, wasn't it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. Fixed.

//==================================================================================================================
// Private Static Members:

private static final boolean doDebugTests = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doDebugTests -> doFullScaleTests (and add comment explaining why they're off by default)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


// Create an output location for the data sources to go:
final File tmpDir = createTempDir("FuncotatorDataSourceDownloaderIntegrationTest_testDownloadDummySmallDataSources");
tmpDir.deleteOnExit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createTempDir() calls deleteRecursivelyOnExit() for you

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

arguments.addBooleanArgument(FuncotatorDataSourceDownloader.EXTRACT_AFTER_DOWNLOAD, doExtract);
arguments.addArgument("verbosity", "INFO");

runCommandLine(arguments);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you assert something after running the command line (like the destination existing, etc.)?

Copy link
Collaborator Author

@jonn-smith jonn-smith Sep 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I can't assert that the contents are correct because we shouldn't check in the data sources to git lfs, but I'll add in an existence check for the downloaded file (though it is a little redundant when the integrity check is on).

Fixed.

@droazen droazen assigned jonn-smith and unassigned droazen and lbergelson Sep 6, 2018
@jonn-smith
Copy link
Collaborator Author

jonn-smith commented Sep 7, 2018

The Travis tests are failing because the FIFOFILE that I created in the expected output dir for the tar.gz extraction tests does not seem to exist in git lfs.

@droazen @lbergelson Do either of you know if this is a limitation or if there is some way to fix it? I can make a system call to extract the files for the expected results, but that's gross.

Fifo files do not seem to be handled by git lfs.
@jonn-smith jonn-smith merged commit 575f11d into master Sep 10, 2018
@jonn-smith jonn-smith deleted the jts_funcotator_data_source_retrieval_tool_4549 branch September 10, 2018 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants