-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Created a FuncotatorDataSourceRetriever to download data sources. #5150
Created a FuncotatorDataSourceRetriever to download data sources. #5150
Conversation
Created NioFileCopier that copies files using nio paths (includes optional progress indicator and integrity validation). Updated an error message in funcotator to make it more descriptive. Fixes #4549
Codecov Report
@@ Coverage Diff @@
## master #5150 +/- ##
===============================================
- Coverage 86.735% 86.617% -0.119%
- Complexity 29312 30109 +797
===============================================
Files 1810 1825 +15
Lines 135549 138973 +3424
Branches 15031 15625 +594
===============================================
+ Hits 117569 120374 +2805
- Misses 12566 13088 +522
- Partials 5414 5511 +97
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonn-smith This looks really great as a general purpose tool. It seems like this is general purpose enough to handle bucket-to-bucket, bucket-to-local, local-to-bucket. Am I correct?
Regardless, if tests pass, feel free to merge.
* | ||
* Created by jonn on 8/27/18. | ||
*/ | ||
public class NioFileCopier { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that if you are doing this to copy from cloud-to-cloud it will be a lot less efficient than using the google storage APIs, as using those APIs doesn't make the data travel through the network to the machine doing the copy. Hopefully this is only used when copying up/down to the cloud.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benefit here is that you can see the progress of your file transfer.
The built-in API methods do not have progress callbacks, so when you want to transfer a 15Gb file, it takes a long time and the user gets very upset because it appears the program is doing nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When copying cloud to cloud, making every byte being copied go through your local network computer not only makes the operation take at least 2x longer, but it can also incur egress charges if you're not running the operation on a cloud VM.
This seems like a poor tradeoff to just get a progress display. If you're already streaming from the local computer to the cloud or vice versa then those issues are mitigated and the remaining cost is easier to justify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pshapiro4broad That's true. What do you propose to mitigate this? My specific use case is for a tool that downloads files on the order of 15Gb from our bucket to a user's local running area (wherever that may be).
Are you suggesting that this class should detect when cloud-to-cloud transfers will occur based on the path URL and throw an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was more about the current use case than about future possible uses, I wanted to make sure that the use case that this is being written for isn't cloud-to-cloud.
One fix would be to check the Path
s FileSystemProvider
s. If they're the same, then use Files.copy(Path, Path)
, otherwise use your code. E.g. call Path.getFileSystem().provider()
. Providers are singletons based on scheme so you can use ==
to compare them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the whole point of the class was to display progress, I'm not going to do additional checking.
Copying a file using NioFileCopierWithProgressMeter
should always have the option of displaying a progress meter.
*/ | ||
private abstract class SimpleTimeFormatter { | ||
|
||
long rawTime_ms; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fields can be final
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@LeeTL1220 - correct. It does bucket/local transfers in any combination. Really, it should handle transfers between any two valid NIO endpoints (though right now only buckets and local are tested in the unit tests). |
I've requested a review from @lbergelson on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done a review pass on the tool itself -- back to @jonn-smith with some comments. I've commissioned @lbergelson to review the NIOFileCopier
piece, which I did not look at.
//================================================================================================================== | ||
// Private Static Members: | ||
|
||
private static String GERMLINE_GCLOUD_DATASOURCES_BASEURL = "gs://broad-public-datasets/funcotator/funcotator_dataSources.v1.4.20180829g"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the correct base version of the data sources (1.4.20180829
) be extracted to its own constant, and shared between the URL constants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
} | ||
|
||
@Override | ||
protected void onStartup() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move onStartup()
before doWork()
to make the tool read more logically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
protected Object doWork() { | ||
|
||
// Get the correct data source: | ||
if ( getSomaticDataSources ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bodies of these if statements for the somatic/germline cases are almost identical. Could you extract a common helper method, and call it in both cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
} | ||
} | ||
|
||
logger.info("IMPORTANT: You must unzip the downloaded data sources prior to using them with Funcotator."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this step for the user from the tool? (perhaps controllable via a command-line toggle)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I can add a flag for that.
|
||
// verify the hashes are the same: | ||
if ( !downloadedDataSourcesAreValid ) { | ||
throw new GATKException("ERROR: downloaded data sources are corrupt! Unexpected checksum: " + fileCopier.getLatestChecksum() + " != " + expectedSha256Sum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be a UserException
-- GATKException
is for internal GATK bugs. For things like failed downloads, we blame the user :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
} | ||
|
||
private Path getDataSourceLocalPath(final Path dataSourcesPath) { | ||
return IOUtils.getPath(dataSourcesPath.getFileName().toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to allow the user to configure the destination path via an argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - that seems reasonable.
} | ||
while ( cleanString.contains("\t") ) { | ||
cleanString = cleanString.substring(0, cleanString.indexOf("\t")); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you just call String.trim()
here on expectedSha256SumString
to get rid of leading/trailing whitespace, and eliminate this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No - the format of the files can either be a raw checksum, or the following:
<CHECKSUM><WHITESPACE><FILENAME>
So I need to make sure only the checksum is used. I can clean this up a little though.
// Helper Methods: | ||
|
||
//================================================================================================================== | ||
// Data Providers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get rid of these empty section headings...they just clutter the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@Test(enabled = doDebugTests, | ||
groups = {"funcotatorValidation", "bucket"} | ||
) | ||
void testGermlineDownloadWithValidation() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add test for somatic download as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
@droazen In practice will these tests ever be run? Frankly they take so long that I doubt anyone will enable them.
//================================================================================================================== | ||
// Tests: | ||
|
||
@Test(enabled = doDebugTests, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How long does this test take to run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About an hour. I disabled it because it's intractable to run it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonn-smith I have a bunch of suggestions / comments. Feel to pick the ones that make sense to you.
} | ||
} | ||
|
||
private void copyLoopWorker() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming makes this sound like it's going to be spun off and run on an executor service. Maybe just call this method copy
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
* | ||
* Created by jonn on 8/27/18. | ||
*/ | ||
public class NioFileCopier { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rename this to something like "NioFileCopierWithProgressMeter" since the focus of it is on the progress reporting. We don't necessarily want people to use this for general purpose small copying of things because Files.copy() is so much simpler if you don't need progress reporting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
private void logProgressSimple(final double progressValue, final long totalBytesRead, final double bytesPerMillisecond) { | ||
|
||
// Get the remaining time estimate: | ||
final long remainingFileSize_bytes = srcFileSize - totalBytesRead; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would factor out a getRemainingDuration
method since this code is duplicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
* | ||
* Created by jonn on 8/27/18. | ||
*/ | ||
public class NioFileCopier { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably be final unless you're designing it to be extensible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my plan. I don't see any reason someone shouldn't be able to extend off of it.
/** | ||
* @return A copy of the {@link Path} used as the source for this {@link NioFileCopier}. | ||
*/ | ||
public Path getSource() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Paths are immutable. I don't think you have to copy them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
public abstract String format(); | ||
} | ||
|
||
private class AsWordsTimeFormatter extends SimpleTimeFormatter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this class is unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
private long srcFileSize; | ||
private int srcFileSizeNumDigits; | ||
|
||
private Map<ChecksumCalculator, String> checksumMap = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check sum map seems like an unnecessary complication. Is there a use case for running multiple checksums and running check sums repeatedly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not right now. I'll remove the map and make it a single checksum.
final double progressValue = PROGRESS_DISPLAY_PERCENT_INCREMENT * (Math.floor(Math.abs(rawProgressValuePercent / PROGRESS_DISPLAY_PERCENT_INCREMENT))); | ||
|
||
// Output our progress if we're ready for it: | ||
if ( progressValue != lastProgressValue ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this might either update to fast on a fast download or too little on a slow one. You might want to do something like the progress meter in gatktool does and only output every N seconds. (and ultimately we want to make the progress meters async so they update when nothing is happening as well...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true. I think I'd like to wait to refactor to timed output until it spins off threads to copy and to log progress.
As an approximation, I'll make it choose a delta % value for which to log based on the file size.
* | ||
* Created by jonn on 8/27/18. | ||
*/ | ||
public class NioFileCopier { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some general thoughts. This seems sane and useful. A good progress meter is the difference between people being patient and people being angry.
It seems like this class does 3 things, and maybe could be factored differently.
- It downloads a file.
- It is a progress meter with a lot of formatting code.
- It does some checksum validation.
It seems like maybe the formatting and progress reporting could could be pulled into it's own inner class that would keep track of that state.
The validation seems like it should be either baked into the download (i.e. as you download each byte[] update the validation and finalize it at the end). Or just removed from the class and handled externally. I'd vote for doing it in parallel with the download.
That will probably end up with more constructor parameters. It seems like currently this is sort of half a build for configuring this thing, and have a results object. Maybe they should be split into a downloader / validator factory and then a download result that you can query to get the validation results. (or have no return type and just throw if it doesn't validate and you asked for validation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... It does all those things.
I'll make a dummy results object that can be used to query results. That may make this a little more straight-forward.
The checksum validation has been baked into the download (if requested).
I would eventually like to refactor this into a threaded tool and pull the progress bar out into another class hierarchy that can include the existing ProgressMeter
.
For now I'm going to leave the progress calculations in place.
* If no checksum has been calculated, will return {@code null}. | ||
* @return The last {@link ChecksumCalculator} used to calculate the checksum of {@link #dest}. | ||
*/ | ||
public ChecksumCalculator getLatestChecksumCalculator() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "latest" part of this seems strange to me. Maybe it's better to expose a query by digest so you can retrieve multiple checksums. Or just drop to "getChecksum" if multiples aren't supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - I pulled out the latest checksum business.
Now only 1 checksum is stored.
) | ||
@DocumentedFeature | ||
@BetaFeature | ||
public class FuncotatorDataSourceRetriever extends CommandLineProgram { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think FuncotatorDataSourceDownloader
might be a slightly better name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like my first-round comments have all been addressed. I have one final request for test coverage on the zip extraction feature, otherwise the tool itself looks good now.
//================================================================================================================== | ||
// Instance Methods: | ||
|
||
private void extractDataSources(final Path localDataSourcesPath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a good unit test for extractDataSource()
/ extractFilesFromArchiveStream()
? You might also want to move extractFilesFromArchiveStream()
to IOUtils
rather than embed it in this tool.
Alternatively, if you can find a library method in our dependencies (apache, guava, etc.) that does the entire extraction process in a single method call, unit tests won't be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll move it to IOUtils.
The code I have there is what Apache recommends for doing extraction and leverages some of their classes. I based it on the information they have here (also in the comments): https://commons.apache.org/proper/commons-compress/examples.html
I'll write a unit test for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few new comments. Looks like positive changes 👍
|
||
protected void logProgress(final double progressValue, final long totalBytesRead, final double bytesPerMillisecond) { | ||
if ( isSilent() ) { | ||
logProgressVerbose(progressValue, totalBytesRead, bytesPerMillisecond); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems weird. If it's silent I would expect it to not log progress?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - that was a typo. 📦
Fixed!
/** | ||
* Output no logging messages whatsoever. | ||
*/ | ||
SILENT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the only verbosity level that is checked anywhere is SILENT
I think you forgot some if statements somewhere. Or maybe you just need 2 or 3 levels instead of 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. I never finished the piping.
I want to have 4 log levels - I've updated the code to respect them.
// Now copy from our source to our destination: | ||
doCopy(); | ||
|
||
// Let the world know the glory that is a complete file copy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I enjoy this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad. :)
* Will throw an exception if files exist already. | ||
* @param tarGzFilePath {@link Path} to a gzipped tar file for extraction. | ||
*/ | ||
public static void extractTarGz(final Path tarGzFilePath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems useful if you could specify the destination directory in these methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I wanted to avoid the extra piping, but it's worth putting in now.
Fixed.
* @param overwriteExisting If {@code true} will overwrite an existing file in the requested location for the FIFO file. If {@code false} will throw an exception if the file exists. | ||
* @return The {@link File} object pointing to the created FIFO file. | ||
*/ | ||
public static File createFifoFile(final String fifoFilePathString, final boolean overwriteExisting) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably makes sense for this input to be either a file or a path, we can't create fifo's on exotic uri types anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pulled this directly from the code in StreamingProcessController
where this was sufficient.
I can update it to use Path
s.
If you guys want to give it a last look-over, I think it's good to go now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back to @jonn-smith with some more comments. Found a problem with the validation method in the zip extractor test, plus a few other minor issues.
// Prepare our output directory: | ||
if ( destDir == null ) { | ||
final File tmpDir = createTempDir("IOUtilsUnitTest_testExtractTarGz"); | ||
tmpDir.deleteOnExit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
createTempDir()
calls deleteRecursivelyOnExit()
for you, so the manual deleteOnExit()
calls are not needed (applies to the other test cases below as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, but for whatever reason I found that it did not delete the directories at least once when I was testing. This seemed to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can take it out and try again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
private void assertContentsTheSame(final Path baseActualPath, final Path baseExpectedPath) { | ||
try { | ||
// Check that the files and directories are the same: | ||
Files.find(baseActualPath, Integer.MAX_VALUE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Problem: if baseActualPath
is missing one or more files that are present in baseExpectedPath
(or if baseActualPath
is completely empty), this method won't throw an error, since the code in the lambda will never get invoked for those missing files. You need either a counter for actual->expected matches, or a symmetric check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
|
||
// Get the base file path for the output: | ||
final Path baseActualOutputDataSourcesPath = outputDataSourcesPath.getParent().resolve( | ||
baseExpectedOutputDataSourcesPath.getFileName() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't outputDataSourcesPath.getParent().resolve(baseExpectedOutputDataSourcesPath.getFileName())
the same as just outputDataSourcesPath
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No - outputDataSourcesPath
is the path to the tar.gz
file whereas baseActualOutputDataSourcesPath
is the path to the extracted folder.
} | ||
|
||
@Test(expectedExceptions = UserException.class) | ||
public void testExtractTarGzThrowsException() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give this test case a more specific name like testExtractTarGzThrowsExceptionOnExistingOutput
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
||
// Create a FIFO file: | ||
final Path fifoFilePath = tmpDirPath.resolve(IOUtils.getPath("FIFOFILE")); | ||
IOUtils.createFifoFile( tmpDirPath.resolve(fifoFilePath) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fifoFilePath
was already resolved against tmpDirPath
at this point, wasn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. Fixed.
//================================================================================================================== | ||
// Private Static Members: | ||
|
||
private static final boolean doDebugTests = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doDebugTests
-> doFullScaleTests
(and add comment explaining why they're off by default)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
||
// Create an output location for the data sources to go: | ||
final File tmpDir = createTempDir("FuncotatorDataSourceDownloaderIntegrationTest_testDownloadDummySmallDataSources"); | ||
tmpDir.deleteOnExit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
createTempDir()
calls deleteRecursivelyOnExit()
for you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
arguments.addBooleanArgument(FuncotatorDataSourceDownloader.EXTRACT_AFTER_DOWNLOAD, doExtract); | ||
arguments.addArgument("verbosity", "INFO"); | ||
|
||
runCommandLine(arguments); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you assert something after running the command line (like the destination existing, etc.)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I can't assert that the contents are correct because we shouldn't check in the data sources to git lfs, but I'll add in an existence check for the downloaded file (though it is a little redundant when the integrity check is on).
Fixed.
The Travis tests are failing because the FIFOFILE that I created in the expected output dir for the tar.gz extraction tests does not seem to exist in git lfs. @droazen @lbergelson Do either of you know if this is a limitation or if there is some way to fix it? I can make a system call to extract the files for the expected results, but that's gross. |
Fifo files do not seem to be handled by git lfs.
Created NioFileCopier that copies files using nio paths (includes
optional progress indicator and integrity validation).
Updated an error message in funcotator to make it more descriptive.
Fixes #4549