Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URI and Path specifier classes for input/output resources. #34

Merged
merged 3 commits into from
Feb 13, 2019
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,9 @@ subprojects {

dependencies {
testCompile "org.testng:testng:6.14.3"
testCompile "com.google.jimfs:jimfs:1.1"

compile 'commons-io:commons-io:2.5'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to mention: is this needed in this PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - its left over from previous changes and can be removed.

}

test {
Expand Down
86 changes: 86 additions & 0 deletions core/src/main/java/org/htsjdk/core/api/PathURI.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
package org.htsjdk.core.api;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that a better place will be org.htsjdk.core.api.io, to keep things more organized.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.nio.file.Path;

/**
* Interface representing htsjdk-next input/output resources as URIs.
*/
public interface PathURI {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this will represent a resource, what's about IOResource? I'm not sure if it is better, but I don't like the PathURI name...

Copy link
Collaborator Author

@cmnbroad cmnbroad Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I don't love the names. IOResource might be an improvement - we'd still need a name for the default implementation class (currently called PathSpecifier for lack of a better name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to IOResource; still not sure about PathSpecifier.


/**
* @return true if this URI has a scheme that has an installed NIO file system provider. This does not
* guarantee the URI can be converted into a {@code java.nio.file.Path}, since the URI can be syntactically
* valid, and specify a valid file system provider, but still fail to be semantically meaningful.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to have properly formatted javadocs. Most of the information should be at the begining, and the @return tag should contain just the information about the return value (in this case, {@code true} if the URI has a installed NIO filesystem provider; {@code false} otherwise).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

*/
boolean isNIO();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly to the new disq, I don't like this isNio name in the method. I prefer something like hasFileSystem or throw a meaningful exception if a Path could not be retrieved to be catched by client code...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clients can ignore this and do exactly as you describe (just call toPath, which will throw).


/**
* Return true if this {code PathURI} can be resolved to an NIO Path. If true, {@code #toPath()} can be
* safely called.
*
* There are cases where a valid URI with a valid scheme backed by an installed NIO File System
* still can't be turned into a {@code java.nio.file.Path}, i.e., the following specifies an invalid
* authority "namenode":
*
* hdfs://namenode/to/file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to include specific file-system in the docs, I would recommend to add a @see tag to point to HDFS to understand better the documentation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The details of HDFS aren't relevant - its just an example URI. Changed to "file://".

*
* The current implementation returns false for these cases (toPath will fail, getInvalidPathReason
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no current implementation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

* returns the reason code).
*/
boolean isPath();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need isNio and isPath? I know that they are slightly different (if they are valid or not), but I prefer to just throw detailed exceptions if a Path is retrieved.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Callers can still fo the path conversion themselves if desired.


/**
* Get a {@code java.net.URI} object for this {@code PathURI}. Will not be null.
* @return The {@code URI} object for this PathURI.
*/
URI getURI();

/**
* Returns the string from which this {@code PathURI} was originally created. This string may differ from
* the normalized string returned from a Path that has been object resolved from this PathURI.
* @return string from which this URI as originally created. Will not be null, and will always
* include a URI scheme.
*/
String getURIString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a sugar-syntax for getURI().toString(), it should be specified in the javadoc. In addition, a default implementation can be also provided here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default implementation added.


/**
* Return the raw input string provided to the constructor.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that the constructor will be always a String, which might not be the case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True; this interface was factored out of the implementation, so the constructor reference isn't relevant. Updated.

*/
String getRawInputString();

/**
* Resolve this PathURI to an NIO Path. Can be safely called only if {@code #isPath()} returns true.
*/
Path toPath();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As explained before, I think that it is better to remove that many isNio/isPath and throw custom exceptions that can be catch by client code.


/**
* Return a string message describing why this URI cannot be converted to a {@code java.nio.file.Path}
* ({@code #isPath()} returns false).
* @return Message explaining toPath failure reason, since it can fail for various reasons.
*/
String getToPathFailureReason();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of removing the isPath, this could just be removed and the message will be in the exception thrown. I suggest this changes because it will simplify a lot the interface, which for my point of view is one of the things that I would like to have in htsjdk3. If the public API is the minimal required, maintainance will be much easier and less error prone for clients implementing the classes and integrating it into the framework.


/**
* Return the scheme for this PathURI. For file URIs (URIs that have no explicit scheme), this will return
* the scheme "file".
* @return the scheme String or this URI, if any. May be null.
*/
default String getScheme() {
return getURI().getScheme();
}

/**
* Get a {@code InputStream} for this URI if a provider is for the URI's scheme is available.
* @return {@code InputStream} for this URI.
*/
InputStream getInputStream();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly, one of the aims of #5 was to get rid of the InptStream/OutputStream for a common ByteSource. I prefer that option and in that case IOResource should return that class. But that means that this is blocked by #5


/**
* Get an {@code OutputStream} for this URI if a provider is for the URI's scheme.
* @return {@code OutputStream} for this URI.
*/
OutputStream getOutputStream();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before.

}
21 changes: 21 additions & 0 deletions core/src/main/java/org/htsjdk/core/utils/IOUtils.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
package org.htsjdk.core.utils;

import java.io.File;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No File in htsjdk3. This class should be removed and the only method usages modified in favor of Files.createTempFile

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - done.

import java.io.IOException;

public class IOUtils {

/**
* Create a temporary file using a given name prefix and name suffix.
* @param prefix
* @param suffix
* @return temp File that will be deleted on exit
* @throws IOException
*/
public static File createTempFile(final String prefix, final String suffix) throws IOException {
final File tempFile = File.createTempFile("testOutputStream", ".txt");
tempFile.deleteOnExit();
return tempFile;
}

}
238 changes: 238 additions & 0 deletions core/src/main/java/org/htsjdk/core/utils/PathSpecifier.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,238 @@
package org.htsjdk.core.utils;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either org.htsjdk.core.utils.io or in the same package as the PathURI (or the renamed version).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


import org.htsjdk.core.api.PathURI;
import org.htsjdk.core.exception.HtsjdkException;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.*;
import java.nio.file.spi.FileSystemProvider;

/**
* Default implementation for PathURI.
*
* This class takes a raw string that is to be interpreted as a path specifier, and converts it internally to a URI or
* Path object. If no scheme is provided as part of the raw string used in the constructor(s), the URI is assumed
* to represent a file on the local file system, and will be backed by a URI with a "file:/" scheme and a path
* part that is automatically encoded/escaped to ensure it is a valid URI. If the raw string contains a scheme,
* it will be backed by a URI formed from the raw string as presented, with no further encoding/escaping.
*
* For example, a URI that contains a scheme, and has an embedded "#" in the path will be treated as having a
* fragment delimiter. If the URI contains an scheme, the "#" will be escaped.
*
* There are 3 succeeding levels of validation:
*
* 1) PathSpecifier constructor - requires a syntactically valid URI, possibly containing a scheme (if no scheme
* is present the path part will be escaped/encoded)
* 2) isNio - returns true if the URI is syntactically valid, and there is an installed NIO provider that matches
* the URI scheme
* 3) isPath - syntactically valid URI that can be resolved to a java.io.Path by the provider
*
* <scheme>:<scheme-specific-part>
* <scheme>://<authority><path>?<query>
* absoluteURI = scheme ":" ( hier_part | opaque_part )
* hier_part = ( net_path | abs_path ) [ "?" query ]
* net_path = "//" authority [ abs_path ]
* abs_path = "/" path_segments
*
* A URI is absolute if, and only if, it has a scheme component.
* A URI is opaque if, and only if, it is absolute (has a scheme) and its
* scheme-specific part does not begin with a slash character ('/')
*
* A relative reference that does not begin with a scheme name or a
* slash character is termed a relative-path reference.
*
* A relative reference beginning with a single slash character is
* termed an absolute-path reference, as defined by <abs_path> in
* Section 3.
*
* URI that do not make use of the slash "/" character for separating
* hierarchical components are considered opaque by the generic URI
* parser.
*/
public class PathSpecifier implements PathURI, Serializable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we should make our defaults implementations final, unless required. The same as before, we should not allow to extend classes unless we explicitly see a widely spread use-case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should Serializable be part of the PathURI API?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question on Serializable. As for final, the intent is to provide a default implementation that allows overrides with custom resolution. GATK will certainly need to do this for gcs, etc.

private static final long serialVersionUID = 1L;

private final String rawInputString; // raw input string provided by th user; may or may not have a scheme

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private final String rawInputString; // raw input string provided by th user; may or may not have a scheme
private final String rawInputString; // raw input string provided by the user; may or may not have a scheme

private final URI uri; // working URI; always has a scheme (assume "file" if not provided)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be transient too, as the Path? I guess that the rawInputString is the only needed field to re-generate the class.

Copy link
Collaborator Author

@cmnbroad cmnbroad Nov 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class maintains the invariant that there is always a backing URI, whereas the cached path is just an optimization. So the cache is truly transient.

private String pathFailureReason; // cache the reason for "toPath" conversion failure

private transient Path cachedPath; // cache the Path associated with this URI if its "Path-able"

public PathSpecifier(final String rawInputString) {
ParamUtils.nonNull(rawInputString);
this.rawInputString = rawInputString;

URI tempURI;
try {
// If the input URI already has a scheme (including a "file" scheme), we assume its already properly
// escaped. If no scheme component is present, then assume its a raw path on the local file system, so
// try to get a Path first, and then recreate the URI by retrieving it from the resulting Path. This
// ensures that input strings that contain embedded characters that would otherwise be interpreted as
// URI syntax (such as embedded fragment specifiers ("#") that are valid in file names) are properly
// escaped.
tempURI = new URI(rawInputString);
if (!tempURI.isAbsolute()) {
// NOTE: This case (no scheme) is the only one where we resolve the URI to a Path at construction time.
try {
setCachedPath(Paths.get(rawInputString));
tempURI = getCachedPath().toUri();
} catch (InvalidPathException e) {
throw new IllegalArgumentException(e.getMessage(), e);
}
}
uri = tempURI;
if (!uri.isAbsolute()) {
// assert the invariant that every URI we create has a scheme, even if the raw input string does not
throw new HtsjdkException("URI has no scheme");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have either more specific exceptions (never throw a raw one) or in this case it should be an IllegalArgumentException as it is a problem with the provided param.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this one to IllegalArgumentException. All of the others are HtsjdkIOException (subclass of HtsjdkException).

If HtsjdkException is truly intended to only be a base class, it should probably have protected constructors.

}
} catch (URISyntaxException e) {
final String errorMessage = String.format("%s must be a valid URI. '%s'/'%s'", rawInputString, e.getMessage(), e.getReason());
throw new IllegalArgumentException(errorMessage);
}
}

@Override
public boolean isNIO() {
// try to find a provider; assume that our URI always has a scheme
for (FileSystemProvider provider: FileSystemProvider.installedProviders()) {
if (provider.getScheme().equalsIgnoreCase(uri.getScheme())) {
return true;
}
}
return false;
}

@Override
public boolean isPath() {
try {
return getCachedPath() != null || toPath() != null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only toPath() != null is required.

} catch (ProviderNotFoundException |
FileSystemNotFoundException |
IllegalArgumentException |
HtsjdkException |
AssertionError e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not nice to have a catch clause just because of a test. The test should handle that instead...or this catch any error when trying to get the path.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a test assertion, its actually thrown by at least one of the provider implementations (jimfs, and also some java code IIRC).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a test assertion, its actually thrown by at least one of the provider implementations (jimfs, and also some java code IIRC). It is unfortunate though.

// jimfs throws an AssertionError that wraps a URISyntaxException when trying to create path where
// the scheme-specific part is missing or incorrect
pathFailureReason = e.getMessage();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the message is empty or uninformative, the failure reason is not really useful; sometimes the useful message comes from the stacktrace. I would say to at least test if the message is empty, and if it is at least specify the class for the exception.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think that it will be much better to store the exception to let the user handle it by themselves as they wish.

return false;
}
}

@Override
public URI getURI() {
return uri;
}

@Override
public String getURIString() {
return getURI().toString();
}

/**
* Return the raw input string as provided to the constructor.
*/
@Override
public String getRawInputString() { return rawInputString; }

/**
* Resolve the URI to a {@link Path} object.
*
* @return the resulting {@code Path}
* @throws HtsjdkException if an I/O error occurs when creating the file system
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not really throw HtsjdkException when creating the filesystem.

*/
@Override
public Path toPath() {
if (getCachedPath() != null) {
return getCachedPath();
} else {
final Path tmpPath = Paths.get(getURI());
setCachedPath(tmpPath);
return tmpPath;
}
}

@Override
public String getToPathFailureReason() {
if (pathFailureReason == null) {
try {
toPath();
return String.format("'%s' appears to be a valid Path", rawInputString);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should specify the behavior at the interface-level for this case. I think that it is much better if it returns an empty String.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of no-failure, this could instead return an Optional too; in the case of returning the exception instance, that will be even better as we should not return an exception for a non-error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good idea. Changed to Optional.

} catch (ProviderNotFoundException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be summarize better as:

Suggested change
} catch (ProviderNotFoundException e) {
} catch (final Exception e) {
return String.format("%s: %s", e.getClass().getSimpleName(), e.getMessage());
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only in case the exception is not returned instead, which looks much better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally try to avoid catching Exception, especially without re-throwing it, since it casts too wide of a net and can mask underlying problems.

return String.format("ProviderNotFoundException: %s", e.getMessage());
} catch (FileSystemNotFoundException e) {
return String.format("FileSystemNotFoundException: %s", e.getMessage());
} catch (IllegalArgumentException e) {
return String.format("IllegalArgumentException: %s", e.getMessage());
} catch (HtsjdkException e) {
return String.format("UserException: %s", e.getMessage());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to say HtsjdkException here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
}
return pathFailureReason;
}

@Override
public InputStream getInputStream() {
if (!isPath()) {
throw new HtsjdkException(getToPathFailureReason());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No raw HtsjdkException

}

final Path resourcePath = toPath();
try {
return Files.newInputStream(resourcePath);
} catch (IOException e) {
throw new HtsjdkException(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No raw HtsjdkException

String.format("Could not create open input stream for %s (as URI %s)", getRawInputString(), getURIString()), e);
}
}

@Override
public OutputStream getOutputStream() {
if (!isPath()) {
throw new HtsjdkException(getToPathFailureReason());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No raw HtsjdkException

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - perhaps we need a separate discussion about how we want to partition the exception namespace. #36.

}

final Path resourcePath = toPath();
try {
return Files.newOutputStream(resourcePath);
} catch (IOException e) {
throw new HtsjdkException(String.format("Could not open output stream for %s (as URI %s)", getRawInputString(), getURIString()), e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No raw HtsjdkException

}
}

// get the cached path associated with this URI if its already been created
protected Path getCachedPath() { return cachedPath; }

protected void setCachedPath(Path path) {
this.cachedPath = path;
}

@Override
public String toString() {
return rawInputString;
}

@Override
public boolean equals(Object o) {
if (this == o) return true;
if (!(o instanceof PathSpecifier)) return false;

PathSpecifier that = (PathSpecifier) o;

if (!getURIString().equals(that.getURIString())) return false;
if (!getURI().equals(that.getURI())) return false;
return true;
}

@Override
public int hashCode() {
int result = getURIString().hashCode();
result = 31 * result + getURI().hashCode();
return result;
}

}
Loading