-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DataSource API] Design issues and/or missing documentation #989
Comments
References
|
Goals and use cases 1. Enable reading/writing seamlessly from/to a compressed or uncompressed file Practically, we want to be able to run either :
and also
2. Enable reading/writing files seamlessly from/to a directory, a zip file, an in-memory map, or another storage (AFS node, tarball ...) In practice, we would like to be able to write something like : Path path = Paths.get("/path/to/network.xiidm");
Network network = Importers.loadNetwork(DataSources.file(path)); and also Map<String, byte[]> data = ...;
String name = "network.xiidm";
Network network = Importers.loadNetwork(DataSources.map(data, name)); A question for the command line : if a zip file is given as argument, since we don't have any hint for the file names inside the zip, how should we resolve them ?
Should we assume that the zip will contain files named "network.<extension>", or should it work if the zip contains one file named "toto.<extension>" ? And in this case, throw if we don't know what files to choose ? 3. Enable reading/writing Some network file formats may rely on multiple files :
Do we want to be able to define the source for the model through a single name ? |
Propositions 1. Simplify implementation of interface DataSource {
List<String> getFileNames() throws IOException;
InputStream newInputStream(String name) throws IOException;
OutputStream newOutputStream(String name, boolean append) throws IOException;
} Other methods look common to all implementation. 2. Provide more utility methods to create data sources The framework could provide methods to build and manipulate data sources more easily, maybe chaining calls: DataSource source = DataSources.singleFile("/path/to/network.xiidm");
Map<String, byte[]> map = new HashMap<>();
DataSource target = DataSources.mem(map)
.gzip()
.observe(observer);
// Copies all the content
DataSources.copy(source, target); |
I would keep the actual DataSource, ReadOnlyDataSource separation as some use case requires only read access and forbid write access (like importer). Also, some implementations only make sense for read like ResourceDataSource which give access to classpath data for unit testing. |
Not clear to me how it would work which such a design: DataSource source = DataSources.singleFile("/path/to/network.xiidm"); In an importer implementation, it would require the importer to have an extra information to known which file name to import: // in an importer code
dataSource.newInputStream(????); What dataSource.getFileNames() is supposed to return in that case? just network.xiidm ? network.xiidm and all its friends of the same directory? |
/**
* A DataStore represents access to many unrelated blobs, not limited to a single network.
* examples: on local filesystem all files in a folder, or everything in a zip, everything in a database..
*/
public interface ReadOnlyDataStore {
Set<String> getStreamNames() throws IOException; // list all the things !
InputStream newInputStream(String name) throws IOException; // open them one by one
// note: no basename, no network name
}
// Importers can use DataStore to abstract away access to blobs in different storages and autodiscover networks.
// They are created from some kind of uri which can be used to filter what is inspected during the autodiscover phase /**
* A NetworkDataStore represents access to all the data of one network only.
*/
public interface NetworkDataStore extends DataStore {
String getBaseName();
Set<String> listStreams() throws IOException; //only the streams of one network
InputStream newInputStream(String name) throws IOException;
} Then you can have a function For writes, since we can create any new file, a simple public interface DataStore extends ReadonlyDataStore {
OutputStream newOutputStream(String name) throws IOException;
} might suffice ? |
Design issues, or missing documentation.
The
DataSource
API, which is used in particular in the coreImporter
API, is very hard for third-party users to use and understand.An original issue seems to be that it is used for 2 different things:
Network
as a set of binary contents which can be read by anImporter
. To help for this, it has the concept of "base name" which has nothing to do with item 1. However, the actual resource names are not known by the data source, only by the importer which has the ability to import it.A consequence of this is that the manipulation of
DataSource
objects as described in item 2 is not convenient, because it's not autonomous.For example, if you want to simply copy the binary content representing a network from a DataSource to an in-memory
Map<String, byte[]>
, you will need :Importer
which knows what files in the DataSource are part of the network representation. If you want to find an adequateImporter
, you can useImporters.findImporter
but you need to provide unexpected parameters, for example aComputationManager
MemDataSource
) : otherwise, at the end you don't know what files have been actually copied, only theImporter
knows.Or for example you could want to copy the contents to an in-memory zip. Once again, you need to provide your own DataSource implementation, and this is not obvious since the interface is not documented.
The
DataSource
API is used a lot in the framework, and should be documented, and simple to understand and use for third-party users.Possible guidelines for a clearer API would be to separate the 2 concerns described above into 2 different objects or interfaces. One would be in charge of abstracting away the storage, while the other one would carry the actual set of binary contents which serve for the
Network
representation.This central part of the API really feels like a blocker for adoption of the framework by third-party users.
The text was updated successfully, but these errors were encountered: