Design: client library API #257
Comments
Just below the posix wrappers, things largely funnel into the unifycr_fid_* calls. That's a leftover artifact from the original CRUISE code, and that interface is still posix-like, but it should provide a better starting point than the wrappers themselves. |
The unifycr_fid abstraction assumes that a file is a linear array of bytes, where you have to explicitly allocate (extend) and free (shrink) storage capacity at the end of the array. That made a lot of sense in CRUISE, but it makes less sense in the case of unifycr. |
Should we start using Project boards to organize these kinds of efforts? |
See the attached PDF (UnifyFS-Client-API-Proposal-v02.pdf) for the current proposed API. |
Here's the markdown writeup. |
So I've finally had some time to fully review the API doc and the comments in this issue. I'll first go over the comments and then I'll go over what I see are the most important parts of the API proposal.
That's a good thing. We should be happy when there's no custom, monolithic, client library API that our users have to learn and link against (and that we have to document and maintain). I'll be happy when unifyfs_mount/unmount are gone. Imagine being able to use any binary with UnifyFS without having to build against Unify or do anything. That would be amazing.
In general, we should only provide an API for things that POSIX doesn't support, or that we can't tack on to POSIX in some way, but only if there is a real user need for it. If it's a nebulous "the user may want this at some point in the future, possibly..." then we should wait. Otherwise we risk wasting time developing a feature nobody uses, or that's designed wrong (but we still have to document and maintain). For example, we could do everything listed in #148 using POSIX:
I can't speak to the VeloC memory-based API or #248 since I'm unfamiliar with them. So why is defaulting to POSIX a good thing?:
Regarding the doc itself: https://github.com/LLNL/UnifyFS/blob/client-api-doc/docs/client-api.md After reading the doc, I'm still not clear on what the actual requirements are for this API. I was hoping there would be a section listing the requirements from users. As in, "the HDF5 folks want the following features: A, B, C... and this is why they want them". There are reasons given in the "Motivations" section, but I'm still skeptical:
This proposed API increases modularity in the same way this increases modularity:

```c
int UNIFYFS_WRAP(printf)(const char* format, ...)
{
    va_list args;
    va_start(args, format);
    int rc = unifyfs_printf(format, args);
    va_end(args);
    return rc;
}

int unifyfs_printf(const char* format, va_list args)
{
    return vprintf(format, args);
}
```

There, printf() is now more "modular". Has this improved anything? No. It seems to me that the API is similar. For the most part it just provides another layer of indirection with functions that are largely just variants of the POSIX ones (with some exceptions). I don't see a benefit from this. It's not like SCR (https://github.com/llnl/scr), where there were discrete, self-contained parts of the code that could be spun off into separate modules (and it was beneficial to do so). I don't see how the proposed API would help permit implementation of "new storage backend technologies" any more than the current codebase.
I agree, you can't implement a FUSE driver using the APIs we have now. For example, I don't think our opendir/readdir currently work. No doubt there are other functions we'd need to implement too. But the answer to this is to implement the missing functions, not design a totally new API from scratch. In fact, the proposed API would make it harder to implement a FUSE driver than if we were to implement the missing POSIX functions. Why? Take a look at the FUSE functions: http://www.maastaar.net/fuse/linux/filesystem/c/2016/05/21/writing-a-simple-filesystem-using-fuse/ They're basically just analogs of the POSIX functions.
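To illustrate, here is a minimal passthrough sketch (FUSE 2.x-style callback signatures; the pass_* names are made up for illustration and are not UnifyFS code) showing how the FUSE entry points forward almost directly to their POSIX counterparts:

```c
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <sys/stat.h>
#include <unistd.h>
#include <dirent.h>
#include <errno.h>

/* getattr forwards to the POSIX stat() family */
static int pass_getattr(const char* path, struct stat* st)
{
    return (lstat(path, st) == 0) ? 0 : -errno;
}

/* readdir forwards to POSIX opendir()/readdir()/closedir() */
static int pass_readdir(const char* path, void* buf, fuse_fill_dir_t filler,
                        off_t offset, struct fuse_file_info* fi)
{
    (void)offset;
    (void)fi;
    DIR* dp = opendir(path);
    if (dp == NULL) {
        return -errno;
    }
    struct dirent* de;
    while ((de = readdir(dp)) != NULL) {
        filler(buf, de->d_name, NULL, 0);
    }
    closedir(dp);
    return 0;
}

static struct fuse_operations pass_ops = {
    .getattr = pass_getattr,
    .readdir = pass_readdir,
};

int main(int argc, char* argv[])
{
    return fuse_main(argc, argv, &pass_ops, NULL);
}
```

The argument above is that a UnifyFS-backed driver would look much the same, which is why filling in the missing POSIX-style functions gets most of the way to FUSE support.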
If you're worried about it imparting new meaning to …

Also, the doc makes the point that a user would have to go from:

```c
#include <sys/stat.h>

chmod();
```

to:

```c
#include "unifyfs_api.h"

unifyfs_handle fshdl;
unifyfs_initialize();
unifyfs_laminate();
unifyfs_finalize();
```

plus Makefile changes to add -lunifyfs_api -I/path/to/unifyfs/headers.
95% of Unify's core functionality is already exposed through our POSIX wrappers. We should not duplicate that functionality in a custom API. For the 5% that is not, we should (and I'm repeating myself here):
For example, the doc proposes an API for file transfers. Why not consider wrapping …

I assume this is referring to the proposed …

Lastly, I wanted to talk about this diagram:
I know this is the dream, but I feel it will quickly turn into this:
Why? Because the top diagram is basically saying our internal API is libunifyfs and that's going to be a stable API. Internal APIs are never stable. They change all the time. Let me give an example. Currently we have an internal API function called unifyfs_create():

```c
unifyfs_rc unifyfs_create(unifyfs_handle fshdl,
                          const int flags,
                          const char* filepath,
                          unifyfs_gfid* gfid);
```

...what if we needed to make …

I think this is a more likely diagram to shoot for:
So what would I include in an UnifyFS API?
The API would expose these as nice, easy-to-use, stable functions to the user, and then call ioctl()s or internal functions under the covers to actually make things work. |
We can wait for later versions, but we'll want to include calls that can be used to operate on many files at once. All basic posix calls require the user to operate on one file at a time. HPC easily generates datasets that have millions of files, so one-at-a-time is too slow. We want to have calls where a single command can be broadcast down a tree to all servers, which can then operate in parallel.
For a directory containing 6 million files, as we had with jobs from Sequoia, this takes a long time. It'd be nice to have methods so this can be parallelized, in the same way that one can parallelize reading a file by stat'ing it with rank 0, bcasting the size to all ranks, and then having each rank lseek into a segment and start reading. A similar interface could exist for reading items from a directory. We'd want a function to return the total number of items, say statdir(), and then another function that lets one seek into the middle of the set, say lseekdir(offset). And this could be combined with a range read to grab a whole collection of items at once. For example:
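Here, statdir(), lseekdir(), and readdir_range() are hypothetical names with guessed signatures (none of this exists today); the pattern just mirrors the rank-0 stat-and-bcast approach described above for parallel file reads:

```c
#include <mpi.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical directory calls sketched from the suggestion above. */
uint64_t statdir(const char* dirpath);                        /* total number of entries     */
int      lseekdir(const char* dirpath, uint64_t offset);      /* cursor into the entry set   */
int      readdir_range(int cursor, char** names, uint64_t n); /* range read of n entry names */

void parallel_listdir(const char* dirpath, int rank, int nranks)
{
    /* Rank 0 queries the entry count once and broadcasts it --
     * the directory analog of stat'ing a file and bcasting its size. */
    uint64_t total = 0;
    if (rank == 0) {
        total = statdir(dirpath);
    }
    MPI_Bcast(&total, 1, MPI_UINT64_T, 0, MPI_COMM_WORLD);

    /* Each rank takes a contiguous slice of the entries, the same way
     * each rank would lseek into its own segment of a large file. */
    uint64_t per_rank = (total + nranks - 1) / nranks;
    uint64_t begin    = (uint64_t)rank * per_rank;
    uint64_t count    = 0;
    if (begin < total) {
        count = (begin + per_rank <= total) ? per_rank : (total - begin);
    }

    if (count > 0) {
        int cursor   = lseekdir(dirpath, begin);     /* seek into the middle of the set */
        char** names = malloc(count * sizeof(char*));
        readdir_range(cursor, names, count);         /* grab a whole collection at once */
        /* ... process names[0..count-1] ... */
        free(names);
    }
}
```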
|
@tonyhutter I will be brief in my answer. Your concerns are noted, but you are focusing on the wrong use case. We have two main classes of users - parallel applications and I/O middleware libraries. For existing applications, I am in complete agreement that we should be able to do 99% of what we want to do in the POSIX, MPI-IO, etc. calls they are already using in their application. Currently, the primary use case for this client API is embedding in other libraries, like HDF5. We have had several conversations with the HDF5 team about different non-POSIX behaviors we could offer them as useful capabilities. There is no reason why we should not provide them with a straightforward API for using those capabilities. |
@MichaelBrim then you need to list exactly what HDF5's requirements are. What did HDF5 say they wanted in those conversations? HDF5 is only mentioned once in the Motivation section, and even then the requirements are vague:
I brought up this lack of detail three months ago:
Without knowing what HDF5's requirements are, how can we possibly know if this API is the best way to satisfy their requirements? For example, I see you propose a unifyfs_remove() call:

```c
/* Remove an existing file from UnifyFS */
unifyfs_rc unifyfs_remove(unifyfs_handle fshdl,
                          const char* filepath);
```

I have no idea if HDF5 needs that or not. There's no "HDF5 asked for a way to remove files without using unlink() because of reason X, so here's what I propose" listed anywhere in the doc. How am I to know if this function is something that's really needed, or just re-inventing the wheel? The design of APIs should be driven by requirements, and we all need to know what those specific requirements are. After we get the requirements, we can then decide what is reasonable to implement and what that implementation would look like. |
I don't know if everyone on this issue was also on the "Unify/HDF5 discussion on MPI-IO", so I'll briefly repeat myself. Here are three things I would like to see in a libunify API that you cannot get from wrapping posix open/write/read/close calls.
With these items in place, it is still possible to provide legacy posix interfaces, including semantics. |
Oh, a fourth thing! POSIX asynchronous i/o is awful. An HPC-oriented async i/o interface would look a lot different and perform a lot better (as we demonstrated with PVFS). |
I concur with Rob's points - it's worthwhile to work on different aspects of the visibility-asynchrony-performance "iron triangle" and think about what you are willing to ask for and give up. Today, it seems like giving up some visibility in favor of performance (by using more asynchrony) is a good choice. With that in mind, I would suggest making all your API routines asynchronous, not just read/write/truncate/zero. Having asynchronous open/close/etc. operations as well is quite useful. |
Currently, all client library functionality is designed around direct interception of existing I/O APIs (either via linker wrapping or GOTCHA). As a result, there really isn't a defined client library API other than `unifycr_mount` and `unifycr_unmount`. This leads to quite a bit of redundant code, and doesn't adequately support alternate uses, such as …

This issue is to track the code reorganization needed within the client library to cleanly abstract the UnifyCR functionality from the upper-level uses. Ideally, we would end up with a `libunifycr` that could be used directly by an application, or as a support library for `libunifycr-posix` (and perhaps `libunifycr-mem`). My suggestion is that the `libunifycr` API needs a thoughtful design to support the various use cases, and should not be tied to POSIX APIs/semantics. The `libunifycr` implementation would contain all of the code to interact with the local `unifycrd` (i.e., RPCs and shmem communication, shared data types, and any associated serialization).
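As a rough illustration of that layering (every name and signature below is hypothetical, not actual UnifyCR/UnifyFS code), `libunifycr` would own the daemon interaction and `libunifycr-posix` would be a thin wrapper layer on top of it:

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical libunifycr core API: hides all unifycrd RPC/shmem details. */
typedef int unifycr_fd_t;
int     unifycr_core_open(const char* path, int flags, unifycr_fd_t* fd);
ssize_t unifycr_core_write(unifycr_fd_t fd, const void* buf, size_t count, off_t offset);

/* Hypothetical libunifycr-posix layer: POSIX-flavored entry points that
 * simply forward to the core library (stand-in for the real wrap macro). */
#define UNIFYCR_WRAP(name) __wrap_##name

int UNIFYCR_WRAP(open)(const char* path, int flags)
{
    unifycr_fd_t fd;
    if (unifycr_core_open(path, flags, &fd) != 0) {
        return -1;   /* errno handling omitted in this sketch */
    }
    return (int)fd;  /* map the core handle to a POSIX-style descriptor */
}
```

An application, a FUSE driver, or a memory-style interface could then link against the core library directly, while existing applications keep going through the wrapper layer.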