-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved API suggestion #3
Comments
I think the main problem is that that
I think the
Regarding using a class, it could be
The number of lines are about the same and I think the function case is more familiar to the typical user. I actually think the typical scientist code that uses HAPI will look like like about 3-5 (based on data used in a paper) of these blocks:
Or,
Most likely the above would be iterated over a set of 5-10 start/stop times. I don't expect much use of I also don't expect a user to be using the
@jvandegriff and @jbfaden - any thoughts? |
I don't see how this approach solves "the main problem is that hapi function call signature is too overloaded." which I think we both agree on. This just moves the overload problem from The other idea would be to provide both approaches and let users decide how they want to interact with. Your usage case suggests an advanced user that knows exactly what they want and just gets it and moves on. This approach is not very friendly to an exploratory user who is trying to find what is available. On the usefulness of if h.status():
data = hapi.data(server, dataset, parameters, start, stop)
else:
data = None instead of try:
data = hapi.data(server, dataset, parameters, start, stop)
except SomeError:
data = None |
Hi @rweigel @jbfaden , I also wonder if it makes sense to return dict with stuff like status, instead of Optional[object] with members containing expected data, current API looks like it mixes abstraction levels. Last point, I think that hapi-client should just provide data access and any plot/display features should be moved to another package. |
There is going to be a major refactoring soon and we will revisit this. The issue of the documentation and code completion has bothered me too. I wrote this when I was just learning Python after finishing the MATLAB version. In MATLAB, variable input-output is the defacto convention and it works well. I was also trying to mimic the REST interface. Somehow this does not translate to Python. At the very least, there should be four different functions such as In the second paragraph, I think that you are suggesting that the interface is more like the I agree that it is time to move the plotting functionality into another package. It was there for testing the client during development but has grown too much. |
I'd suggest using existing names for functions, like hapiclient.getCatalog(server), hapiclient.getInfo(server,id),hapiclient.getData(...). |
I was about to propose some kind of POC so we can discuss around some code and see how hapi-client could evolve. |
POC started here |
I think the POC is a good starting point for discussion. I probably won't have time for it until mid-December, but will discuss it when I meet with one of the developers who is working on the parallel requests branch and see what his thoughts are. |
Here is a basic notebook showing how to use the POC API Few random thoughts after writing this code:
That's all for today! :) |
A few quick points: Regarding 2. , the client already has an option for caching data and metadata. There has been some discussion on a specification for the data cache. We are planning to modify the client so it uses this specification. We almost have the parallel request code completed. It turns out to be a bit more complicated than I expected. Regarding 7., there is another group working on a data model for Python and heliophysics data. My plan was to keep the plain NumPy array and then allow users to use the library that they produce to map to a different data model. So my plan was to keep the client data model as simple as possible. |
About the cache I'm not sure to see benefits of a specification other than sharing data across client implementations. Since it should provide good performances and be robust, this might imply totally different implementations depending the client. After a quick look at the wiki page:
Writing such code might get tricky since it exposes many corner cases especially with multiprocessing(concurrent access on files is always a mess). I would really advise to use diskcahe it's insanely fast and thread/process safe, plus it has a really user friendly UX. |
Following up on our discussion during the Heliopython meeting, I thought I would provide my suggestions here. The purpose of these suggestions is also to make it easier to document the input arguments.
First, the client object should be instantiable regardless of whether the server is available therefore
This should return an instance ready to connect to the
url
. It would then be worhwhile to provide a test of server availability such asUsing this API would match that used by the Python socket module. It would return
False
if it does not get a response from the server so that something like the following would be possibleOnce the connection is established, information could be asked for on the dataset available from the server with
Next a user would likely want to connect to a particular dataset and then ask for data. This could be done as following
If a user wanted to get the data from all datasets then they could easily run
It may also be worthwhile to provide the optional argument with instantiating to enable the following
The text was updated successfully, but these errors were encountered: