Human readable names for queries #13
Comments
I also like this idea. For example, for experiment factory experiments we have our experiment "tag" as the unique ID, and it is very intuitive and makes it easy to find what you are looking for. It would be easy to include some kind of version in the name, in case there are multiple versions of the same query. |
I'm not sure it matters since it is just the filename; as soon as you open the file, all the human readable titles and descriptions are there. I kinda like the consistency of using uuids and pushing all the readable information into the file metadata. It seems redundant to have that info duplicated. |
Let's say you are a developer and you want to update your query. You go to the repo and there are 100 of them... |
Exactly, it makes reading code harder. On Mon, Jan 11, 2016 at 10:36 AM, Vanessa Sochat notifications@github.com
|
personally, after there are 100 of them I wouldn't remember the file name
On Mon, Jan 11, 2016 at 10:37 AM, Chris Filo Gorgolewski <
|
What is easier to understand:
or
|
I am in agreement with @chrisfilo. It is a detail that will make development much easier. Because the files need to exist in the same folder, that alone ensures uniqueness in naming. |
well I don't understand either without looking at the query, but I see what Initially, I had a metadata file that indexes all the queries called, On Mon, Jan 11, 2016 at 10:48 AM, Vanessa Sochat notifications@github.com
|
Yes, we should keep the metadata, but using human readable names will make developers' lives easier. |
sure, go for it. can you two decide on a recommended style? for example On Mon, Jan 11, 2016 at 11:17 AM, Chris Filo Gorgolewski <
|
I would say use underscores, because you can't use hyphens in python function names.
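The point about hyphens can be checked with a minimal sketch. Python's built-in `str.isidentifier()` reports whether a string is usable as a name; the hyphenated variant below is made up purely for comparison.

```python
# Candidate query names; the hyphenated form is a made-up variant for comparison.
underscored = "get_peak_coordinates"
hyphenated = "get-peak-coordinates"

# str.isidentifier() is True only for strings that are valid Python names.
underscore_ok = underscored.isidentifier()
hyphen_ok = hyphenated.isidentifier()

print(underscore_ok, hyphen_ok)
```

Only the underscored form passes the check, so it is the only one of the two that could double as a function name.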
|
this sounds just like our terms discussions. readable names simply don't scale. i would really like to see us build tools that query the metadata quickly or provide interactive interfaces for editing queries. perhaps we are not at the point yet, but there is a reason why issues on github, questions on stack overflow and google docs all don't have readable names. (stack overflow uses a slug for readability, but the id is what makes things unique) so instead of punting the interaction between scalability and readability, i think we should put in the effort during the upcoming sprint to have tools that allow us to address this (independent of what the query url looks like). for example, web service/api/command line tool for querying queries. |
+1 On Mon, Jan 11, 2016 at 11:29 AM, Satrajit Ghosh notifications@github.com
|
Issues on github, questions on stack overflow, and google docs are all examples of instances. Indeed that's where numeric identifiers make sense. However, we are talking about queries, which are considered methods. Those should have human readable names, and all of the examples you gave opt for such a solution. For example a path for editing a comment on stackoverflow is:
it is NOT
where |
isn't a query-id simply an instance of a query? if so, all i'm suggesting is that we provide something like: nidm.nidash.org/query/query-id/edit as a web service, or something equivalent for other things. i don't think we are talking of queries as methods here (i can see how it can be seen as such, but i don't think of it that way). anyone can create a query and we will have a collection of queries that an api/web service can call, but they are still instances (they have versions, they will apply to certain versions of the model, they will only work on certain versions of data, etc.). |
The nidm-api by default serves a REST API, and the current format to view a query is:
and this generates:
The issue still comes up about how the developer finds the query_id. Having to do that extra step every time, and having to provide more methods to look up / search with the API, does not make sense when we can just use strings with underscores that a human can remember.
There are two use cases right now for the API. Either someone uses the REST API and must make a call like the above to retrieve the query and do something with it, or the developer uses our python tool to do the query. The second looks like this:
First we retrieve all queries in a dictionary, with the unique id as the lookup key
Then we would need to just know the qid. This adds the extra, annoying step of figuring out the qid every single time.
The result is a pandas data frame. I would even suggest we simplify the above further to be more like what @chrisfilo suggested:
In the eyes of the developer, the query is a method. It is run to retrieve a particular result object. The purpose of the nidm-api, period, is to extend NIDM to developers. This means making it as easy as possible for them to use. Insisting on a long string of letters and numbers only with the justification that it scales better is not logical, and in fact it makes life a lot harder for the exact audience we are intending for this tool. It also makes it harder for the people writing the query objects. If I go to the github repo now to find the "get_peak_coordinates" query - where is it? It's not intuitive. Scalability might be an issue if these things are made en masse in an automated way, but they aren't. We are going to have a limited set because they are made by humans. This means they can give them a name that makes sense. I do not see any benefit in having such cryptic names when the entire purpose is to make this more user friendly. |
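To make the two lookup styles in this comment concrete, here is a minimal sketch that uses plain dictionaries rather than the actual nidm-api. The record fields, the `register` and `get_query` helpers, and the placeholder query text are all hypothetical; the only name taken from the thread is `get_peak_coordinates`.

```python
import uuid

# Hypothetical in-memory store keyed by unique id (not the real nidm-api structure).
queries = {}


def register(name, sparql):
    """Store a query under a freshly generated uuid and return that uid."""
    uid = str(uuid.uuid4())
    queries[uid] = {"uid": uid, "name": name, "sparql": sparql}
    return uid


# "SELECT ..." is placeholder text, not a real query body.
qid = register("get_peak_coordinates", "SELECT ...")

# Secondary index from the human readable name back to the uid.
name_index = {meta["name"]: uid for uid, meta in queries.items()}


def get_query(key):
    """Accept either a uid or a human readable name (hypothetical helper)."""
    if key in queries:
        return queries[key]
    return queries[name_index[key]]


by_uid = get_query(qid)
by_name = get_query("get_peak_coordinates")
print(by_uid is by_name)
```

Under these assumptions the uid stays the unique key, and the readable name is just a second way in, so a developer never has to look up the qid by hand.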
given the nidm-api, not just around nidm-results, the set of possible queries one can make is immense, especially as we allow people to fork/modify queries (by whatever interface - not necessarily a script).
in any scenario where the number of queries exceeds a handful of known ones, a developer will have to look into the metadata of a query to find the query-id or run a query to find a query-id using some matching criteria. i completely agree that if the goal of nidm-api is to only expose a finite set of specific queries, those should simply be methods of the API, but if the goal is to run a generic method as if a developer has to use a query the developer needs to understand the nuances of the query, and no amount of human readable name is going to help the developer. that is why i prefaced my previous post by saying, independent of how the query-id looks we really need to have tools to search through the set of queries and for forking/editing said queries. i'm completely for the api being easy for developers. what i'm speaking against is the notion that naming a few queries to be human readable is the solution to the problem. |
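A tool "to search through the set of queries" could start as small as the sketch below. It assumes each query carries `title` and `description` metadata fields; those field names, the ids, and the sample records are invented for illustration and are not taken from nidm-query.

```python
def search_queries(queries, term):
    """Return the ids of queries whose title or description contains term."""
    term = term.lower()
    return [
        qid
        for qid, meta in queries.items()
        if term in meta.get("title", "").lower()
        or term in meta.get("description", "").lower()
    ]


# Made-up metadata records keyed by made-up ids.
sample_queries = {
    "id-001": {
        "title": "Peak coordinates",
        "description": "Coordinates of each peak in a result",
    },
    "id-002": {
        "title": "Contrast names",
        "description": "Names of all contrasts in a result",
    },
}

peak_hits = search_queries(sample_queries, "peak")
result_hits = search_queries(sample_queries, "result")
print(peak_hits, result_hits)
```

A search like this works the same way whether the ids are uuids or readable names, which is the sense in which it is independent of how the query-id looks.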
Isn't that what variables are for? The queries can have specific variables.
The datatype returned is not integrated into the query; the user selects the datatype to be returned as a variable of the do_query function in the nidm-api. The API will always retrieve the output of the query in some format, and parse it to what the user wants.
I disagree. If I am a developer all I need is to know the data that I want to retrieve from the input file (such as turtle nidm-results) and the arguments that I can give.
I think that is why we have them on github - to implement our own version of forking / editing seems like re-inventing the wheel. I agree a search function added to the nidm-api to search through the query data structures would be neat.
I don't think I am suggesting it is a "solution," but it's making it just a little bit harder for people who just want to query some nidm-object to retrieve the data they need. |
If you write a CONSTRUCT query a graph is returned, ASK returns a
This is true for an API endpoint, but queries feel a bit more malleable ... It's kind of interesting: should a query really be thought of as a
right, github could be the backend but what about a frontend for forking
i guess it's a tradeoff, I would suspect anyone who 'just want to query |
Very interesting discussion!
This relates to one point that is not entirely clear for me right now: how do we handle variants of the same query within nidm-query? For example, the get_peak_coordinates query has already existed in several "flavours", e.g. also returning optional peak fwer, also returning statistic type, also returning contrast name... To be extreme, we could even go all the way to the top of the tree and include the type of HRF that was used... The question is where do we stop and how do we decide which of those variants is the one we want in nidm-query? Or do we want all of them?
@vsoch: this could be part of the solution but I am not clear how specific variables could be defined for a given query. Could you give me more details or, even better, a small example of what you had in mind? |
I was wondering if we could switch to a more user friendly naming scheme for queries. Currently, long uninformative hashes are used, which makes parsing code (that uses for example nidm-api) difficult for humans.
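One possible reading of "specific variables", offered only as a sketch and not necessarily what @vsoch had in mind, is a single query template whose optional parts are switched on by arguments. The `ex:` predicates, the field names, and the `build_peak_query` helper are all hypothetical placeholders, not real NIDM terms or nidm-api functions.

```python
from string import Template

# Hypothetical template; the ex: predicates are placeholders, not real NIDM terms.
PEAK_QUERY = Template(
    "SELECT ?peak ?coordinate $extra_vars WHERE {\n"
    "  ?peak ex:atCoordinate ?coordinate .\n"
    "$extra_patterns"
    "}"
)

# Optional graph patterns, one per "flavour" of the query.
OPTIONAL_FIELDS = {
    "fwer": "  OPTIONAL { ?peak ex:pValueFWER ?fwer . }\n",
    "stat_type": "  OPTIONAL { ?peak ex:statisticType ?stat_type . }\n",
}


def build_peak_query(include=()):
    """Assemble one query text from the base template plus the chosen options."""
    extra_vars = " ".join("?" + name for name in include)
    extra_patterns = "".join(OPTIONAL_FIELDS[name] for name in include)
    return PEAK_QUERY.substitute(
        extra_vars=extra_vars, extra_patterns=extra_patterns
    )


basic_query = build_peak_query()
fwer_query = build_peak_query(["fwer"])
print(fwer_query)
```

Under this reading, the variants listed above would be one stored query plus a set of optional fields, rather than several near-duplicate files.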
We can alternatively wait until singularity ;)