Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dataquery] Define new swagger schema for data query API #8219

Merged
merged 6 commits into from
Dec 19, 2022

Conversation

driusan
Copy link
Collaborator

@driusan driusan commented Nov 7, 2022

This defines the schema I've been using to design a QueryEngine (rather than CouchDB) based data query tool.

The meaning of the 3 types of queries in the /queries endpoints are:

  1. recent: There are no more "saved queries". Instead, all recent queries are "saved" and can be reloaded. Users can star or name a query in order to indicate that they're important queries that shouldn't be purged and duplicate the functionality of the old saved queries. (Currently, no queries are ever purged, but if we do change that in the future, starred ones are the equivalent of saved ones.)
  2. shared: Shared queries, similar to existing shared queries. These queries are public for everyone.
  3. topqueries: Admin queries that were pinned to the top of the DQT. Similar to shared queries, but pinned by an admin who think they're important.

I believe the rest should be explained by the swagger schema descriptions

@driusan driusan added the Area: API PR or issue related to the API label Nov 7, 2022
@driusan
Copy link
Collaborator Author

driusan commented Dec 13, 2022

I'm planning on sending a PR with the php-side implementation of this to make it easier to test, but it's going to be blocked by #8260 and #8216 (or at least one other module implementing the interface..)

This defines the schema I've been using to design a QueryEngine
(rather than CouchDB) based data query tool.
modules/dataquery/static/schema.yml Show resolved Hide resolved
modules/dataquery/static/schema.yml Show resolved Hide resolved
modules/dataquery/static/schema.yml Show resolved Hide resolved
@xlecours
Copy link
Contributor

xlecours commented Dec 14, 2022

Following my comments, I think the recent, shared and top "tags" are concerns of the client and should not orient the response body shape. /queries should return all the queries accessible by the requesting user.

That said, Shared, Stared and potentially Pinned are properties that the backend should include in the responses.

@driusan
Copy link
Collaborator Author

driusan commented Dec 16, 2022

@xlecours I think I've incorporated all the changes you requested but haven't had a chance to update the backend yet.

I didn't change the wording of the endpoint "queries" to say "all" because that might not scale if we get to a point where the recent queries displayed need to be limited.

style: simple
schema:
type: integer
get:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think post would make more sense. This is creating a queryRun so I don't think it is safe (as in rfc2616 safe)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returning 201 Created with location header set to /queries/{QueryID}/run/{QueryRunID} would be good as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do you denote "Location header" in swagger?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually that can't be done right now because that end point is not implemented so it would mean it's impossible to get results. Once it is implemented, the schema could be updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    '201':
      description: The visit was created successfully
      headers:
        Content-Location:
          description: The URL of the new visit.
          style: simple
          explode: false
          schema:
            type: string

@xlecours
Copy link
Contributor

xlecours commented Dec 16, 2022

I didn't change the wording of the endpoint "queries" to say "all" because that might not scale if we get to a point where the recent queries displayed need to be limited.

I suggest you add /queryRun endpoint to get the run out of /queries. If there is too many run then implementing pagination would be an option then.

GET /queries
{
  "Queries": []
}

GET /queryRun
{
  "QueryRun": []
}

This describes what is POSTed and also goes directly into a subfield of
a Query object, not directly into the query.
@driusan
Copy link
Collaborator Author

driusan commented Dec 16, 2022

I don't think a QueryRuns endpoint wouldn't solve anything related to performance. Checking the access is expensive because it needs to go through every field and module of the query's fields and criteria. Sending it over the wire is cheap. A separate endpoint (or pagination) would just add overhead to loading the list.

@xlecours
Copy link
Contributor

xlecours commented Dec 16, 2022

I don't think a QueryRuns endpoint wouldn't solve anything related to performance.

It would remove the burden of loading the runs along the queries when the client request the list of existing queries.

@driusan
Copy link
Collaborator Author

driusan commented Dec 16, 2022

Sending query runs isn't a burden. It's sending a small amount of data directly from the database over the wire. Checking if a query is accessible by a user is slow, however, so, "all queries accessible by the user" is much slower than "queries which there is some reason to send to the user". The performance hit comes from the access check for queries, not the inclusion of a query run list.

@xlecours
Copy link
Contributor

I don't know about the implementation.
Getting the queries and the queryRun must take A + B microseconds. Getting only the queries can't take longer than getting both.

@driusan
Copy link
Collaborator Author

driusan commented Dec 16, 2022

Getting all queries accessible by the user takes n*accesschecktime seconds. There is no reason to include queries other users have run in the queries list just because they happen to have access to everything used in the query. That increases it to n*accesschecktime*numusers (roughly) seconds. The user only needs a subset of queries. The ones they've run, the ones that have been shared because they're interesting, the ones that an admin pinned, etc, not all queries on the server. Including all accessible queries makes the response unuseably slow. Including the times that they've been run has negligible impact on the performance.

@xlecours
Copy link
Contributor

Ok I get this. But I don't see how including queryRun in the results of /queries would solve any of that.

@driusan
Copy link
Collaborator Author

driusan commented Dec 16, 2022

It doesn't. Leaving the description "Get a list of a recent, shared, and study (top) queries to display." instead of "All queries accessible to the user" solves that.

Thinking about it a little more, I see how pagination could be useful for run queries. I'll separate them out on Monday.

@driusan
Copy link
Collaborator Author

driusan commented Dec 19, 2022

@xlecours the queries and runs are split into two different endpoints now.

Copy link
Contributor

@xlecours xlecours left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good!
The only thing I wasn'T sure about is QueryRun having a link to the Query. At this point, it will be a suggestion, not a change request.
I have eager to see the implementation.

@driusan
Copy link
Collaborator Author

driusan commented Dec 19, 2022

Added QueryURI to the QueryRun object.

@driusan
Copy link
Collaborator Author

driusan commented Dec 19, 2022

@xlecours the implementation is already in #8268, just updated it to the latest changes..

@driusan driusan closed this Dec 19, 2022
@driusan driusan reopened this Dec 19, 2022
@driusan
Copy link
Collaborator Author

driusan commented Dec 19, 2022

@maltheism you reviewed an earlier draft. Are you okay with this version of the schema?

@driusan driusan merged commit 019db73 into aces:main Dec 19, 2022
@ridz1208 ridz1208 added this to the 25.0.0 milestone Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: API PR or issue related to the API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants