-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a way to export the query execution plan #278
Create a way to export the query execution plan #278
Comments
assigning this to @tokebe and @marcodarko. Can start with @tokebe to look into what's needed at |
@newgene @marcodarko On the |
My understanding is that the algorithm actually works (decides the order of execution) using the number of IDs in each step (as nodes get populated after one-hops)..... It would change the algorithm to do it based off of the number of operations possible between hops (w/o considering the number of IDs?). |
definitely related to this biothings/bte_trapi_query_graph_handler#72 |
Relevant code:
|
@tokebe was right, _edgeSupported already sort of does a dry run by running through the query edges and getting the smartapi edges. Using this, I logged info about each edge as well as the individual queries made. @newgene @colleenXu @tokebe Is this the format we are looking for/is there any other info that should be getting logged?
|
@ericz1803 Just to clarify, the logs will list all APIs queried for a query edge, and then for each of those APIs list the specific sub-query edge? This sounds pretty good to me. @newgene and @colleenXu may have other specific intentions/requirements. I definitely think using the existing TRAPI log format is the best way to do this as it keeps our response nice and TRAPI-compliant. I'd suggest changing the log levels to |
In the example above, it looks like all 181 sub-queries will have a log describing the semantics of the operation it's doing....which feels like too much. Some thoughts:
|
Just to respond to what I can here:
|
Addressing a few things:
@colleenXu Yup, just to clarify, it doesn't actually query it, it just gets the APIs for each edge.
I only included the log for 1 of them, but for each edge, it logs every sub-query it will make as well
This message is still logged (I just didn't include it in the snippet)
Will do for sure
@tokebe Do you know if the logging system right now has levels of verbosity? I could put the execution summary at the
|
@ericz1803 I'm a bit confused. Comparing my comment to this #278 (comment), my perspective is:
|
Regarding edge execution order and reversed status: As an example, |
Just made a pull request based on these comments. Some things that might be worth discussing still:
|
@ericz1803 Regarding your first point, log levels in BTE match the TRAPI standard defined here. The log level is selected by the first argument when creating a Regarding your second point, the most important place to add changes is |
Generally, yes, smartapi metadata is the place to document end-user facing query endpoints and parameters. It's typically a manual edit, which is fine for us, since the changes to its metadata are not too often. When the feature is stable and deployed, we can leave a comment here or create a new issue to update the metadata. @colleenXu handles many smartapi metadata updates, so she may chime in on the optimal process for our team as well. |
My understanding is that we currently have to manually edit the smartapi.yaml file in BTE main repo (used to validate what we send to the endpoints) and in the registry repo (used by the Smartapi registry, less important, just for documentation purposes...). We've previously talked about only keeping 1 file and removing the other one... |
Also, I know @ericz1803 and @tokebe discussed changing the endpoint to "Explain", given the opening post here by @newgene .... But could we instead use something different like "edgeplan" or "dryrun"? Since I feel like "explain" is used often to mean "explain-type" querying (IDs on QNodes, attached together by QGraph). Related to #379 |
@tokebe I made the changes for "dryrun" -> "explain" and changing the logging level to "DEBUG". I second @colleenXu's point that "explain" could lead to some confusion. For the |
@ericz1803 Apologies for the redundant work, given @colleenXu's comment As for |
All good, I made the change back to |
noting (this was originally sent in Slack): I've reviewed these PRs for dryrun....and I think the dryrun parameter isn't working correctly with the asyncquery endpoints - it looks to me that the query is executing completely rather than stopping early (only logging the query plan). It looks like it's working as-expected for the /query endpoints. But... I suggest: check with Chunlei on whether the current behavior is okay (only works on /query endpoints, only gives a "possible" query execution plan rather than the actual one) since he's the one who originally opened this issue |
from discussion today: the dryrun plan will accurately says the first edge to be executed, and it assumes that results/output IDs are found for each edge. This means that the order of the edges isn't necessarily correct after the first edge listed:
|
Similar to
explain
option for many database server, can we create an optionalexplain
parameter, which will only return the planned API request execution plan, without retrieving the API response.Before we actually implement this, we need to check if the current logs output can already serve the purpose (but I personally find very hard to navigate the entire log msgs).
It may be a bit complicated with the new iterative query execution algorithm, but the execution plan can be made without going through that process, just output the possible API requests by each edges (without specifying the actual execution order).
The text was updated successfully, but these errors were encountered: