SuppKG API YAML #122
I know this post is long and kinda intimidating >.<. I think you've done a good job overall (great attention to detail!). I'll summarize the feedback as:
Addressing the issues you raised
cURL from Postman
Minor yaml suggestions
Feedback on the current operations
But regarding the operations
EDIT: @andrewsu and I have decided that this is a good next step. With this resource, I think we'll need to write more specific operations, based on the set of unique combos of
Then, depending on how many unique combos there are, we could then decide whether we want to map to biolink-model / write operations manually or through code (like what we do with semmeddb). Here's an example of what I think the format for operations would be (I've worked through it and tested it): the x-bte operations and response-mapping section
Example response from testing: suppkg.txt notes:
|
well...now I'm done editing my comment >.<. Hopefully this makes it easier to digest |
@colleenXu Thank you for the feedback! I've updated the yaml with your suggestions, and replaced the operations section with what you wrote.
Regarding the above, I can get counts for the predicates and how many subjects/objects have multiple semtypes. |
@mnarayan1 (CC @andrewsu ) I'd like to check in: how is the analysis of the data's predicates/semtypes going? or being able to test YAMLs locally? |
@colleenXu Sorry for the late response, I was out of town. I fixed the issue with my local installation of BTE, and I am able to test the yaml now. Here is the analysis I've gotten on the data.
Is there any other information I should get? |
Based on your info, it sounds like:
I think it would be helpful to have more specific info:
A) Do you know what exact
B) Is it possible to generate a table containing counts of how many records there are for each unique combo of
What would be most helpful are exact matches: so
C) I see a relation.conf field in the records. Do we have a sense of the distribution of this value? A range would be helpful, or something like this
My brainstorming
This KP is very similar to semmeddb...which is problematic because semmeddb has thousands of operations and requires a TON of special processing (pmid count, semtype/domain-predicate/range-predicate exclusions, novelty, etc.). My tentative ideas are:
|
Err...and the table from B) may be way too large for a github comment. A csv / tsv file may be the best way to share this table (along with a jupyter notebook or google colab notebook of the data analysis you're doing and how you're generating the table). |
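For the table from B), a minimal sketch of the counting step might look like the following. It assumes each record exposes subject.semtype, predicate, and object.semtype (these field names are guesses, not taken from the SuppKG schema), and that a semtype may be either a single string or a list. The sample records and semtype codes are made up for illustration.

```python
import csv
import io
from collections import Counter
from itertools import product


def as_list(value):
    """Wrap a single semtype in a list so both shapes are handled alike."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def count_metatriples(records):
    """Count records per (subject semtype, predicate, object semtype) combo.

    A record with several semtypes is counted once per combination.
    Field names are assumptions, not the confirmed SuppKG layout.
    """
    counts = Counter()
    for rec in records:
        subj_types = as_list(rec["subject"]["semtype"])
        obj_types = as_list(rec["object"]["semtype"])
        for subj, obj in product(subj_types, obj_types):
            counts[(subj, rec["predicate"], obj)] += 1
    return counts


def write_counts_tsv(counts, handle):
    """Write the combos to a TSV, most frequent first."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["subject_semtype", "predicate", "object_semtype", "count"])
    for (subj, pred, obj), n in counts.most_common():
        writer.writerow([subj, pred, obj, n])


# Made-up sample records, only to show the expected input shape.
sample = [
    {"subject": {"semtype": "vita"}, "predicate": "TREATS", "object": {"semtype": "dsyn"}},
    {"subject": {"semtype": "vita"}, "predicate": "TREATS", "object": {"semtype": "dsyn"}},
    {"subject": {"semtype": ["vita", "phsu"]}, "predicate": "AFFECTS", "object": {"semtype": "dsyn"}},
]

counts = count_metatriples(sample)
buf = io.StringIO()
write_counts_tsv(counts, buf)
tsv_text = buf.getvalue()
print(tsv_text)
```

Swapping the StringIO buffer for an open file handle would produce the csv / tsv file to attach.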
Here is the notebook where I've done my work. It has a list of semtypes that could correspond to supplements, the distribution of relation.conf values, and code used to generate the table of meta-triples.
A) There doesn't seem to be anywhere in SuppKG that explicitly states whether or not something is a dietary supplement. However, I looked through this list (containing all 133 UMLS semantic types) and compiled a list of semtypes that could possibly correspond to a supplement (excluding objects, body parts, diseases, etc.).
B) Here is the csv file with unique triples and their counts.
C) The distribution of |
So while there are many metatriples in suppkg, we are really only interested in the ones that directly relate to supplements. So if you took your list of possible semantic types associated with supplements from your notebook, can you redo the analysis showing the counts of each metatriple in this csv? |
Here are the counts of metatriples with only supplements. |
Hmm, that still results in a huge list of metatriples. So let's change gears a little bit. Rather than trying to come up with exclusion filters to remove what we don't want, let's instead focus on defining a small set of inclusion filters for triples that we do want. For this resource, the most unique thing we get are for
I would take the union of all the subject types, and see if you can create a smartAPI operation (or a set of operations) to retrieve those triples specifically. Does that make sense? |
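The inclusion-filter idea above could be sketched roughly as follows: pick the predicate and object types we care about, then take the union of subject types over the matching metatriples. The row layout (subject semtype, predicate, object semtype, count) and all the values shown are placeholders, not results from the actual csv.

```python
def subject_types_for(rows, predicate, object_types):
    """Union of subject semtypes over metatriples matching one inclusion filter.

    rows: iterable of (subject_semtype, predicate, object_semtype, count).
    """
    return {
        subj
        for subj, pred, obj, _count in rows
        if pred == predicate and obj in object_types
    }


# Placeholder metatriple rows; real ones would come from the counts csv.
rows = [
    ("vita", "TREATS", "dsyn", 120),
    ("phsu", "TREATS", "dsyn", 45),
    ("vita", "AFFECTS", "dsyn", 30),
    ("phsu", "TREATS", "bpoc", 2),
]

# Hypothetical inclusion filter: anything that TREATS a disease-like object.
treats_subjects = subject_types_for(rows, "TREATS", {"dsyn"})
print(sorted(treats_subjects))
```

The resulting set is what an operation (or a small set of operations) would need to cover on the subject side.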
@andrewsu @colleenXu I've finished writing the operations to retrieve the above triples. I've tested them out on my local BTE instance, and the queries for each triple type seem to work (I included the testExamples in the yaml). Is there anything else I should add? |
Suggested major edits:
I think it'll be simpler and more elegant to have 2 operations
One for
The other for
adjust response-mapping
The final response-mapping may look something like this:
change the parameter.fields to match the response-mapping
For the two operations, the parameter.fields can be changed since we'll only need the fields that are referenced in the response-mapping. So something like this could work for the supplement-treats-disease operation (object.umls contains the output):
Minor edits
|
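One way to keep parameter.fields in sync with the response-mapping, as suggested above, is to derive the fields value from the mapped paths. This is only a sketch: the mapping below is hypothetical apart from object.umls, which the comment names as the output field, and it assumes the API accepts a comma-separated list of dotted paths.

```python
def fields_for(response_mapping):
    """Build a comma-separated fields value from the mapped paths.

    Keeps first-seen order and drops duplicates, so the query only asks
    for fields the response-mapping actually references.
    """
    paths = []
    for path in response_mapping.values():
        if path not in paths:
            paths.append(path)
    return ",".join(paths)


# Hypothetical mapping; only object.umls is taken from the discussion.
mapping = {
    "umls": "object.umls",
    "name": "object.name",
    "pmid": "predication.pmid",
}

fields_value = fields_for(mapping)
print(fields_value)
```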
This API seems to still have "fake" This was previously brought up starting here and the comments below it all seem relevant. |
@colleenXu Let's go ahead and allow these "fake UMLS IDs" to be returned. Presumably, NodeNormalizer will fail to resolve these, and BTE will use the original names from SuppKG as the human-readable names for presentation in the ARAX UI and Translator UI. At least that's how I think it will work -- let's see how it works in practice... @mnarayan1 let us know when you have the updates done from @colleenXu's suggestions above... |
@andrewsu @colleenXu I've finished with the edits, and the testing is still working for me. |
I'm going to merge this PR, since the yaml looks ready. Good job @mnarayan1! We'll continue discussion and next steps in biothings/biothings_explorer#706 |
YAML for the SuppKG API. The API is located here.
Notes:
I used NamedThing for the semantic field of the x-bte-operations section. Is there something more specific I should use instead?
I've been trying to test my yaml file with this query:
Here is my smartapi_overrides.json file:
However, I'm getting this error:
{"error":"Your input query graph is invalid","more_info":"Your Input Query Graph is invalid."}
Are there any issues with my annotations? Should I format my query differently?
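Since the error points at the query graph rather than the annotations, one thing worth checking is the overall shape of the query. Below is a rough structural check, assuming the TRAPI layout where message.query_graph holds nodes and edges as objects keyed by id. It is not BTE's actual validation, and the example query is a placeholder (including the UMLS ID), not the query from this PR.

```python
def query_graph_problems(query):
    """Return a list of structural problems found in a TRAPI-style query."""
    message = query.get("message")
    graph = message.get("query_graph") if isinstance(message, dict) else None
    if not isinstance(graph, dict):
        return ["missing message.query_graph"]

    problems = []
    nodes = graph.get("nodes")
    edges = graph.get("edges")
    if not isinstance(nodes, dict) or not nodes:
        problems.append("nodes must be a non-empty object keyed by node id")
        nodes = {}
    if not isinstance(edges, dict):
        problems.append("edges must be an object keyed by edge id")
        edges = {}

    # Every edge should point at node ids that exist in the graph.
    for edge_id, edge in edges.items():
        for end in ("subject", "object"):
            if edge.get(end) not in nodes:
                problems.append(f"edge {edge_id}: {end} does not reference a node")
    return problems


# Placeholder query; ids, categories, and predicate are illustrative only.
good_query = {
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {"ids": ["UMLS:C0000000"], "categories": ["biolink:SmallMolecule"]},
                "n1": {"categories": ["biolink:Disease"]},
            },
            "edges": {
                "e0": {"subject": "n0", "object": "n1", "predicates": ["biolink:treats"]},
            },
        }
    }
}

print(query_graph_problems(good_query))
```

An empty list only means the outline looks right; BTE may still reject a query for other reasons.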