Load HMDB data for protein-associated metabolites #110
Comments
Let's create this as a standalone "pending" API for now. Also, let's create this as an "association"-style API, where each document describes a triple (subject/object/predicate). This aligns with how we structured the semmeddb API described in this comment biothings/pending.api#30 (comment). |
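A minimal sketch of what one "association"-style document could look like, assuming one document per protein/metabolite pair. The field names (`subject`, `predicate`, `object`), the `_id` scheme, the `associated_with` predicate, and the helper `make_association` are all hypothetical and are not taken from the semmeddb API or from the final parser. The metabolite ID is borrowed from a link elsewhere in this thread, and its pairing with this protein is illustrative only.

```python
def make_association(protein_uniprot, protein_hmdb, metabolite_hmdb,
                     predicate="associated_with"):
    """Build one triple-style document (hypothetical layout)."""
    # Hypothetical _id scheme: subject, predicate, and object joined together
    doc_id = f"{protein_hmdb}-{predicate}-{metabolite_hmdb}"
    return {
        "_id": doc_id,
        # subject: the protein side of the association
        "subject": {"uniprot": protein_uniprot, "hmdb_protein": protein_hmdb},
        # predicate: placeholder relation name, an assumption
        "predicate": predicate,
        # object: the metabolite side of the association
        "object": {"hmdb": metabolite_hmdb},
    }


# Illustrative pairing only; not a claim that HMDB links these two records
doc = make_association("Q93099", "HMDBP00842", "HMDB0015122")
print(doc)
```

Keeping one triple per document means each association can be queried from either the subject or the object side without unpacking a nested list.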
Working example of the association structure for metabolite HMDB data below...
For an example of the data file and one protein, see here. This is what I am currently extracting from to get the structure below. The protein data is in a nested
Currently I am adding the
Below is a clean working version of the association structure of the
I hope this is clear. @colleenXu and @newgene, could you review the structure below and let me know whether any details should be modified or added? Also, @andrewsu mentioned that the HMDB ID for metabolites might not be used within Translator and that, since HMDB has already mapped its metabolites to other database identifiers (e.g., https://hmdb.ca/metabolites/HMDB0015122#links), I should include those in the object dict. These links are not in the proteins file, so I am looking for a file to extract them from.
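A rough sketch of folding external identifier mappings into the object dict, assuming the mappings have already been extracted from some other file. The helper `attach_xrefs`, the `xrefs` key, and the source names (`chebi`, `kegg`, `pubchem_compound`) are hypothetical, and the identifier values are placeholders rather than real mappings.

```python
def attach_xrefs(obj, xrefs):
    """Return a copy of the object dict with non-empty xrefs attached."""
    merged = dict(obj)  # copy so the input dict is not modified
    # Drop sources that have no identifier (empty string or None)
    cleaned = {source: identifier
               for source, identifier in xrefs.items() if identifier}
    if cleaned:
        merged["xrefs"] = cleaned
    return merged


metabolite = {"hmdb": "HMDB0015122"}
# Placeholder values only; real mappings would come from an HMDB data file
example_xrefs = {"chebi": "CHEBI:PLACEHOLDER", "kegg": "", "pubchem_compound": None}
enriched = attach_xrefs(metabolite, example_xrefs)
print(enriched)
```

Skipping empty values keeps the documents free of null fields for metabolites that HMDB has not mapped to a given database.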
|
@andrewsu @NikkiBytes A few questions after looking over the XML example vs the website:
considerations
Notes on modeling for translator / biolink:
|
A few followups to @colleenXu's reply, hitting the bullet points in order:
@NikkiBytes for now, go ahead and move forward after making the changes described in the first two bullet points above. |
Example of the newly edited structure, was able to pull the
|
Here is an example output of a single record generated with the parser....
I think it addresses all the details mentioned. Note: some records differ only in
When running my parser on BioThings Hub, the dumper is successful, but the uploader is running into this problem:
Links to reference files: repo, parser file, manifest file
@colleenXu have you seen this error before, or is there something obviously wrong with the files? I have been able to solve all errors up to this point and have a few ideas of what this could be, but any feedback is appreciated, thank you! Once this is solved, it's ready for the next steps. |
Can you paste the logs and stack trace here? |
@NikkiBytes please follow up with @zcqian . I am not involved in the process of actually uploading / creating APIs... |
Thank you @zcqian , the logs ....
|
A few notes/updates on the parser.....
Document Structure Example
|
Looking at the thread above, looks like this data plugin is ready for deployment as a pending API... Assigning to @erikyao to evaluate... |
API published, https://biothings.ncats.io/hmdb |
Related infores stuff is ready:
|
Going to close this issue and open another one for the SmartAPI yaml w/ x-bte annotation writing |
"The Human Metabolome Database (HMDB) is a freely available electronic database containing detailed information about small molecule metabolites found in the human body" (from https://hmdb.ca/). HMDB contains links between proteins and the metabolites they are associated with. For example, the HMDB record for Homogentisate 1,2-dioxygenase (UniProtKB:Q93099) is HMDBP00842, and https://hmdb.ca/proteins/HMDBP00842/metabolite_protein_links shows the metabolites associated with this protein. These relationships can also be downloaded from the HMDB downloads page, and specifically the "All proteins" file.
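The protein-to-metabolite links described above could be pulled out of an XML export along these lines. This is only a sketch: the tag names (`protein`, `accession`, `uniprot_id`, `metabolite_associations`, `metabolite`) are assumptions about the layout of the "All proteins" file, the inline sample is hand-written, and the metabolite IDs are placeholders. The real file may also declare an XML namespace, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; tag names and metabolite IDs are assumptions/placeholders
SAMPLE_XML = """
<hmdb>
  <protein>
    <accession>HMDBP00842</accession>
    <uniprot_id>Q93099</uniprot_id>
    <metabolite_associations>
      <metabolite>
        <accession>HMDB_PLACEHOLDER_1</accession>
        <name>placeholder metabolite A</name>
      </metabolite>
      <metabolite>
        <accession>HMDB_PLACEHOLDER_2</accession>
        <name>placeholder metabolite B</name>
      </metabolite>
    </metabolite_associations>
  </protein>
</hmdb>
"""


def iter_protein_metabolite_pairs(xml_text):
    """Yield one dict per protein/metabolite link found in the XML text."""
    root = ET.fromstring(xml_text.strip())
    for protein in root.iter("protein"):
        protein_id = protein.findtext("accession")
        uniprot_id = protein.findtext("uniprot_id")
        # Each nested <metabolite> element becomes its own pair
        for metabolite in protein.iterfind("metabolite_associations/metabolite"):
            yield {
                "protein_hmdb": protein_id,
                "protein_uniprot": uniprot_id,
                "metabolite_hmdb": metabolite.findtext("accession"),
                "metabolite_name": metabolite.findtext("name"),
            }


pairs = list(iter_protein_metabolite_pairs(SAMPLE_XML))
for pair in pairs:
    print(pair)
```

For the full download, a streaming parser such as `xml.etree.ElementTree.iterparse` would likely be a better fit than loading the whole file into memory at once.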
This issue tracks the loading of these protein-associated metabolites to mygene.info.
Related to NCATSTranslator/testing#49