API Multiomics EHR Risk KP Update #113

Closed
GitHubbit opened this issue Apr 24, 2023 · 7 comments

GitHubbit commented Apr 24, 2023

  • Hadlock lab's repo (https://github.com/Hadlock-Lab/clinical_risk_kp/pulls)
  • API URL: https://biothings.ncats.io/multiomics_ehr_risk_kp
  • Github URL: https://github.com/Hadlock-Lab/clinical_risk_kp
  • Git Branch/Commit: Master 223f236
  • No. Documents: same as previous (46,435)
  • Updating parser to comply with the TRAPI 1.4 update (https://github.com/uhbrar/ReasonerAPI/blob/update_guide/MigrationAndImplementationGuide1-4.md)
  • Structure of Documents (one record):
    {
      "_id": "UNII:25ADE2236L_HP:0000360_08401321539277617_08401321539277617",
      "subject": {
        "UNII": "25ADE2236L",
        "id": "UNII:25ADE2236L",
        "name": "thrombin",
        "type": "biolink:ChemicalSubstance"
      },
      "association": {
        "predicate": "associated_with_increased_likelihood_of",
        "edge_attributes": [
          {
            "attribute_type_id": "biolink:has_supporting_study_result",
            "value": "We train a large collection of multivariable, binary logistic regression models on EHR data for each specific condition/disease/outcome. Features include labs, medications, and phenotypes. Directed edges point from risk factors to specific outcomes (diseases, phenotype, or medication exposure).",
            "attributes": [
              {
                "attribute_type_id": "biolink:supporting_study_method_type",
                "value": "STATO:0000149",
                "description": "Binomial logistic regression for analysis of dichotomous dependent variable (in this case, for having this particular condition/disease/outcome or not)"
              },
              {
                "attribute_type_id": "biolink:update_date",
                "value": "2022-05-18"
              },
              {
                "attribute_type_id": "biolink:p_value",
                "value": 0.9367666401584368,
                "description": "p-value for the feature's coefficient"
              },
              {
                "attribute_type_id": "STATO:0000209",
                "value": 0.8401321539277617,
                "description": "AUC-ROC of the logistic regression model"
              },
              {
                "attribute_type_id": "STATO:0000565",
                "value": 4.558176672832635,
                "description": "log_odds_ratio"
              },
              {
                "attribute_type_id": "biolink:supporting_study_cohort",
                "value": "age < 18 excluded"
              },
              {
                "attribute_type_id": "biolink:supporting_study_date_range",
                "value": "2020-2022 (future prediction)"
              },
              {
                "attribute_type_id": "biolink:supporting_study_size",
                "value": "10100000",
                "description": "total_sample_size"
              }
            ]
          },
          {
            "attribute_type_id": "biolink:primary_knowledge_source",
            "value": "infores:biothings-multiomics-ehr-risk",
            "value_type_id": "biolink:InformationResource",
            "value_url": "http://smart-api.info/registry?q=d86a24f6027ffe778f84ba10a7a1861a",
            "description": "The EHR Risk KP is created and maintained by the Multiomics Provider team from the Institute for Systems Biology in Seattle, WA. Through a partnership with Providence/Swedish Health Services and Institute for Systems Biology, we analyze over 26 million EHRs. We use these records to train a large collection of interpretable machine learning models which are integrated into a single large Knowledge Graph, with directed edges pointing from risk factors to specific outcomes (diseases, phenotype, or medication exposure)."
          },
          {
            "attribute_type_id": "biolink:supporting_data_source",
            "value": "infores:providence-st-joseph-ehr",
            "value_type_id": "biolink:InformationResource",
            "value_url": "https://github.com/NCATSTranslator/Translator-All/wiki/EHR-Risk-KP",
            "description": "A partnership with Providence/Swedish Health Services and Institute for Systems Biology allows analysis of 26 million EHRs from patients in seven states in the US, including Alaska, California, Montana, Oregon, Washington, Texas, and New Mexico. Please email data-access@isbscience.org for more information."
          }
        ]
      },
      "object": {
        "HP": "0000360",
        "id": "HP:0000360",
        "name": "Tinnitus",
        "type": "biolink:PhenotypicFeature"
      },
      "source": {
        "edge_sources": [
          {
            "resource_id": "infores:biothings-multiomics-ehr-risk",
            "resource_role": "primary_knowledge_source"
          },
          {
            "resource_id": "infores:providence-st-joseph-ehr",
            "resource_role": "supporting_data_source"
          }
        ]
      }
    }
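
For anyone consuming records of this shape, here is a minimal Python sketch of pulling the nested statistics out of `association.edge_attributes`. The `record` below is a trimmed copy of the one above (long text fields dropped), and `flatten_attributes` is a hypothetical helper name, not part of the parser in the repo. It assumes each `attribute_type_id` appears at most once per record, since later duplicates would overwrite earlier ones in the resulting dict.

```python
# Trimmed copy of the record above; long description fields are omitted.
record = {
    "_id": "UNII:25ADE2236L_HP:0000360_08401321539277617_08401321539277617",
    "subject": {"id": "UNII:25ADE2236L", "name": "thrombin",
                "type": "biolink:ChemicalSubstance"},
    "association": {
        "predicate": "associated_with_increased_likelihood_of",
        "edge_attributes": [
            {
                "attribute_type_id": "biolink:has_supporting_study_result",
                "attributes": [
                    {"attribute_type_id": "biolink:p_value",
                     "value": 0.9367666401584368},
                    {"attribute_type_id": "STATO:0000209",
                     "value": 0.8401321539277617},
                    {"attribute_type_id": "STATO:0000565",
                     "value": 4.558176672832635},
                ],
            },
            {
                "attribute_type_id": "biolink:primary_knowledge_source",
                "value": "infores:biothings-multiomics-ehr-risk",
            },
        ],
    },
    "object": {"id": "HP:0000360", "name": "Tinnitus",
               "type": "biolink:PhenotypicFeature"},
}


def flatten_attributes(attributes):
    """Yield (attribute_type_id, value) pairs, descending into nested 'attributes'."""
    for attr in attributes:
        if "value" in attr:
            yield attr["attribute_type_id"], attr["value"]
        # Sub-attributes live under the optional 'attributes' key.
        yield from flatten_attributes(attr.get("attributes", []))


flat = dict(flatten_attributes(record["association"]["edge_attributes"]))
print(flat["biolink:p_value"])
```

The recursion matters because the p-value, AUC-ROC, and log odds ratio sit one level down, inside the `biolink:has_supporting_study_result` attribute, rather than directly in `edge_attributes`.
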
erikyao self-assigned this Apr 26, 2023
GitHubbit reopened this May 1, 2023

@GitHubbit
Author

Hello!
Sorry for the confusion. We are in the process of updating our TSVs for TCDC's MVP2, so I would like to request that we temporarily pause the API update until that is resolved.

@erikyao
Contributor

erikyao commented May 1, 2023

@GitHubbit thank you for the information!

@GitHubbit
Author

@erikyao Hello Yao! Apologies for the confusion. Earlier, I requested that we pause this update because I thought we were going to provide new TSVs. We have since decided that we need to do a significant amount of work before we can generate new TSVs. Given the urgency of the TRAPI 1.4 migration requirements, may I ask that you deploy this as before? The TSVs are therefore the same ones we used previously, and the only major change is the addition of the source provenance. Please let me know!

@erikyao
Contributor

erikyao commented May 15, 2023

@GitHubbit @colleenXu API updated!

@colleenXu

@GitHubbit

Heads-up:

(1) BTE probably isn't hooked up to this API properly due to the changes. To get BTE hooked up properly, the currently registered SmartAPI yaml needs updates. I can meet with you if needed to go through some examples and get you started.

The updates that I can think of off the top of my head are...

  • changing how edge-attributes are treated (in the parameter.fields and x-bte-response-mapping) so BTE will ingest them properly
  • changing query-annotation that uses ChemicalSubstance (in the requestBody section) to use ChemicalEntity instead. Related: "Changed biolink category that has been deprecated" (Hadlock-Lab/clinical_risk_kp#31)
  • comparing the data to the list of operations: are all the subject-type/subject-id/predicate/qualifiers/object-type/object-id combos covered? Are there any operations that can be removed because little or no data matches them?
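
The last two bullets can be sketched in Python as a coverage check. This is an illustration under assumptions, not an existing BTE or BioThings tool: `record_combo` and `coverage_report` are hypothetical names, the sample `records` and `operations` are made up, and the check looks only at subject type, predicate, and object type (ID prefixes and qualifiers are left out). Deprecated categories are mapped to their replacements first, so a `ChemicalSubstance` record matches a `ChemicalEntity` operation.

```python
from collections import Counter

# Deprecated Biolink category -> replacement, per the bullet above.
DEPRECATED_CATEGORIES = {"biolink:ChemicalSubstance": "biolink:ChemicalEntity"}


def record_combo(record):
    """Return the (subject type, predicate, object type) triple for one record."""
    subject_type = record["subject"]["type"]
    object_type = record["object"]["type"]
    return (
        DEPRECATED_CATEGORIES.get(subject_type, subject_type),
        record["association"]["predicate"],
        DEPRECATED_CATEGORIES.get(object_type, object_type),
    )


def coverage_report(records, operations):
    """Compare combos found in the data with combos declared as operations.

    Returns (uncovered, unused): combos present in the data that no operation
    declares (with record counts), and declared operations matching no record.
    """
    seen = Counter(record_combo(r) for r in records)
    declared = set(operations)
    uncovered = {combo: n for combo, n in seen.items() if combo not in declared}
    unused = declared - set(seen)
    return uncovered, unused


# Hypothetical inputs, for illustration only.
PREDICATE = "associated_with_increased_likelihood_of"
records = [
    {"subject": {"type": "biolink:ChemicalSubstance"},
     "association": {"predicate": PREDICATE},
     "object": {"type": "biolink:PhenotypicFeature"}},
    {"subject": {"type": "biolink:Disease"},
     "association": {"predicate": PREDICATE},
     "object": {"type": "biolink:Disease"}},
]
operations = [
    ("biolink:ChemicalEntity", PREDICATE, "biolink:PhenotypicFeature"),
    ("biolink:Gene", PREDICATE, "biolink:Disease"),
]

uncovered, unused = coverage_report(records, operations)
print("in data but not declared:", uncovered)
print("declared but no data:", unused)
```

Entries in `uncovered` are data that BTE could not reach through any operation; entries in `unused` are candidates for removal from the yaml.
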

(2) Once (1) is done, you'll probably also want the TRAPI 1.4 source data stuff working correctly. To support this, similar to the ClinicalTrials KP API, a separate yaml will need to be created (in a branch or fork) with the trapi_sources x-bte-response-mapping...
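
Before wiring up the source mapping, it may help to sanity-check the provenance block in the records themselves. The Python sketch below assumes the `source.edge_sources` layout shown in the record at the top of this issue, and assumes TRAPI 1.4 expects exactly one `primary_knowledge_source` per edge with `infores:` identifiers. `check_edge_sources` is a hypothetical helper, not part of BTE or the KP parser.

```python
def check_edge_sources(record):
    """Return a list of problems found in record['source']['edge_sources']."""
    sources = record.get("source", {}).get("edge_sources", [])
    problems = []

    # Assumption: exactly one source carries the primary_knowledge_source role.
    primaries = [s for s in sources
                 if s.get("resource_role") == "primary_knowledge_source"]
    if len(primaries) != 1:
        problems.append(
            f"expected 1 primary_knowledge_source, found {len(primaries)}")

    # Assumption: every resource_id is an infores CURIE.
    for s in sources:
        resource_id = str(s.get("resource_id", ""))
        if not resource_id.startswith("infores:"):
            problems.append(f"resource_id is not an infores CURIE: {resource_id!r}")
    return problems


# The 'source' block from the record in the issue description.
record = {
    "source": {
        "edge_sources": [
            {"resource_id": "infores:biothings-multiomics-ehr-risk",
             "resource_role": "primary_knowledge_source"},
            {"resource_id": "infores:providence-st-joseph-ehr",
             "resource_role": "supporting_data_source"},
        ]
    }
}

print(check_edge_sources(record))
```

An empty list means the record passed both checks; a record with no `source` block is reported as having zero primary knowledge sources.
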

@erikyao
Contributor

erikyao commented May 19, 2023

@GitHubbit @colleenXu API updated again

@colleenXu

It looks like the basics of this issue were addressed:

  • the API was updated in May 2023
  • the x-bte annotation was updated in June 2023 to work with this updated API
  • However, there may still be data in the API that isn't fully covered by the x-bte operations. We never completed either a script to generate this annotation automatically from the data going into this API, or a review of the data to ensure the x-bte operations can access all of it.
