Create API for SuppKG (Dietary Supplements) #55

andrewsu · 2022-02-10T18:13:23Z

SuppKG contains a variety of edges for Dietary Supplements.

Publication: https://pubmed.ncbi.nlm.nih.gov/35709900/
Preprint: https://arxiv.org/abs/2106.12741
Download link: https://github.com/zhang-informatics/SemRep_DS/tree/main/SuppKG

There are 595222 entries in the links section of the JSON. Here is one example record:

        {
            "relations": [
                {
                    "pmid": 1394115,
                    "sentence": "Turmeric and curcumin were also found to reverse the aflatoxin induced liver damage produced by feeding aflatoxin B1 (AFB1) (5 micrograms/day per 14 days) to ducklings.",
                    "conf": 0.9303833842,
                    "tuid": 0
                },
                {
                    "pmid": 1394115,
                    "sentence": "Reversal of aflatoxin induced liver damage by turmeric and curcumin.",
                    "conf": 0.9396179318000001,
                    "tuid": 0
                }
            ],
            "source": "C0001734",
            "target": "C0151763",
            "key": "CAUSES"
        },

I believe we want to create a record like this (where the info for name can be found in the nodes section of the json).

{
    "_id": "C0001734_C0151763_CAUSES",
    "subject": {
        "umls": "C0001734",
        "name": "aflatoxin",
        "semtypes": [ "bacs", "hops"]
    },
    "relation": [
        {
            "pmid": 1394115,
            "sentence": "Turmeric and curcumin were also found to reverse the aflatoxin induced liver damage produced by feeding aflatoxin B1 (AFB1) (5 micrograms/day per 14 days) to ducklings.",
            "conf": 0.9303833842,
            "tuid": 0
        },
        {
            "pmid": 1394115,
            "sentence": "Reversal of aflatoxin induced liver damage by turmeric and curcumin.",
            "conf": 0.9396179318000001,
            "tuid": 0
        }
    ],
    "object": {
        "umls": "C0151763",
        "name": "damage liver",
        "semtypes": [ "patf" ]
    },
    "predicate": "CAUSES"
}
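The reshaping described above could be sketched roughly as follows. This is only an illustration, not the parser that was eventually written: `build_record` is a made-up helper, and the layout of the `nodes` lookup (a dict keyed by CUI with `name` and `semtypes`) is an assumption, since the actual structure of the nodes section is not shown in this ticket.

```python
from __future__ import annotations

from typing import Any


def build_record(edge: dict[str, Any], nodes: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Reshape one SuppKG edge into the proposed document layout.

    `nodes` is a hypothetical lookup of CUI -> {"name": ..., "semtypes": [...]}.
    """

    def describe(cui: str) -> dict[str, Any]:
        # Fall back to empty values when the CUI has no node entry.
        node = nodes.get(cui, {})
        return {
            "umls": cui,
            "name": node.get("name"),
            "semtypes": node.get("semtypes", []),
        }

    source, target, key = edge["source"], edge["target"], edge["key"]
    return {
        "_id": f"{source}_{target}_{key}",
        "subject": describe(source),
        "relation": edge["relations"],
        "object": describe(target),
        "predicate": key,
    }


# Edge values copied from the example above (sentences shortened to one entry).
edge = {
    "relations": [
        {
            "pmid": 1394115,
            "sentence": "Reversal of aflatoxin induced liver damage by turmeric and curcumin.",
            "conf": 0.9396179318000001,
            "tuid": 0,
        }
    ],
    "source": "C0001734",
    "target": "C0151763",
    "key": "CAUSES",
}

# Assumed node lookup; names and semtypes are the ones shown in the proposed record.
nodes = {
    "C0001734": {"name": "aflatoxin", "semtypes": ["bacs", "hops"]},
    "C0151763": {"name": "damage liver", "semtypes": ["patf"]},
}

record = build_record(edge, nodes)
```

Building `_id` from source, target, and key keeps one document per (subject, object, predicate) triple, with all supporting sentences grouped under `relation`.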

colleenXu · 2022-05-24T04:47:06Z

Pasted from Slack: my notes after reviewing the parser's sample output file.

An open-source contributor created this parser https://github.com/mnarayan1/suppkg-data/blob/main/parser.py to address this ticket (#55). The sample output file is at https://drive.google.com/file/d/1qsPvQre8E4Cz0JqvLR44A8vMuJz57VAq/view?usp=sharing

I think the structure is okay for writing queries with x-bte annotation, but I have a few concerns:

Point 0: I wonder if the relation array ever gets a LOT of elements
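One way to answer Point 0 would be to tally how many relations each edge carries. The snippet below is a rough sketch: `relation_count_summary` is a hypothetical helper, and `sample_edges` is toy data standing in for the real list of edges, which would have to be loaded from the SuppKG download.

```python
from collections import Counter


def relation_count_summary(edges):
    """Summarize how many relation entries each edge carries."""
    counts = [len(edge.get("relations", [])) for edge in edges]
    if not counts:
        return {"edges": 0, "max": 0, "histogram": {}}
    return {
        "edges": len(counts),
        "max": max(counts),
        # Maps "number of relations" -> "number of edges with that many".
        "histogram": dict(Counter(counts)),
    }


# Toy stand-in for the real edges (NOT actual SuppKG data).
sample_edges = [
    {"relations": [{}, {}]},
    {"relations": [{}]},
    {"relations": [{}]},
]

summary = relation_count_summary(sample_edges)
```

If the maximum turns out to be very large, capping or paginating the `relation` array might be worth considering.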

Point 1: Looking at the output file, some umls IDs seem to start with "DC" which seems incorrect. It looks like the "D" should be removed, so the ID starts with "C". Examples:

DC1029148 (from the idx 1 record)
DC0016163 (from the idx 7 record)

Point 2: Looking at the output file, some IDs don't seem to match their names. Examples:

the idx 2 record has object.name as "aceite niauli". I'm not sure what that means. The object.umls is DC0028908. After removing the "D" (see point 1), this ID corresponds with "oils".
the idx 6 record has object.name as "genotoxins". However, the corresponding ID's official name is mutagens (genotoxins does show up as an "atom" underneath, likely a cross-mapped ID).

Point 3: Looking at the output file, some semantic types don't exist or don't seem to match the ID given.
I'm seeing "dsp" in object.semtypes (idx 2, idx 7 records) and this isn't a UMLS abbreviation (they're always 4 letters).

the idx 7 record has object.umls: DC0016163, which seems to refer to "fishes". However, the object.semtypes are imft (Immunologic Factor) and "dsp", which seems odd.
the idx 9 record has DC1140671, which seems to refer to "rice / Oryza sativa". However, its semtypes are orch (organic chemical) and phsu (pharmacological substance). Again, odd.
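A quick check for Point 3 could flag any semantic type that is not a four-letter abbreviation. This is a sketch that takes the "always 4 letters" observation above at face value; `nonstandard_semtypes` is a hypothetical helper, and the example record only includes the fields mentioned for the idx 7 record.

```python
import re

# Assumption from the notes above: UMLS semantic type abbreviations are 4 lowercase letters.
SEMTYPE_PATTERN = re.compile(r"[a-z]{4}")


def nonstandard_semtypes(record):
    """Return {"subject"/"object": [semtypes that do not look like UMLS abbreviations]}."""
    flagged = {}
    for side in ("subject", "object"):
        semtypes = record.get(side, {}).get("semtypes", [])
        bad = [s for s in semtypes if not SEMTYPE_PATTERN.fullmatch(s)]
        if bad:
            flagged[side] = bad
    return flagged


# Partial record using only the object fields quoted for idx 7.
idx7_like = {"object": {"umls": "DC0016163", "semtypes": ["imft", "dsp"]}}

flagged = nonstandard_semtypes(idx7_like)
```

This only checks the shape of the abbreviation. Whether a well-formed semtype such as orch actually fits the concept (the rice example) would need a lookup against UMLS itself.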

andrewsu · 2022-05-24T15:30:08Z

I think this has to do with the fact that suppKG apparently is using a (very) old version of UMLS. From their preprint:

This may mean that we should perform some of the same analyses/filtering as we did for semmeddb, as described in biothings/semmeddb#2.

colleenXu · 2022-05-24T21:16:23Z

(from looking at the materials + method section of the preprint)

It sounds like the authors made some pseudo-UMLS IDs from "iDISK terms" that didn't map to an existing CUI... is that right? And that some of these "iDISK terms" were drug supplement ingredients... This makes me wonder about the MRCONSO.RRF file that they mention, which sounds like it may have mappings from the original "iDISK terms" to pseudo-UMLS IDs used in their KG...

Also, it sounds like they put "phsu" as the semantic type for all drug supplements for their work, even if the original UMLS ID isn't considered a Pharmacological Substance. This makes me think of the plant terms (Point 3 / bullet 2 in my above post).

They also mention a networkx file and I wonder if that's useful...

erikyao · 2022-06-01T16:55:11Z

> (from looking at the materials + method section of the preprint)
>
> It sounds like the authors made some pseudo-UMLS IDs from "iDISK terms" that didn't map to an existing CUI... is that right? And that some of these "iDISK terms" were drug supplement ingredients... This makes me wonder about the MRCONSO.RRF file that they mention, which sounds like it may have mappings from the original "iDISK terms" to pseudo-UMLS IDs used in their KG...
>
> Also, it sounds like they put "phsu" as the semantic type for all drug supplements for their work, even if the original UMLS ID isn't considered a Pharmacological Substance. This makes me think of the plant terms (Point 3 / bullet 2 in my above post).
>
> They also mention a networkx file and I wonder if that's useful...

Hi @colleenXu, from SemRep_DS/docs/SemRep_full_fielded_output.txt:

*_CUI: The CUI of the subject/object entity. If a CUI starts with
'DC' instead of just 'C' it is an iDISK CUI and is not present in the UMLS.
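Given that note, a parser could separate real UMLS CUIs from iDISK ones instead of stripping the "D". A minimal sketch follows; `classify_cui` is a hypothetical helper, and the assumption that iDISK CUIs are "DC" plus seven digits is based only on the examples in this thread (e.g. DC1029148, DC0016163).

```python
import re

# UMLS CUIs are "C" followed by seven digits.
UMLS_CUI = re.compile(r"C\d{7}")
# Assumed shape for iDISK CUIs, inferred from the examples in this thread.
IDISK_CUI = re.compile(r"DC\d{7}")


def classify_cui(cui):
    """Label an identifier as 'umls', 'idisk', or 'unknown' based on its shape."""
    if UMLS_CUI.fullmatch(cui):
        return "umls"
    if IDISK_CUI.fullmatch(cui):
        return "idisk"
    return "unknown"


labels = {cui: classify_cui(cui) for cui in ("C0001734", "DC1029148", "DC0016163")}
```

Keeping the two kinds apart matters because, per the quoted documentation, a "DC" ID is not present in the UMLS, so removing the "D" may point at an unrelated concept.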

andrewsu · 2022-06-01T22:15:34Z

@erikyao deployed the API at https://biothings.ncats.io/suppkg based on the parser written by @mnarayan1 (https://github.com/biothings/SuppKG). @colleenXu can you add creation of the smartAPI annotation to your to-do list please? ("Normal" priority -- no special urgency here...)

Let's also leave this ticket open for the moment so we can contemplate enhancements to the parser (for example, to handle retired UMLS IDs, get more current human-readable names and semtypes, etc)...
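For anyone wanting to try the deployed API, a query URL could be put together along these lines. This is a sketch under assumptions: it supposes the service exposes the usual BioThings-style `/query` endpoint with a `q` parameter, and that the indexed field names match the proposed record (`subject.umls`, `predicate`); neither is confirmed in this ticket. `build_query_url` is a hypothetical helper and makes no network call.

```python
from urllib.parse import urlencode

# Base URL from the comment above; the "/query" path below is an assumption.
BASE_URL = "https://biothings.ncats.io/suppkg"


def build_query_url(subject_cui, predicate=None, size=10):
    """Build a query URL for edges with the given subject CUI (and optional predicate)."""
    terms = [f'subject.umls:"{subject_cui}"']
    if predicate:
        terms.append(f'predicate:"{predicate}"')
    params = {"q": " AND ".join(terms), "size": size}
    return f"{BASE_URL}/query?{urlencode(params)}"


url = build_query_url("C0001734", "CAUSES")
```

The resulting URL can be opened in a browser or fetched with any HTTP client to check which fields the API actually returns.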

colleenXu · 2023-06-16T23:42:33Z

Related to NCATS-Tangerine/translator-api-registry#122 by @mnarayan1

colleenXu · 2023-08-25T04:14:15Z

Closing because the API has been made. The rest of the work and discussion can be moved to biothings/biothings_explorer#706

andrewsu added the "data source" label (Data source pending to create a new API) on Feb 10, 2022

colleenXu mentioned this issue on Feb 11, 2022: find/add non-Translator APIs to BTE biothings/biothings_explorer#372 (Open)

andrewsu assigned colleenXu on Jun 1, 2022

andrewsu mentioned this issue on Jun 24, 2022: Data source: BindingDB #70 (Closed)

andrewsu mentioned this issue on Dec 16, 2022: Non-drug therapies, including sources like iDisk NCATSTranslator/Feedback#74 (Closed)

mnarayan1 mentioned this issue on Jun 16, 2023: SuppKG API YAML NCATS-Tangerine/translator-api-registry#122 (Merged)

colleenXu mentioned this issue on Aug 18, 2023: BioThings suppKG: parser, x-bte, adding to BTE biothings/biothings_explorer#706 (Closed)

colleenXu closed this as completed on Aug 25, 2023

andrewsu mentioned this issue on Jul 5, 2024: something strange with the UMLS Identifier dimethyl sulfoxide UMLS:DC0012403 NCATSTranslator/Feedback#836 (Open)

andrewsu mentioned this issue on Jul 25, 2024: Modify SuppKG parser to better deal with fake UMLS IDs #220 (Open)
