investigate addition of supporting sentence to semmeddb API #563

andrewsu · 2023-02-14T16:46:06Z

SemMedDB is a text-mined resource for extracting relationships (triples) from the literature. The schema for SemMedDB is described at https://lhncbc.nlm.nih.gov/ii/tools/SemRep_SemMedDB_SKR/dbinfo.html. Our current semmeddb API (http://biothings.ncats.io/semmeddb; parser: https://github.com/biothings/semmeddb) primarily focuses on the "PREDICATION" table. The Translator consortium would like to explore adding the actual sentence used to infer each triple, which is found in the "SENTENCE" table. I remember that we briefly explored this previously, and the substantial increase in size was a key consideration. (The PREDICATION file is 3 GB; the SENTENCE file is 15 GB.)

colleenXu · 2023-02-21T22:42:14Z

Some previous discussion of "sentence" info:

Asking for it, and an example: update SemMedDB APIs pending.api#30 (comment)
Decision recorded not to add: update SemMedDB APIs pending.api#30 (comment)

erikyao · 2023-02-23T22:28:24Z

We can load the SENTENCE table and join the SENTENCE records to our documents by SENTENCE_ID (as shown in the entity-relationship diagram at the bottom).

Currently our PREDICATION parser discards the SENTENCE_ID field since it's never used.
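As a rough illustration of the join described above, here is a minimal sketch, not the actual parser code. It assumes both tables have already been read into dicts keyed by column name; the column names `SENTENCE_ID` and `SENTENCE`, the helper name `attach_sentences`, and the sample sentence text are assumptions made for this example.

```python
def attach_sentences(predication_rows, sentence_rows):
    """Return copies of the predication rows with a "sentence" key added.

    Hypothetical helper: both inputs are iterables of dicts, and the
    column names used here are assumptions for illustration.
    """
    # Build an in-memory index: SENTENCE_ID -> sentence text.
    sentence_by_id = {row["SENTENCE_ID"]: row["SENTENCE"] for row in sentence_rows}

    joined = []
    for row in predication_rows:
        merged = dict(row)  # copy so the input rows are left untouched
        # A predication with no matching sentence gets None instead of raising.
        merged["sentence"] = sentence_by_id.get(row["SENTENCE_ID"])
        joined.append(merged)
    return joined


# Tiny made-up inputs, reusing the IDs from the example document in this thread.
predications = [
    {"PREDICATION_ID": "88510072", "SENTENCE_ID": "49832114", "PMID": "8455912"},
    {"PREDICATION_ID": "1", "SENTENCE_ID": "missing", "PMID": "0"},
]
sentences = [
    {"SENTENCE_ID": "49832114", "PMID": "8455912", "SENTENCE": "example sentence text"},
]

joined_rows = attach_sentences(predications, sentences)
```

This only works if the parser keeps SENTENCE_ID instead of discarding it, and the full index sits in memory, which is the concern for a 15 GB SENTENCE file.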

andrewsu · 2023-02-23T22:43:14Z

@erikyao are you at all worried about the explosion in index size that would result?

erikyao · 2023-02-23T22:48:26Z

> are you at all worried about the explosion in index size that would result?

@andrewsu I think the additional SENTENCE field(s) won't take too much index size.

I am more worried about the memory usage when loading the SENTENCE table. But we can always preprocess it and extract smaller intermediate files if memory usage becomes a real problem.
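One possible shape for that preprocessing step, as a sketch only: collect the SENTENCE_IDs that the predications actually reference, then stream the SENTENCE file row by row and write just the matching rows to a smaller intermediate file. The function names are hypothetical, and the code assumes CSV input with a header row containing `SENTENCE_ID` and `SENTENCE` columns, which may not match the real dump files.

```python
import csv
import io


def needed_sentence_ids(predication_file):
    """Collect the SENTENCE_IDs referenced by the predication rows."""
    return {row["SENTENCE_ID"] for row in csv.DictReader(predication_file)}


def extract_needed_sentences(sentence_file, wanted_ids, out_file):
    """Stream the sentence rows and keep only those in wanted_ids.

    Rows are read one at a time, so the full SENTENCE table is never
    held in memory; only the set of wanted IDs is.
    """
    writer = csv.writer(out_file)
    kept = 0
    for row in csv.DictReader(sentence_file):
        if row["SENTENCE_ID"] in wanted_ids:
            writer.writerow([row["SENTENCE_ID"], row["SENTENCE"]])
            kept += 1
    return kept


# In-memory stand-ins for the real files (made-up rows).
predication_csv = io.StringIO("PREDICATION_ID,SENTENCE_ID\n10,2\n")
sentence_csv = io.StringIO("SENTENCE_ID,SENTENCE\n1,unused sentence\n2,kept sentence\n")
intermediate = io.StringIO()

wanted = needed_sentence_ids(predication_csv)
kept_count = extract_needed_sentences(sentence_csv, wanted, intermediate)
```

The set of wanted IDs still has to fit in memory, so this reduces the footprint without eliminating it.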

andrewsu · 2023-02-24T01:35:12Z

Super, thanks. Let's wait until we make a decision on #569 (next week) so we can possibly make both changes together.


andrewsu · 2023-06-02T04:08:27Z

sentence context has now been added to the semmeddb2 API (which will soon replace the semmeddb API), so closing this issue

https://biothings.ncats.io/semmeddb2/association/C0007642-ISA-C0410013

{
  "_id": "C0007642-ISA-C0410013",
  "_version": 1,
  "object": {
    "name": "Soft tissue lesion",
    "novelty": 1,
    "semantic_type_abbreviation": "patf",
    "semantic_type_name": "Pathologic Function",
    "umls": "C0410013"
  },
  "pmid_count": 1,
  "predicate": "ISA",
  "predication": [
    {
      "object_score": 906,
      "object_text": "soft tissue lesions",
      "pmid": 8455912,
      "predication_id": 88510072,
      "sentence": "Definitive diagnoses were 15 osteomyelitis, 14 soft tissue lesions (nine cellulitis and five noninfected ischaemic or trophic wounds), and nine degenerative bone disease.",
      "sentence_id": 49832114,
      "subject_score": 888,
      "subject_text": "cellulitis"
    }
  ],
  "predication_count": 1,
  "subject": {
    "name": "Cellulitis",
    "novelty": 1,
    "semantic_type_abbreviation": "dsyn",
    "semantic_type_name": "Disease or Syndrome",
    "umls": "C0007642"
  }
}
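For anyone consuming this document shape, here is a small sketch of pulling the supporting sentences out of a response that has already been parsed into a dict. The helper name `supporting_sentences` is hypothetical, and the sample document is trimmed with a shortened placeholder sentence; only the field names `predication`, `pmid`, and `sentence` come from the example above.

```python
def supporting_sentences(doc):
    """Return (pmid, sentence) pairs from an association document.

    Hypothetical helper: predication entries without a "sentence"
    field are skipped rather than treated as errors.
    """
    return [
        (entry["pmid"], entry["sentence"])
        for entry in doc.get("predication", [])
        if "sentence" in entry
    ]


# Trimmed stand-in for the document above (placeholder sentence text).
sample_doc = {
    "_id": "C0007642-ISA-C0410013",
    "predicate": "ISA",
    "predication": [
        {"pmid": 8455912, "sentence_id": 49832114, "sentence": "placeholder sentence"},
    ],
}

pairs = supporting_sentences(sample_doc)
```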

colleenXu · 2023-06-02T04:49:29Z

Here's where I noted that I added sentence support to the x-bte annotations: #569 (comment)

andrewsu mentioned this issue Feb 23, 2023

investigate refactoring document structure in semmeddb API #569

Closed

andrewsu assigned erikyao Mar 2, 2023

andrewsu mentioned this issue Mar 2, 2023

adjust SemMedDB SmartAPI annotation to comply with new NLP metadata standard #570

Closed

erikyao added a commit to biothings/semmeddb that referenced this issue Mar 16, 2023

initial fix to biothings/biothings_explorer#563; update the data file…

cfe00ec

…names and directories

andrewsu closed this as completed Jun 2, 2023
