conflate parameter of Node Normalizer #7

Closed
erikyao opened this issue Jan 4, 2023 · 1 comment


@erikyao
Collaborator

erikyao commented Jan 4, 2023

Original Purpose: gene-gene equivalence detection

Previously, we decided to use Node Normalizer to find the NCBI Gene IDs equivalent to gene-representing CUIs. A predication whose subject or object is an NCBI Gene ID equivalent to such a CUI is considered redundant with the corresponding CUI predication, and thus should be deleted by the parser script.

E.g., after the piped CUI C1418660|5361 is split into two rows, there are two predications:

| row-id | PREDICATION_ID | PMID | PREDICATE | SUBJECT_CUI | SUBJECT_NAME | SUBJECT_SEMTYPE | OBJECT_CUI | OBJECT_NAME | OBJECT_SEMTYPE |
|---|---|---|---|---|---|---|---|---|---|
| 69865 | 14008146 | 16541019 | INHIBITS | C1418660 | PLXNA1 gene | gngm | C1418661 | PLXNA2 gene | gngm |
| 69866 | 14008146 | 16541019 | INHIBITS | 5361 | PLXNA1 | gngm | C1418661 | PLXNA2 gene | gngm |

5361 is equivalent to C1418660, so the second predication can be deleted.
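The deletion rule above could be sketched roughly as follows. This is a hypothetical illustration, not the actual parser script: `Predication`, `split_piped` and `drop_redundant` are made-up names, expanding piped fields into every subject/object combination is an assumption, and the equivalence map is assumed to have been built from Node Normalizer results beforehand.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Predication:
    predication_id: int
    subject: str  # UMLS CUI (e.g. "C1418660") or NCBI Gene ID (e.g. "5361")
    obj: str


def is_ncbi_gene_id(identifier: str) -> bool:
    # Assumption: all-digit values in the CUI columns are NCBI Gene IDs.
    return identifier.isdigit()


def split_piped(predication_id: int, subject_field: str, object_field: str) -> List[Predication]:
    """Expand piped fields such as "C1418660|5361" into one row per identifier pair."""
    return [
        Predication(predication_id, s, o)
        for s in subject_field.split("|")
        for o in object_field.split("|")
    ]


def drop_redundant(rows: Iterable[Predication], equivalent: Dict[str, str]) -> List[Predication]:
    """Drop rows that use an NCBI Gene ID where an equivalent CUI-only row exists.

    `equivalent` maps an identifier to the representative of its equivalence
    class (e.g. "5361" -> "C1418660"); unmapped identifiers represent themselves.
    """
    rows = list(rows)

    def key(p: Predication):
        # Normalized (predication, subject, object) triple.
        return (p.predication_id,
                equivalent.get(p.subject, p.subject),
                equivalent.get(p.obj, p.obj))

    def has_gene_id(p: Predication) -> bool:
        return is_ncbi_gene_id(p.subject) or is_ncbi_gene_id(p.obj)

    # Normalized keys already covered by rows that contain only CUIs.
    cui_keys = {key(p) for p in rows if not has_gene_id(p)}
    return [p for p in rows if not (has_gene_id(p) and key(p) in cui_keys)]


rows = split_piped(14008146, "C1418660|5361", "C1418661")
# Equivalence taken from the example above; in the parser it would come from Node Normalizer.
kept = drop_redundant(rows, {"5361": "C1418660"})
print(kept)  # [Predication(predication_id=14008146, subject='C1418660', obj='C1418661')]
```

Comparing normalized triples means a gene-ID row is only dropped when a CUI row with the same meaning survives; a row whose gene ID has no known equivalent CUI is kept.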

Moving Further: protein-gene equivalence detection when conflate is true

Passing {"conflate": true} to Node Normalizer asks the endpoint to return conflated data (currently only gene-protein conflation is supported). See Babel output formats >> Conflation.
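A request with conflation enabled might look like the sketch below. It assumes the public SRI Node Normalizer instance at the URL shown, a POST body of the form {"curies": [...], "conflate": true}, and that bare SemMedDB identifiers must first be turned into `UMLS:` / `NCBIGene:` CURIEs; all of these should be checked against the endpoint's own API documentation. `to_curie`, `build_payload` and `normalize` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Dict, List, Optional

# Assumption: public SRI Node Normalizer instance; verify the URL/version before use.
NODE_NORM_URL = "https://nodenormalization-sri.renci.org/get_normalized_nodes"


def to_curie(identifier: str) -> str:
    """Prefix a bare SemMedDB identifier (assumption: digits = NCBI Gene, else UMLS CUI)."""
    return f"NCBIGene:{identifier}" if identifier.isdigit() else f"UMLS:{identifier}"


def build_payload(identifiers: List[str], conflate: bool) -> bytes:
    """Encode the JSON request body, with gene-protein conflation switched on or off."""
    body = {"curies": [to_curie(i) for i in identifiers], "conflate": conflate}
    return json.dumps(body).encode("utf-8")


def normalize(identifiers: List[str], conflate: bool = True) -> Dict[str, Optional[dict]]:
    """POST the identifiers and return the decoded JSON response (performs a network call)."""
    req = urllib.request.Request(
        NODE_NORM_URL,
        data=build_payload(identifiers, conflate),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(build_payload(["C0020063", "5741"], conflate=True).decode("utf-8"))
# {"curies": ["UMLS:C0020063", "NCBIGene:5741"], "conflate": true}
```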

We do have such protein-gene data in the SemMedDB predications, e.g.:

| row-id | PREDICATION_ID | PMID | PREDICATE | SUBJECT_CUI | SUBJECT_NAME | SUBJECT_SEMTYPE | OBJECT_CUI | OBJECT_NAME | OBJECT_SEMTYPE |
|---|---|---|---|---|---|---|---|---|---|
| 64933 | 10603013 | 16530496 | ASSOCIATED_WITH | C0020063 | PTH protein, human | aapp | C0029463 | osteosarcoma | neop |
| 64934 | 10603013 | 16530496 | ASSOCIATED_WITH | 5741 | PTH | aapp | C0029463 | osteosarcoma | neop |

Node Normalizer with {"conflate": true} can report the equivalence between C0020063 and 5741.
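Given a decoded Node Normalizer response for a batch of CURIEs, one way to test such an equivalence is to compare the preferred identifier of each CURIE's clique. This sketch assumes each response entry is either null (unknown CURIE) or an object carrying an `id.identifier` field, and that with conflation enabled a gene and its protein resolve to the same clique; `preferred_id` and `are_equivalent` are hypothetical names.

```python
from typing import Dict, Optional


def preferred_id(response: Dict[str, Optional[dict]], curie: str) -> Optional[str]:
    """Return the preferred identifier of the CURIE's clique, or None if unknown.

    Assumes entries shaped like {"id": {"identifier": ...}, ...} or None.
    """
    entry = response.get(curie)
    if not entry:
        return None
    return entry.get("id", {}).get("identifier")


def are_equivalent(response: Dict[str, Optional[dict]], a: str, b: str) -> bool:
    """True if both CURIEs normalize to the same clique; unknown CURIEs never match."""
    pa, pb = preferred_id(response, a), preferred_id(response, b)
    return pa is not None and pa == pb
```

With a conflated response in hand, the check would be `are_equivalent(response, "UMLS:C0020063", "NCBIGene:5741")`. Because an unknown or unlinked CURIE never matches, pairs that Node Normalizer cannot connect are treated as distinct.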

QUESTION: Shall we enable conflate to delete such redundant predications (like the second row above)?

Outlaws: peptide-gene equivalence?

E.g.

| row-id | PREDICATION_ID | PMID | PREDICATE | SUBJECT_CUI | SUBJECT_NAME | SUBJECT_SEMTYPE | OBJECT_CUI | OBJECT_NAME | OBJECT_SEMTYPE |
|---|---|---|---|---|---|---|---|---|---|
| 64923 | 10597756 | 16530483 | INTERACTS_WITH | C0027893 | neuropeptide Y | gngm | C0039194 | T-Lymphocyte | cell |
| 64924 | 10597756 | 16530483 | INTERACTS_WITH | 4852 | NPY | gngm | C0039194 | T-Lymphocyte | cell |

Node Normalizer CANNOT report the equivalence between C0027893 and 4852.

QUESTION: Will this be a problem for BTE?

@erikyao
Collaborator Author

erikyao commented Jan 4, 2023

Jan 4th meeting, Colleen's input:

  • protein-gene: let's do {"conflate": true}
  • peptide-gene: we don't have to report their equivalence for now
