merge Botulinum toxin A nodes #395

Open
TranslatorIssueCreator opened this issue Jul 6, 2023 · 11 comments
Labels: chemical conflation; Next Phase; normalization (relates to or otherwise handled by node norm or name resolver - separate from autocomplete); SRI Tooling

Comments

@TranslatorIssueCreator

Type: Bug Report

URL: https://ui.ci.transltr.io/results?l=Bethlem%20Myopathy&i=MONDO:0008029&t=0&q=1fb6945c-668b-4750-85f4-2daa53eb4596

ARS PK: 98ca4253-5d0e-4741-9ace-0e051a37c0c7

Steps to reproduce:

CI environment
MVP1 Bethlem disease

Screenshots:

@cbizon
Collaborator

cbizon commented Jul 6, 2023

TranslatorSRI/Babel#164

@sandrine-m

PK : 1fb6945c-668b-4750-85f4-2daa53eb4596
Results on CI show 2 results (results 1 and 10) for the same compound.
@gglusman : The two identifiers are http://identifiers.org/unii/E211KPY694 and http://identifiers.org/umls/C0006050

@sandrine-m

Output of Name resolver:

{
  "CHEMBL.COMPOUND:CHEMBL4297862": [
    "BOTULINUM TOXIN TYPE A"
  ],
  "UNII:E211KPY694": [
    "BOTULINUM TOXIN TYPE A",
    "[OBSOLETE] onabotulinumtoxinA"
  ],
  "MESH:D019274": [
    "Botulinum toxin type A",
    "Botulinum Toxins, Type A",
    "Botulin A"
  ],
  "UMLS:C0006050": [
    "botulinum toxin type a",
    "Botulinum toxin type A",
    "BOTULINUM TOXIN TYPE A",
    "botulinum toxin type A",
    "Botulinum Toxin Type A",
    "toxin botulinum type a",
    "Botulinum Toxins, Type A",
    "Clostridium Botulinum Toxin Type A",
    "Botulinum toxin type A (substance)",
    "Botulinum toxin type A-containing product",
    "Product containing botulinum toxin type A (medicinal product)",
    "BTX-A",
    "Xeomin",
    "BoNT-A",
    "Dysport",
    "dysport",
    "onaclostox",
    "Onaclostox",
    "AbobotulinumA",
    "Botulinum A Toxin",
    "Botulinum Toxin A",
    "botulinum toxin a",
    "botulinum toxin A",
    "botulinum a toxin",
    "Botulinum toxin A",
    "botulinum A toxin",
    "AbobotulinumtoxinA",
    "Onabotulinumtoxina",
    "Toxin, Botulinum A",
    "ABOBOTULINUMTOXINA",
    "OnabotulinumtoxinA",
    "EvabotulinumtoxinA",
    "abobotulinumtoxina",
    "Toxin A, Botulinum",
    "onabotulinumtoxinA",
    "abobotulinumtoxinA",
    "ONABOTULINUMTOXINA",
    "Prabotulinumtoxin A",
    "prabotulinumtoxin A",
    "DaxibotulinumtoxinA",
    "Toxina botulínica A",
    "INCOBOTULINUMTOXINA",
    "Onabotulinumtoxin A",
    "IncobotulinumtoxinA",
    "abobotulinumtoxin A",
    "incobotulinumtoxinA",
    "abobotulinum toxin A",
    "Toxine botulinique A",
    "botulinum toxin type",
    "Botulinum A neurotoxin",
    "Botulinum Neurotoxin A",
    "Neurotoxin A, Botulinum",
    "Botulinum antitoxin type A",
    "Botulinum Neurotoxin Type A",
    "botulinum neurotoxin type A",
    "Clostridium botulinum A Toxin",
    "Clostridium botulinum toxin A",
    "AbobotulinumtoxinA (substance)",
    "OnabotulinumtoxinA (substance)",
    "IncobotulinumtoxinA (substance)",
    "onabotulinumtoxinA (medication)",
    "abobotulinumtoxina (medication)",
    "IncobotulinumtoxinA (medication)",
    "OnabotulinumtoxinA-containing product",
    "AbobotulinumtoxinA-containing product",
    "IncobotulinumtoxinA-containing product",
    "Product containing onabotulinumtoxinA (medicinal product)",
    "Product containing abobotulinumtoxinA (medicinal product)",
    "neuromuscular blockers botulinum toxin incobotulinumtoxina",
    "Product containing incobotulinumtoxinA (medicinal product)"
  ],
  "UMLS:C5235585": [
    "Botulinum Toxin Type A5"
  ],
  "UMLS:C5235587": [
    "Botulinum Toxin Type A7"
  ],
  "UMLS:C5235584": [
    "Botulinum Toxin Type A4"
  ],
  "UMLS:C5235583": [
    "Botulinum Toxin Type A3"
  ],
  "UMLS:C5235586": [
    "Botulinum Toxin Type A6"
  ],
  "UMLS:C5235582": [
    "Botulinum Toxin Type A1"
  ]
}
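
To make the overlap in the Name Resolver output above easier to see, here is a minimal Python sketch that finds labels shared by more than one identifier when case is ignored. It uses only a small subset of the synonyms quoted above, and the helper name `shared_labels` is made up for illustration; it is not part of Name Resolver or NodeNorm.

```python
from collections import defaultdict

# A small subset of the Name Resolver output quoted above.
synonyms = {
    "CHEMBL.COMPOUND:CHEMBL4297862": ["BOTULINUM TOXIN TYPE A"],
    "UNII:E211KPY694": ["BOTULINUM TOXIN TYPE A", "[OBSOLETE] onabotulinumtoxinA"],
    "MESH:D019274": ["Botulinum toxin type A", "Botulinum Toxins, Type A", "Botulin A"],
    "UMLS:C0006050": ["botulinum toxin type a", "Botulinum Toxins, Type A", "BTX-A"],
    "UMLS:C5235582": ["Botulinum Toxin Type A1"],
}


def shared_labels(synonyms):
    """Map each case-folded label to the sorted CURIEs carrying it (2+ CURIEs only)."""
    by_label = defaultdict(set)
    for curie, names in synonyms.items():
        for name in names:
            by_label[name.casefold()].add(curie)
    return {
        label: sorted(curies)
        for label, curies in by_label.items()
        if len(curies) > 1
    }


overlaps = shared_labels(synonyms)
for label, curies in overlaps.items():
    print(label, "->", ", ".join(curies))
```

A shared label does not prove that two identifiers belong in one clique, but it is a cheap way to flag candidates like the UNII and UMLS pair reported here.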

@sandrine-m

@cbizon : This is not a conflation issue but a normalization one.

@sierra-moxon
Member

sierra-moxon commented Aug 11, 2023

From TAQA: there are two cliques for BTA. One has all the usual IDs; the other has only UMLS, which is hard to map to the rest. Moving this to Fall because it is not an easy fix. The drug conflator could be the issue here.

@gaurav

gaurav commented Aug 11, 2023

This should be conflated by the Drug Conflator -- as you can see in https://nodenormalization-dev.apps.renci.org/1.4/get_normalized_nodes?curie=UMLS%3AC0006050&curie=UNII%3AE211KPY694&conflate=true&drug_chemical_conflate=true, UMLS:C0006050 is listed as an alternate ID for UNII:E211KPY694, and it's not clear why that isn't happening. I am investigating.
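
For anyone who wants to rerun this check with other CURIEs, the query above can be rebuilt roughly as follows. This is only a sketch: the base URL and the `curie`, `conflate` and `drug_chemical_conflate` parameters are copied from the link above, the helper name `build_query` is made up, and the code only constructs the URL because the response format is not shown in this thread.

```python
from urllib.parse import urlencode

# Endpoint taken from the link in this comment (NodeNorm dev instance, version 1.4).
BASE_URL = "https://nodenormalization-dev.apps.renci.org/1.4/get_normalized_nodes"


def build_query(curies, conflate=True, drug_chemical_conflate=True):
    """Return a get_normalized_nodes URL with one `curie` parameter per identifier."""
    params = [("curie", curie) for curie in curies]
    # The link above spells booleans in lowercase ("true"), so do the same here.
    params.append(("conflate", str(conflate).lower()))
    params.append(("drug_chemical_conflate", str(drug_chemical_conflate).lower()))
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query(["UMLS:C0006050", "UNII:E211KPY694"])
print(url)
```

Passing `drug_chemical_conflate=False` gives the same query without drug conflation, which makes it easy to compare the two responses side by side.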

@sstemann

@gaurav this is still an issue - who should this go to?

@sstemann sstemann removed this from the D: Fall - 2023 milestone Feb 14, 2024
@sierra-moxon sierra-moxon added the needs review this ticket needs a broad group of people to review and assign next steps because it crosses teams label May 17, 2024
@gaurav

gaurav commented May 17, 2024

This is still on me. The problem is that UMLS:C0006050 is a Protein while UNII:E211KPY694 is a ChemicalEntity. These types are handled separately in Babel, so the two won't be combined as-is. I'm still thinking about how best to combine them, as I don't know of any source of UNII-protein connections (TranslatorSRI/Babel#164).
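
As a toy illustration of the type problem described above (not Babel's actual code), the sketch below groups identifiers into cliques with a small union-find and skips any link whose two ends have different types. The data layout, the function `build_cliques`, and the cross-type link itself are all assumptions made for the example; as noted above, no real source of such links is known.

```python
from collections import defaultdict

# Types as stated in this comment.
TYPES = {
    "UNII:E211KPY694": "ChemicalEntity",
    "UMLS:C0006050": "Protein",
}
# Assumed link between the two identifiers, invented for this example.
LINKS = [("UNII:E211KPY694", "UMLS:C0006050")]


def build_cliques(types, links, respect_types=True):
    """Group identifiers joined by links; optionally refuse links across types."""
    parent = {identifier: identifier for identifier in types}

    def find(x):
        # Follow parents to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        if respect_types and types[a] != types[b]:
            continue  # types are processed separately, so this link is ignored
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for identifier in types:
        groups[find(identifier)].add(identifier)
    return sorted(sorted(group) for group in groups.values())


print(build_cliques(TYPES, LINKS))                       # two cliques
print(build_cliques(TYPES, LINKS, respect_types=False))  # one clique
```

With types respected, the link is never used, so the two identifiers stay in separate cliques even if a mapping between them exists.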

I'm also annoyed that it is possible to have the same identifier in multiple cliques because of how NodeNorm's databases are designed, but that's out of scope for this issue and possibly for this year (TranslatorSRI/Babel#276).

@gaurav gaurav added normalization relates to or otherwise handled by node norm or name resolver - separate from autocomplete Guppy (Sprint 5) - due Aug 23 in CI This ticket will be fixed in CI by the end of Guppy (Sprint 5) (Aug 23) and removed needs review this ticket needs a broad group of people to review and assign next steps because it crosses teams labels Jul 12, 2024
@gaurav

gaurav commented Jul 12, 2024

Without drug conflation, we now have 8 cliques:

  1. UNII:E211KPY694 "botulinum toxin type A" (ChemicalEntity, which now includes CHEMBL.COMPOUND:CHEMBL4297862 and MESH:D019274)
  2. UMLS:C0006050 "botulinum toxin type A" (Protein)
  3. UMLS:C5235582 "Botulinum Toxin Type A1" (Protein)
  4. UMLS:C5235583 "Botulinum Toxin Type A3" (Protein)
  5. UMLS:C5235584 "Botulinum Toxin Type A4" (Protein)
  6. UMLS:C5235585 "Botulinum Toxin Type A5" (Protein)
  7. UMLS:C5235586 "Botulinum Toxin Type A6" (Protein)
  8. UMLS:C5235587 "Botulinum Toxin Type A7" (Protein)

So we're definitely doing better, but we still have some UMLS terms we need to combine, which is a pretty high priority for us (TranslatorSRI/Babel#302). I'll try to have this fixed by Guppy.

@gaurav gaurav added Hammerhead (Sprint 6) - due Oct 4 in CI This ticket will be fixed in CI by the end of Hammerhead (Sprint 6) (Oct 4) and removed Guppy (Sprint 5) - due Aug 23 in CI This ticket will be fixed in CI by the end of Guppy (Sprint 5) (Aug 23) labels Aug 25, 2024
@gaurav

gaurav commented Aug 26, 2024

I'm pushing all protein/chemical combination work into Hammerhead. Plus, adding a manual conflation to proteins turns out to be trickier than adding a manual conflation to chemical entities.
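
To show what a manual conflation might amount to in the simplest case, here is a hedged sketch using the eight clique leaders listed in the Jul 12, 2024 comment. The `MANUAL_CONFLATIONS` format (a list of leaders with the preferred one first) and the function `apply_conflations` are assumptions for illustration only, not how Babel or NodeNorm store conflations.

```python
# Clique leaders and types as listed in the Jul 12, 2024 comment.
CLIQUES = {
    "UNII:E211KPY694": "ChemicalEntity",
    "UMLS:C0006050": "Protein",
    "UMLS:C5235582": "Protein",
    "UMLS:C5235583": "Protein",
    "UMLS:C5235584": "Protein",
    "UMLS:C5235585": "Protein",
    "UMLS:C5235586": "Protein",
    "UMLS:C5235587": "Protein",
}

# Hypothetical manual conflation: leaders to report as one node, preferred first.
# Only the pair named in this issue is included.
MANUAL_CONFLATIONS = [["UNII:E211KPY694", "UMLS:C0006050"]]


def apply_conflations(cliques, conflations):
    """Map every clique leader to the leader it should be reported under."""
    preferred = {leader: leader for leader in cliques}
    for group in conflations:
        missing = [leader for leader in group if leader not in cliques]
        if missing:
            raise KeyError(f"unknown clique leaders: {missing}")
        for member in group:
            preferred[member] = group[0]
    return preferred


preferred = apply_conflations(CLIQUES, MANUAL_CONFLATIONS)
print(preferred["UMLS:C0006050"])
print(len(set(preferred.values())), "distinct results after conflation")
```

The conflation crosses the Protein/ChemicalEntity boundary on purpose, which is the part that per-type processing makes awkward.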

@sstemann

In the Hammerhead release on Test: https://ui.test.transltr.io/results?l=Bethlem%20Myopathy&i=MONDO:0008029&t=0&r=0&q=bc5a5d5e-6b8f-4f73-bf74-a2e1c2af46b7

One result for Botulinum Toxin Type A. Four results for Botulinum.

@gaurav are you expecting any other changes for name resolver for this issue or can we consider it closed?

@gaurav gaurav removed the Hammerhead (Sprint 6) - due Oct 4 in CI This ticket will be fixed in CI by the end of Hammerhead (Sprint 6) (Oct 4) label Dec 3, 2024

6 participants