Name, SemanticType lookup for retired CUIs #5

Closed
erikyao opened this issue Dec 13, 2022 · 2 comments

erikyao commented Dec 13, 2022

Problem

When a retired UMLS ID (CUI) is replaced, its name, semantic type abbreviation, and semantic type name must be replaced at the same time. However, the retired CUI table, i.e. the MRCUI.RRF file of the UMLS Metathesaurus, contains only UMLS IDs.

E.g. in the source file of SemMedDB predications, the following record

UMLSID Name SemanticTypeAbv SemanticTypeName
C0021311 Infection dsyn Disease or Syndrome

according to MRCUI.RRF, should be replaced by

UMLSID Name SemanticTypeAbv SemanticTypeName
C0009450 ??? ??? ???

But MRCUI.RRF only gives the replacement C0021311 => C0009450. The new "Name", "SemanticTypeAbv", and "SemanticTypeName" must be filled in from other data sources.

P.S. the fully replaced record should be like:

UMLSID Name SemanticTypeAbv SemanticTypeName
C0009450 Communicable Diseases dsyn Disease or Syndrome
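
For reference, the CUI-to-CUI replacement lookup itself could be sketched as below. This is a minimal sketch, not the project's actual code: the function name `load_retired_cui_mapping` is hypothetical, the column order follows the MRCUI.RRF description in the UMLS Reference Manual, and restricting to `REL == 'SY'` rows is an assumption (rows with other REL values, such as deletions or broader/narrower mappings, would need a separate decision).

```python
import csv
from typing import Dict, Iterable

# Column order of MRCUI.RRF (pipe-delimited, with a trailing pipe on each line).
MRCUI_COLUMNS = ["CUI1", "VER", "REL", "RELA", "MAPREASON", "CUI2", "MAPIN"]


def load_retired_cui_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Map each retired CUI to its replacement CUI.

    Assumption: only synonymy rows (REL == 'SY') are treated as replacements.
    """
    mapping: Dict[str, str] = {}
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < len(MRCUI_COLUMNS):
            continue  # skip blank or malformed lines
        record = dict(zip(MRCUI_COLUMNS, row))
        if record["REL"] == "SY" and record["CUI2"]:
            mapping[record["CUI1"]] = record["CUI2"]
    return mapping
```

With a real release, the function would be fed an open file handle, e.g. `open("MRCUI.RRF", encoding="utf-8", newline="")`.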

Solution

Step 1: UMLS ID => Subject/Object Name

The name can be looked up in MRCONSO.RRF, the Concept Names and Sources file.

However, each UMLS ID may have multiple records there. Inspired by Example 7 of the UMLS Database Query Diagrams,

# 7. Find all relationships for a concept and the preferred (English) name of the CUI2.

SELECT a.cui1, a.cui2, b.str FROM mrrel a, mrconso b
WHERE a.cui1 = 'C0032344'
     AND a.stype1 = 'CUI'
     AND a.cui2 = b.cui
     AND b.ts = 'P'
     AND b.stt = 'PF'
     AND b.ispref = 'Y'
     AND b.lat = 'ENG';

the filtering condition is

    TS == 'P'  # Term Status being "Preferred LUI of the CUI"
and STT == 'PF'  # String Type being "Preferred form of term"
and ISPREF == 'Y'  # Atom status being "preferred" (Y) for this string within this concept
and LAT == 'ENG' # Language of Terms being "English"

Other TS, STT, and LAT values are explained in Abbreviations Used in Data Elements - 2022AB Release. The meaning of ISPREF is explained in Table 1 of the UMLS® Reference Manual.

CUI names are recorded in the STR column.
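
Applied directly to the RRF file rather than to a SQL table, the filter above might look like the following. This is a sketch under assumptions: the function name `load_preferred_english_names` is hypothetical, the column order is the standard 18-column MRCONSO.RRF layout, and keeping the first matching row per CUI is an arbitrary tie-break in case more than one row passes the filter.

```python
import csv
from typing import Dict, Iterable

# Column order of MRCONSO.RRF (pipe-delimited, with a trailing pipe on each line).
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def load_preferred_english_names(lines: Iterable[str]) -> Dict[str, str]:
    """Return {CUI: preferred English name} using the TS/STT/ISPREF/LAT filter."""
    names: Dict[str, str] = {}
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < len(MRCONSO_COLUMNS):
            continue  # skip blank or malformed lines
        record = dict(zip(MRCONSO_COLUMNS, row))
        if (
            record["TS"] == "P"
            and record["STT"] == "PF"
            and record["ISPREF"] == "Y"
            and record["LAT"] == "ENG"
        ):
            # Assumption: keep the first matching row if several qualify.
            names.setdefault(record["CUI"], record["STR"])
    return names
```

`csv.QUOTE_NONE` is used so that quote characters inside STR values are read literally instead of being interpreted as CSV quoting.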

Step 2: UMLS ID => Semantic Type Name

Query MRSTY.RRF, the Semantic Types file.
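
A possible way to read that file is sketched below. The function name `load_semantic_type_names` is hypothetical; the column order follows the MRSTY.RRF layout (the STY column holds the semantic type name). Because one CUI can carry several semantic types, the sketch returns a list per CUI; how to choose among them is left open here.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List

# Column order of MRSTY.RRF (pipe-delimited, with a trailing pipe on each line).
MRSTY_COLUMNS = ["CUI", "TUI", "STN", "STY", "ATUI", "CVF"]


def load_semantic_type_names(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Return {CUI: [semantic type names]} in file order, without duplicates."""
    types: Dict[str, List[str]] = defaultdict(list)
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < len(MRSTY_COLUMNS):
            continue  # skip blank or malformed lines
        record = dict(zip(MRSTY_COLUMNS, row))
        if record["STY"] not in types[record["CUI"]]:
            types[record["CUI"]].append(record["STY"])
    return dict(types)
```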

Step 3: Semantic Type Name => Semantic Type Abbreviation

Query Semantic Type Mappings.
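
A minimal sketch of this lookup follows. It assumes the Semantic Type Mappings file has one `abbreviation|TUI|full name` record per line (e.g. `dsyn|T047|Disease or Syndrome`); that layout should be verified against the downloaded file. The function name `load_semantic_type_abbreviations` is hypothetical.

```python
from typing import Dict, Iterable


def load_semantic_type_abbreviations(lines: Iterable[str]) -> Dict[str, str]:
    """Return {semantic type full name: abbreviation}.

    Assumption: each line has exactly three pipe-separated fields,
    in the order abbreviation, TUI, full name.
    """
    name_to_abbreviation: Dict[str, str] = {}
    for line in lines:
        parts = line.rstrip("\r\n").split("|")
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        abbreviation, _tui, full_name = parts
        name_to_abbreviation[full_name] = abbreviation
    return name_to_abbreviation
```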

erikyao commented Dec 13, 2022

Dec 13 decision with Chunlei:

Make an intermediate UMLS file containing (UMLSID, EntityName, SemanticTypeAbv, SemanticTypeFullName). It may serve as the source file for a future UMLS endpoint.
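
One way such an intermediate file could be assembled is sketched below. Everything beyond the four column names is an assumption: the function names are hypothetical, the three input dictionaries are assumed to come from MRCONSO.RRF, MRSTY.RRF, and the Semantic Type Mappings respectively, CSV is an arbitrary choice of output format, and emitting one row per (CUI, semantic type) pair is only one option for CUIs with several semantic types.

```python
import csv
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple

Row = Tuple[str, str, str, str]


def build_intermediate_rows(
    names: Dict[str, str],                 # CUI -> preferred name
    semantic_types: Dict[str, List[str]],  # CUI -> semantic type full names
    abbreviations: Dict[str, str],         # semantic type full name -> abbreviation
) -> Iterator[Row]:
    """Yield (UMLSID, EntityName, SemanticTypeAbv, SemanticTypeFullName) rows."""
    for cui in sorted(names):
        for type_name in semantic_types.get(cui, []):
            # An unknown semantic type name yields an empty abbreviation.
            yield (cui, names[cui], abbreviations.get(type_name, ""), type_name)


def write_intermediate_file(rows: Iterable[Row], handle: TextIO) -> None:
    """Write the rows as CSV with a header line."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["UMLSID", "EntityName", "SemanticTypeAbv", "SemanticTypeFullName"])
    writer.writerows(rows)
```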

@erikyao erikyao self-assigned this Dec 13, 2022

erikyao commented Dec 14, 2022

Dec 14 decision with Colleen:

If a mapped new UMLS ID already appears in the SemMedDB predication CSV file, use its EntityName, SemanticTypeAbv, and SemanticTypeFullName from there before checking the RRF files.
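
The lookup order in this decision could be expressed as follows. This is a sketch only: `resolve_annotation` and both dictionary arguments are hypothetical names, with one dictionary assumed to be built from the existing predication records and the other from the RRF-derived data.

```python
from typing import Dict, Optional, Tuple

# (EntityName, SemanticTypeAbv, SemanticTypeFullName)
Annotation = Tuple[str, str, str]


def resolve_annotation(
    new_cui: str,
    from_predications: Dict[str, Annotation],
    from_rrf: Dict[str, Annotation],
) -> Optional[Annotation]:
    """Prefer values already present in the predication file; fall back to RRF data."""
    if new_cui in from_predications:
        return from_predications[new_cui]
    # Returns None when the CUI is found in neither source.
    return from_rrf.get(new_cui)
```

Returning `None` for a CUI that appears in neither source leaves the handling of unresolved IDs to the caller.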
