GitHub - frdougal/xmeso-TermExtractor

The xmeso-TermExtractor code builds a dictionary of terms from i2b2's ontology for use by Ruta. It interfaces with the Unified Medical Language System (UMLS) API:

https://documentation.uts.nlm.nih.gov/

** Pre-requisites **

-- dbconf.py -- The repository includes an example configuration file, dbconf_example.py. Copy it, rename the copy to dbconf.py, and set the following values in dbconf.py:

Oracle connection information (username, password, etc.) for connecting to your i2b2 metadata schema
UMLS API key, username, and password. To apply for a UMLS account, see:

https://documentation.uts.nlm.nih.gov/rest/authentication.html

outputDir: the directory where the completed output files are written
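
A dbconf.py along the following lines would cover the values listed above. Only outputDir is named in this README; every other variable name and every value here is a placeholder assumed for illustration, so check dbconf_example.py for the names the code actually reads.

```python
# dbconf.py -- illustrative sketch only.
# Apart from outputDir, the variable names below are assumptions;
# dbconf_example.py defines the names TermExtractor.py really expects.

# Oracle connection information for the i2b2 metadata schema (placeholders)
oracle_user = "i2b2metadata"
oracle_password = "change-me"
oracle_host = "localhost"
oracle_port = 1521
oracle_service = "ORCL"

# UMLS credentials (placeholders)
umls_api_key = "your-umls-api-key"
umls_username = "your-umls-username"
umls_password = "your-umls-password"

# Output directory (the one setting this README names explicitly)
outputDir = "/tmp/ruta_dictionaries"
```

Because the file holds passwords and an API key, it is worth keeping dbconf.py out of version control.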

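-- Python pre-requisites -- You must install the following Python modules: PyQuery, cx_Oracle, and requests. The code also uses urllib2, which is part of the Python 2 standard library and needs no separate installation.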

-- Running the code -- Use Python to execute the main method in TermExtractor.py. The output contains different messages depending on the codes extracted from i2b2:

"Found new category. Loading terms for category: XXXX" indicates that the code found a new category of i2b2 terms to load.

"Could not find prefix for basecode: XXXX" is caused by i2b2 terms whose basecodes are not found in UMLS (ex: PATH|IMMUNH:UN).

"Error retrieving synonyms for concept [XXXXX]. Error: concept" is also caused by i2b2 terms whose basecodes are not found in UMLS (see above).

"Error retrieving extractSeedConcept: 'NoneType' object has no attribute 'group'" indicates that the code is attempting to process a top-level node (ex: /SURGICAL MARGINS/). These i2b2 items do not need to be searched, so the message can be ignored.
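
The last message contains the standard Python error raised when a regular-expression match returns None and the code then calls .group() on the result. The sketch below reproduces that situation and shows one way to guard against it. The pattern, the function name extract_seed_concept, and the child path /SURGICAL MARGINS/POSITIVE/ are hypothetical and are not taken from TermExtractor.py; the sketch is written for Python 3.

```python
import re

# Hypothetical pattern: a top-level path such as "/SURGICAL MARGINS/" has
# only one segment, so a pattern that requires two segments does not match.
TWO_SEGMENTS = re.compile(r"^/([^/]+)/([^/]+)/")


def extract_seed_concept(path):
    """Return the second path segment, or None for a top-level node."""
    match = TWO_SEGMENTS.match(path)
    if match is None:
        # Without this check, match.group(2) would raise
        # AttributeError: 'NoneType' object has no attribute 'group'
        return None
    return match.group(2)


print(extract_seed_concept("/SURGICAL MARGINS/"))           # None
print(extract_seed_concept("/SURGICAL MARGINS/POSITIVE/"))  # POSITIVE
```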

The code generates several lines when it writes the data to files (where XXXX is the "category"):

Opening files for output. Opening file: /Users/chb69/borromeocd/nmvb/nlp_work/ruta_XXXX.txt Closing file ... Done.

The code performs the following steps:

1. Execute a SQL query to retrieve the term list from i2b2's ontology (the metadata schema).
2. For each item in the i2b2 term list, add it to the Ruta dictionary.
3. For each item in the i2b2 term list, determine its "category".
4. For each item in the i2b2 term list, find all synonyms from UMLS. Add the UMLS synonyms to the Ruta dictionary.
5. After all the items in the i2b2 term list have been processed, write the data to separate semicolon-delimited files for Ruta, creating one file for each "category".
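
The flow of these steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in TermExtractor.py: the i2b2 query and the UMLS lookup are replaced by in-memory stand-ins, the function names and sample terms are invented, and the two-column layout of each output line is a guess. Only the ruta_XXXX.txt naming and the semicolon delimiter come from this README. The sketch is written for Python 3.

```python
import csv
import os
import tempfile
from collections import defaultdict


def build_dictionaries(terms, find_synonyms):
    """Group each term and its synonyms by category.

    terms: iterable of (category, name, basecode) tuples, standing in for
           the rows returned by the i2b2 ontology query.
    find_synonyms: callable that takes a basecode and returns a list of
           synonym strings, standing in for the UMLS lookup.
    """
    by_category = defaultdict(list)
    for category, name, basecode in terms:
        by_category[category].append(name)
        for synonym in find_synonyms(basecode):
            if synonym not in by_category[category]:
                by_category[category].append(synonym)
    return by_category


def write_dictionaries(by_category, output_dir):
    """Write one semicolon-delimited file per category; return the paths."""
    paths = []
    for category, entries in by_category.items():
        path = os.path.join(output_dir, "ruta_%s.txt" % category)
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter=";", lineterminator="\n")
            for entry in entries:
                # Assumed layout: category;term
                writer.writerow([category, entry])
        paths.append(path)
    return paths


# Invented sample data in place of the i2b2 query and the UMLS API.
sample_terms = [
    ("MARGINS", "positive margin", "DEMO:1"),
    ("MARGINS", "negative margin", "DEMO:2"),
    ("STAGE", "stage one", "DEMO:3"),
]
fake_umls = {"DEMO:1": ["involved margin"], "DEMO:3": ["stage 1"]}

dictionaries = build_dictionaries(
    sample_terms, lambda code: fake_umls.get(code, [])
)
out_dir = tempfile.mkdtemp()
written = write_dictionaries(dictionaries, out_dir)
```

Passing the synonym lookup in as a function keeps the grouping logic testable without an Oracle connection or a UMLS account.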
