PheKnowVec is a novel method for deriving, implementing, and validating computational phenotypes. It does so by leveraging standardized clinical terminologies and open biomedical ontologies within a scalable embedded structure.
Please see the Project Wiki for more information!
This repository contains more than just code; it provides a detailed and transparent narrative of our research process. For detailed information on how we use GitHub as a reproducible research platform, click here.
Preliminary results were presented at the 2020 Joint Meeting of the American Medical Informatics Association:
Callahan TJ, Wyrwa J, Trinkley KE, Hunter LE, Kahn MG, Bennett TD. (2020, March). Towards Automating Computational Phenotyping: Exploring the Trade-offs of Different Vocabulary Mapping Strategies. Talk; Informatics Summits of the American Medical Informatics Association, Houston, TX; Podium Abstract
Dependencies
This repository is built using Python 3.6.2. To install the libraries used in this repository, run the line of code shown below from within the project directory.
pip install -r requirements.txt
Data
This code assumes that input data is stored in a Google Sheet, so this repository contains code that relies on Google's Drive API and Sheets API. To use this functionality, you will need to:
- Complete the steps described here
- Save the JSON file containing your credentials to ./resources/programming/Google_API/
- Rename the credential file to "secret_client_gs.json"
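Before running anything that calls the Google APIs, it can help to confirm that the credential file is where the steps above say it should be. The following is a minimal sketch using only the standard library; `load_credentials` is a hypothetical helper, not a function provided by this repository, and it only checks that the file exists and parses as JSON.

```python
import json
from pathlib import Path

# Directory and file name taken from the setup steps above.
CREDENTIALS_DIR = Path("resources/programming/Google_API")
CREDENTIALS_NAME = "secret_client_gs.json"


def load_credentials(project_root="."):
    """Return the parsed credential file, or raise a descriptive error.

    Hypothetical helper for illustration; not part of this repository.
    """
    path = Path(project_root) / CREDENTIALS_DIR / CREDENTIALS_NAME
    if not path.is_file():
        raise FileNotFoundError(f"Expected credentials at {path}")
    with path.open() as handle:
        # json.load raises json.JSONDecodeError if the file is malformed.
        return json.load(handle)
```

Calling `load_credentials()` from the project directory should fail early with a readable message if the file was saved elsewhere or not renamed.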
This code assumes that your input Google Sheet will follow a specific format:
Phenotype | Cohort | Criteria | Phenotype_Criteria | Input_Type | Source_Domain | Source_Vocabulary | Source_Code | Source_Label |
---|---|---|---|---|---|---|---|---|
ADHD | Case | Include | Presence of at least 1 relevant code in >1 in-person visits, on separate calendar days | Code | Condition | ICD9CM | '314.0' | Attention deficit disorder of childhood |
ADHD | Case | Include | Presence of >1 prescriptions of ADHD-related medications | String | Drug | None | '%adderall%' | adderall |
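To make the expected sheet layout concrete, here is a small sketch that checks each row against the nine columns shown above and groups the source codes by phenotype, cohort, and input type. The function name `group_source_codes` is hypothetical, and stripping the single quotes around `Source_Code` is an assumption about how the values are meant to be consumed, not a description of this repository's code.

```python
from collections import defaultdict

# Column names copied from the table above, in order.
COLUMNS = [
    "Phenotype", "Cohort", "Criteria", "Phenotype_Criteria", "Input_Type",
    "Source_Domain", "Source_Vocabulary", "Source_Code", "Source_Label",
]


def group_source_codes(rows):
    """Group Source_Code values by (Phenotype, Cohort, Input_Type).

    `rows` is a list of lists, one per sheet row, in COLUMNS order.
    Hypothetical helper for illustration only.
    """
    grouped = defaultdict(list)
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError(f"Expected {len(COLUMNS)} columns, got {len(row)}")
        record = dict(zip(COLUMNS, row))
        key = (record["Phenotype"], record["Cohort"], record["Input_Type"])
        # Assumption: the quotes wrapping each code are not part of the code.
        grouped[key].append(record["Source_Code"].strip("'"))
    return dict(grouped)


# The two example rows from the table above.
rows = [
    ["ADHD", "Case", "Include",
     "Presence of at least 1 relevant code in >1 in-person visits, on separate calendar days",
     "Code", "Condition", "ICD9CM", "'314.0'",
     "Attention deficit disorder of childhood"],
    ["ADHD", "Case", "Include",
     "Presence of >1 prescriptions of ADHD-related medications",
     "String", "Drug", "None", "'%adderall%'", "adderall"],
]
print(group_source_codes(rows))
```

Grouping by `Input_Type` keeps exact codes (`Code`) separate from wildcard patterns (`String`), which are likely to need different handling downstream.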
SQL Queries
- This project assumes that you will want to use the SQL queries that we have prepared and stored as GitHub Gists.
Two types of queries are run:
- Queries to map code sets
- Queries to create patient cohorts
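As a rough illustration of the first query type only, the sketch below shows how a `Code` input could be matched exactly and a `String` input such as `%adderall%` could be matched with SQL `LIKE`. Everything here is an assumption made for the example: the `concept` table, its columns, the toy rows, and the `map_source` function are hypothetical and do not reproduce the queries stored in the Gists. It uses an in-memory SQLite database so that it runs without any external setup.

```python
import sqlite3

# Hypothetical, in-memory stand-in for a vocabulary table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concept (code TEXT, label TEXT, vocabulary TEXT)")
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?)",
    [
        ("314.0", "Attention deficit disorder of childhood", "ICD9CM"),
        ("TOY-1", "Adderall 10 MG Oral Tablet (toy row)", "TOY_DRUG"),
        ("TOY-2", "Unrelated toy concept", "TOY_DRUG"),
    ],
)


def map_source(conn, input_type, source_code, vocabulary=None):
    """Return (code, label) rows matching one sheet entry.

    'Code' inputs are matched exactly within a vocabulary;
    'String' inputs are treated as case-insensitive LIKE patterns.
    """
    if input_type == "Code":
        sql = "SELECT code, label FROM concept WHERE code = ? AND vocabulary = ?"
        params = (source_code, vocabulary)
    elif input_type == "String":
        sql = "SELECT code, label FROM concept WHERE LOWER(label) LIKE ?"
        params = (source_code.lower(),)
    else:
        raise ValueError(f"Unknown Input_Type: {input_type}")
    # Parameter binding avoids building SQL from raw sheet values.
    return conn.execute(sql, params).fetchall()


print(map_source(conn, "Code", "314.0", "ICD9CM"))
print(map_source(conn, "String", "%adderall%"))
```

Cohort queries would then select patients whose records contain the mapped codes; that step depends on the clinical database schema and is not sketched here.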