Added sections on embeddings for medical ontologies and causal inference #339
Merged
11 commits
6fe8dbf Add reference to Multi-task Prediction of Disease Onsets from Longitu… (DaveDeCaprio)
bedf09a Added in the anchor and learn framework. This isn't strictly deep le… (DaveDeCaprio)
e294d68 Added in references to neural embeddings in medical coding (DaveDeCaprio)
b2220ea Added causal inference references (DaveDeCaprio)
5c317ad Addressing travis CI build issues (DaveDeCaprio)
4833e0d Fixed malformed references, line length and spacing issues (DaveDeCaprio)
1df3bbb Fixed spacing issues (DaveDeCaprio)
2c80b24 Fixed spacing issues (DaveDeCaprio)
c48d253 Changed pmid to doi (DaveDeCaprio)
8585251 Changed PMC id to regular PubMed id. (DaveDeCaprio)
962baa9 Add et al (agitter)
@@ -142,6 +142,22 @@ This indicates a potential strength of deep methods. It may be possible to
 repurpose features from task to task, improving overall predictions as the field
 tackles new challenges.

+Several authors have created reusable feature sets for medical terminologies using
+neural embeddings, as popularized by word2Vec [@tag:Word2Vec]. This approach
+was first used on free text medical notes by De Vine et al.
+[@doi:10.1145/2661829.2661974] with results at or better than traditional methods.
+Y. Choi et al. [@doi:10.1145/2567948.2577348] built embeddings of standardized
+terminologies, such as ICD and NDC, used in widely available administrative
+claims data. By learning terminologies for different entities in the same
+vector space, they can potentially find relationships between different
+domains (e.g. drugs and the diseases they treat). Medical claims data does not
+have the natural document structure of clinical notes, and this issue was
+addressed by E. Choi et al. [@doi:10.1145/2939672.2939823], who built
+embeddings using a multi-layer network architecture which mimics the structure
+of claims data. While promising, difficulties in evaluating the quality of
+these kinds of features and variations in clinical coding practices remain as
+challenges to using them in practice.
+
 Identifying consistent subgroups of individuals and individual health
 trajectories from clinical tests is also an active area of research. Approaches
 inspired by deep learning have been used for both unsupervised feature
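The added paragraph above describes placing codes from different terminologies (for example ICD diagnoses and NDC drugs) in one vector space so that cross-domain relationships can surface. A minimal sketch of that idea follows. It is not the method of any cited paper: it swaps word2vec's neural training for a count-based stand-in (positive pointwise mutual information over per-patient co-occurrence), and the `claims` records, code strings, and function names are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Toy claims data: one list of codes per hypothetical patient.
# The code strings are illustrative placeholders, not real claims.
claims = [
    ["ICD:E11", "NDC:metformin", "ICD:I10"],
    ["ICD:E11", "NDC:metformin"],
    ["ICD:I10", "NDC:lisinopril"],
    ["ICD:I10", "NDC:lisinopril", "ICD:E11", "NDC:metformin"],
    ["ICD:J45", "NDC:albuterol"],
    ["ICD:J45", "NDC:albuterol", "ICD:I10", "NDC:lisinopril"],
]


def ppmi_vectors(records):
    """Represent every code by its PPMI association with every other code.

    Diagnoses and drugs share one vocabulary, so all vectors live in the
    same space regardless of which terminology a code comes from.
    """
    code_counts = Counter()
    pair_counts = Counter()
    for rec in records:
        codes = sorted(set(rec))
        code_counts.update(codes)
        for a, b in combinations(codes, 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1

    n = len(records)
    vocab = sorted(code_counts)
    vectors = {}
    for a in vocab:
        row = []
        for b in vocab:
            joint = pair_counts[(a, b)] / n
            if a == b or joint == 0:
                row.append(0.0)
            else:
                expected = (code_counts[a] / n) * (code_counts[b] / n)
                row.append(max(math.log(joint / expected), 0.0))
        vectors[a] = row
    return vocab, vectors


def top_association(code, vocab, vectors, target_prefix):
    """Return the code in the target terminology with the largest PPMI coordinate."""
    candidates = [
        (score, other)
        for other, score in zip(vocab, vectors[code])
        if other.startswith(target_prefix)
    ]
    return max(candidates)[1]


vocab, vectors = ppmi_vectors(claims)
for drug in ("NDC:metformin", "NDC:lisinopril", "NDC:albuterol"):
    print(drug, "->", top_association(drug, vocab, vectors, "ICD:"))
```

On real claims data the vocabulary is far larger, and a learned dense embedding (as in the cited work) would replace these sparse count vectors. The sketch only shows why a shared vocabulary lets a drug code be compared directly against diagnosis codes.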
@@ -157,9 +173,10 @@ scale analysis of an electronic health records system found that a deep
 denoising autoencoder architecture applied to the number and co-occurrence of
 clinical test events, though not the results of those tests, constructed
 features that were more useful for disease prediction than other existing
-feature construction methods [@doi:10.1038/srep26094]. Taken together, these
-results support the potential of unsupervised feature construction in this
-domain. However, numerous challenges including data integration (patient
+feature construction methods [@doi:10.1038/srep26094]. Razavian et al.
+[@arxiv:1608.00647] used a set of 18 common lab tests to predict disease onset
+using both CNN and LSTM architectures and demonstrated an improvement over baseline
+regression models. However, numerous challenges including data integration (patient
 demographics, family history, laboratory tests, text-based patient records,
 image analysis, genomic data) and better handling of streaming temporal data
 with many features, will need to be overcome before we can fully assess the

Review comment on the Razavian et al. lines: Good addition + touches on different architectures.
@@ -236,7 +253,9 @@ making methodological choices that either reduce the need for labeled examples
 or that use transformations to training data to increase the number of times it
 can be used before overfitting occurs. For example, the unsupervised and
 semi-supervised methods that we've discussed reduce the need for labeled
-examples [@doi:10.1016/j.jbi.2016.10.007]. The adversarial training example
+examples [@doi:10.1016/j.jbi.2016.10.007]. The anchor and learn framework
+[@doi:10.1093/jamia/ocw011] uses expert knowledge to identify high confidence
+observations from which labels can be inferred. The adversarial training example
 strategies that we've mentioned can reduce overfitting, if transformations are
 available that preserve the meaningful content of the data while transforming
 irrelevant features [@doi:10.1101/095786]. While adversarial training examples
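The labeling step described in the added lines above (expert-chosen, high-confidence observations from which labels are inferred) can be sketched roughly as follows. This is a simplified illustration, not the published framework: the `records`, the `ANCHORS` set, and both function names are hypothetical. Treating records without an anchor as unlabeled rather than negative, and removing anchors from the features before training, are assumptions of this sketch.

```python
# Hypothetical patient records: each is a set of observed features.
records = [
    {"dx:diabetes_code", "lab:high_hba1c", "rx:metformin"},
    {"rx:insulin", "lab:high_hba1c"},
    {"lab:high_hba1c", "sym:thirst"},
    {"sym:cough", "rx:albuterol"},
    {"sym:cough"},
    {"rx:metformin", "sym:thirst"},
]

# Anchors: observations a (hypothetical) expert considers high-confidence
# evidence of the phenotype of interest.
ANCHORS = {"rx:metformin", "rx:insulin", "dx:diabetes_code"}


def infer_labels(records, anchors):
    """Label a record True if any anchor is present, else None (unlabeled)."""
    return [True if rec & anchors else None for rec in records]


def censored_training_pairs(records, anchors):
    """Pair each record's non-anchor features with its inferred label.

    Anchors are removed from the features so that a downstream classifier
    cannot simply memorize them (an assumption of this sketch).
    """
    labels = infer_labels(records, anchors)
    return [(rec - anchors, label) for rec, label in zip(records, labels)]


labels = infer_labels(records, ANCHORS)
pairs = censored_training_pairs(records, ANCHORS)
for features, label in pairs:
    print(sorted(features), label)
```

The resulting pairs could then feed any classifier that tolerates positive and unlabeled examples; that training step is left out here.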
NDC -> national drug codes?
just want to make sure for when we have to look for the first occurrence.
Yes, that is correct for NDC
Ok - when we go through and check acronyms + define at first occurrence I'll know now. Thanks! I think the DOI below is my only requested change.