Categorize study section minor additions/modifications #183
Conversation
denoising autoencoder architecture applied to the number and co-occurrence of
clinical test events, though not the results of those tests, constructed
features that were more useful for disease prediction than other existing
feature construction methods [@doi:10.1038/srep26094]. While each of these
touched on clinical tests, neither considered full text records. Taken together,
Deep Patient does NLP extraction to create features
👍
reduce the error rate to under 1%. In this area, these algorithms may be ready
to incorporate into existing tools to aid pathologists. The authors' work
suggests that this could reduce the false negative rate of such evaluations.
`TODO: Incorporate #71 via @brettbj who has covered in journal club and has
Tried to keep rough chronological order
this looks to be an improvement to me 👍
Let me know if/when you want me to review anything. Will the wide data section go in Categorize or Discussion? Some of that could span both Study and Categorize.
@agitter I completed the other challenges sections - open to your thoughts on where to put the wide data section
* https://github.com/greenelab/deep-review/issues/82
* https://github.com/greenelab/deep-review/issues/152
* https://github.com/greenelab/deep-review/issues/155
(@sw1 + maybe @traversc). These include: * https://github.com/greenelab/deep-
Looks like the reflow here broke the formatting in weird ways.
* Semi-supervised methods to take advantage of large number of unlabeled
examples
* Transfer learning.
working around) * Semi-supervised methods to take advantage of large number of
Again - reflow and lists seem to have disagreed.
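For reference, a wrap that keeps a Markdown list intact indents each bullet's continuation lines under the bullet instead of flowing them into the surrounding paragraph; a formatting sketch (illustrative, not the exact manuscript text):

```markdown
* Semi-supervised methods to take advantage of large number of unlabeled
  examples
* Transfer learning.
```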
meaning research is at most a tertiary priority. This presents significant
challenges to EHR based research in general, and particularly to data
intensive deep learning research. EHRs are used differently even within the same
health care system (@pmid:PMC3797550, @pmid:PMC3041534). Individual users have unique usage patterns, and different departments have different priorities which introduce missing data in a non-random fashion. Just et al. demonstrated that even the most
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paren brackets instead of square - think this will break @dhimmel's formatting script.
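Assuming the manuscript uses pandoc-style citations (square brackets, semicolon separators) as the formatting script expects, the quoted span would change like this (a sketch, keeping the citation keys from the diff above):

```markdown
<!-- as written in the diff (parentheses, comma-separated): -->
(@pmid:PMC3797550, @pmid:PMC3041534)

<!-- pandoc-style square brackets (an assumption about the repo's convention): -->
[@pmid:PMC3797550; @pmid:PMC3041534]
```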
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, line length here is irregular; might want to reflow this bit.
[@pmid:27134610]. This is before considering challenges caused by system
migrations and health care system expansions through acquisitions. Replication
between hospital systems requires controlling for both these systematic biases as
well as for population and demographic effects. Historically, rules-based algorithms have been popular but because these are developed at a single institution and trained with a specific patient population they do not transfer easily to other populations [@doi:10.1136/amiajnl-2013-001935 ]. Wiley et al. [@doi:10.1142/9789813207813_0050] showed that warfarin dosing algorithms often under perform in African Americans, illustrating that some of these issues are unsolved even at a treatment best practices level.
I like where this discusses rules-based approaches. Can you tie it to ML-based methods? Is there a parallel here with regard to something like translation where rules-based methods are now being superseded by deep learning methods?
###### Standardization/integration
###### Temporal Patient Trajectories
Can you 80char/line this. Otherwise we can only comment at the full paragraph level. Also looks like you have @doi and then a PMID.
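One quick way to get a long paragraph onto roughly 80-character lines is a textwrap pass; a minimal sketch (this workflow is an assumption, not the repo's actual tooling):

```python
import textwrap

# Hard-wrap a manuscript paragraph at 80 characters so reviewers can
# comment on individual lines instead of the whole paragraph.
paragraph = (
    "Traditionally, physician training programs justified long training "
    "hours by citing increased continuity of care and learning by "
    "following the progression of a disease over time."
)
wrapped = textwrap.fill(paragraph, width=80)
print(wrapped)
```

Note that a naive reflow like this will mangle Markdown lists and citations that span lines, so it should only be applied paragraph by paragraph.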
Traditionally, physician training programs justified long training hours by citing increased continuity of care and learning by following the progression of a disease over time, despite the known consequences of decreased mental and quality of life [@doi:pmid:15047076, @pmid:12691951, @pmid:2321788, @pmid:9089922]. Yet, a common practice in EHR-based research is to take a point in time snapshot and convert patient data to a traditional vector for machine learning and statistical analysis. This results in significant signal losses as timing and order of events provide insight into a patient's disease and treatment. Efforts to account for the order of events have shown promise [@doi:10.1038/ncomms5022] but require exceedingly large patient sizes due to discrete combinatorial bucketing.
Lasko et al. [@pmid:23826094] used autoencoders on longitudinal sequences of serum urine acid measurements to identify population subtypes. More recently, deep learning has shown promise working with both sequences (Convolutional Neural Networks) [@arXiv:1607.07519] and the incorporation of past and current state (Recurrent Neural Networks, Long Short Term Memory Networks)[@arXiv:602.00357v1].
Where there are DOIs for these PMIDs can you use the DOI so that we can automatically extract reference information? Also, can you create an issue for the ones that are deep learning if they don't already exist.
###### Data sharing and privacy
This touches a bit on a few topics (available data) that @XieConnect also raised in #194. After merge, it'll be important to check the ordering and make sure that the flow best takes advantage of these two complementary discussions.
@XieConnect nicely raises the challenge that physicians are expensive. @brettbj nicely raises the challenge that some data can't really be shared.
###### Biomedical data is often "Wide"

*Biomedical studies typically deal with relatively small sample sizes but each
Agree this is important. This is definitely along the lines of what @XieConnect is getting at in #194 (limited standards - here you have limited examples - and maybe particularly labeled examples). Suggest again that you guys discuss and integrate this in a subsequent PR.
Thanks @cgreene, probably won't get to this until the weekend (after chalk talk)
This gets us to a nice draft on these sections. Going to merge and we will continue to revise in the context of the other sections 👍
Thanks for reviewing this @cgreene since I never did.
I'm planning on tackling the data sharing/privacy and standardization challenge sections next; I also propose adding the wide data challenge section.