Categorize study section minor additions/modifications #183

Merged 6 commits on Jan 26, 2017

Conversation

brettbj
Contributor

@brettbj brettbj commented Jan 4, 2017

I'm planning on tackling the data sharing/privacy and standardization challenge sections next. I also propose adding the wide data challenge section.

denoising autoencoder architecture applied to the number and co-occurrence of
clinical test events, though not the results of those tests, constructed
features that were more useful for disease prediction than other existing
feature construction methods [@doi:10.1038/srep26094]. While each of these
touched on clinical tests, neither considered full text records. Taken together,
Contributor Author

Deep Patient does NLP extraction to create features.

Member

👍

reduce the error rate to under 1%. In this area, these algorithms may be ready
to incorporate into existing tools to aid pathologists. The authors' work
suggests that this could reduce the false negative rate of such evaluations.
`TODO: Incorporate #71 via @brettbj who has covered in journal club and has
Contributor Author

Tried to keep rough chronological order

Member

this looks to be an improvement to me 👍

@agitter
Collaborator

agitter commented Jan 4, 2017

Let me know if/when you want me to review anything. Will the wide data section go in Categorize or Discussion? Some of that could span both Study and Categorize.

@agitter agitter mentioned this pull request Jan 8, 2017
@brettbj
Contributor Author

brettbj commented Jan 10, 2017

@agitter I completed the other challenges sections - open to your thoughts on where to put the wide data section

@brettbj brettbj requested a review from agitter January 10, 2017 04:07
Member

@cgreene cgreene left a comment

I have a few elements of feedback. There are some clear areas of complementarity here with #194. I'd like to see the formatting of this get cleaned up before a merge. I'd also like us to get #194 and this merged so that you guys can start to work together to refine these areas.

reduce the error rate to under 1%. In this area, these algorithms may be ready
to incorporate into existing tools to aid pathologists. The authors' work
suggests that this could reduce the false negative rate of such evaluations.
`TODO: Incorporate #71 via @brettbj who has covered in journal club and has
Member

this looks to be an improvement to me 👍

* https://github.com/greenelab/deep-review/issues/82
* https://github.com/greenelab/deep-review/issues/152
* https://github.com/greenelab/deep-review/issues/155
(@sw1 + maybe @traversc). These include: * https://github.com/greenelab/deep-
Member

Looks like the reflow here broke the formatting in weird ways.

denoising autoencoder architecture applied to the number and co-occurrence of
clinical test events, though not the results of those tests, constructed
features that were more useful for disease prediction than other existing
feature construction methods [@doi:10.1038/srep26094]. While each of these
touched on clinical tests, neither considered full text records. Taken together,
Member

👍

* Semi-supervised methods to take advantage of large number of unlabeled
examples
* Transfer learning.
working around) * Semi-supervised methods to take advantage of large number of
Member

Again - reflow and lists seem to have disagreed.

meaning research is at most a tertiary priority. This presents significant
challenges to EHR-based research in general, and particularly to data-intensive
deep learning research. EHRs are used differently even within the same
health care system (@pmid:PMC3797550, @pmid:PMC3041534). Individual users have unique usage patterns, and different departments have different priorities which introduce missing data in a non-random fashion. Just et al. demonstrated that even the most
Member

Parentheses instead of square brackets here; I think this will break @dhimmel's formatting script.
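
One way to catch this before pushing is a quick scan for parenthesized citation groups. The snippet below is only a sketch: the regular expression and the function name `find_paren_citations` are hypothetical, it assumes citations use the `@doi:`, `@pmid:`, or `@arXiv:` tags seen in this PR, and it is not taken from the actual formatting script.

```python
import re

# Hypothetical pattern (an assumption, not the real script's rule): a
# parenthesized group that starts with a citation tag used in this PR.
PAREN_CITATION = re.compile(r"\(\s*@(?:doi|pmid|arXiv):[^)]*\)")


def find_paren_citations(text):
    """Return every parenthesized citation group found in text."""
    return PAREN_CITATION.findall(text)


# Sample taken from the line under review.
sample = "health care system (@pmid:PMC3797550, @pmid:PMC3041534). Individual"
for match in find_paren_citations(sample):
    print("parenthesized citation:", match)
```

Citations already in square brackets, such as `[@doi:10.1038/srep26094]`, are not flagged by this pattern.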

Member

Also, the line length here is irregular; you might want to reflow this bit.

[@pmid:27134610]. This is before considering challenges caused by system
migrations and health care system expansions through acquisitions. Replication
between hospital systems requires controlling for these systematic biases as
well as for population and demographic effects. Historically, rules-based algorithms have been popular, but because they are developed at a single institution and trained with a specific patient population, they do not transfer easily to other populations [@doi:10.1136/amiajnl-2013-001935]. Wiley et al. [@doi:10.1142/9789813207813_0050] showed that warfarin dosing algorithms often underperform in African Americans, illustrating that some of these issues are unsolved even at a treatment best practices level.
Member

I like where this discusses rules-based approaches. Can you tie it to ML-based methods? Is there a parallel here with regard to something like translation where rules-based methods are now being superseded by deep learning methods?


###### Standardization/integration
###### Temporal Patient Trajectories

Member

Can you wrap this at 80 characters per line? Otherwise we can only comment at the full-paragraph level. Also, it looks like you have @doi and then a PMID.


Traditionally, physician training programs justified long training hours by citing increased continuity of care and learning by following the progression of a disease over time, despite the known consequences of decreased mental health and quality of life [@doi:pmid:15047076, @pmid:12691951, @pmid:2321788, @pmid:9089922]. Yet a common practice in EHR-based research is to take a point-in-time snapshot and convert patient data to a traditional vector for machine learning and statistical analysis. This results in significant signal loss, as the timing and order of events provide insight into a patient's disease and treatment. Efforts to account for the order of events have shown promise [@doi:10.1038/ncomms5022] but require exceedingly large numbers of patients due to discrete combinatorial bucketing.

Lasko et al. [@pmid:23826094] used autoencoders on longitudinal sequences of serum uric acid measurements to identify population subtypes. More recently, deep learning has shown promise working with both sequences (Convolutional Neural Networks) [@arXiv:1607.07519] and the incorporation of past and current state (Recurrent Neural Networks, Long Short-Term Memory Networks) [@arXiv:602.00357v1].
Member

Where there are DOIs for these PMIDs, can you use the DOI so that we can automatically extract reference information? Also, can you create an issue for each of the deep learning ones if it doesn't already exist?


Lasko et al. [@pmid:23826094] used autoencoders on longitudinal sequences of serum uric acid measurements to identify population subtypes. More recently, deep learning has shown promise working with both sequences (Convolutional Neural Networks) [@arXiv:1607.07519] and the incorporation of past and current state (Recurrent Neural Networks, Long Short-Term Memory Networks) [@arXiv:602.00357v1].

###### Data sharing and privacy
Member

This touches a bit on a few topics (available data) that @XieConnect also raised in #194. After merge, it'll be important to check the ordering and make sure that the flow best takes advantage of these two complementary discussions.

@XieConnect nicely raises the challenge that physicians are expensive. @brettbj nicely raises the challenge that some data can't really be shared.


###### Biomedical data is often "Wide"

*Biomedical studies typically deal with relatively small sample sizes but each
Member

Agree this is important. This is definitely along the lines of what @XieConnect is getting at in #194 (limited standards - here you have limited examples - and maybe particularly labeled examples). Suggest again that you guys discuss and integrate this in a subsequent PR.

@brettbj
Contributor Author

brettbj commented Jan 19, 2017

Thanks @cgreene, probably won't get to this until the weekend (after chalk talk)

Member

@cgreene cgreene left a comment

This gets us to a nice draft on these sections. Going to merge and we will continue to revise in the context of the other sections 👍

@cgreene cgreene merged commit 36d86f8 into master Jan 26, 2017
@cgreene cgreene deleted the categorize_study_section branch January 26, 2017 14:40
@agitter
Collaborator

agitter commented Jan 28, 2017

Thanks for reviewing this @cgreene since I never did.

3 participants