Categorize study section minor additions/modifications #183

brettbj · 2017-01-04T01:32:49Z

I'm planning on tackling the data sharing/privacy and standardization challenge sections next. I also propose adding the wide data challenge section.

brettbj · 2017-01-04T01:33:28Z

sections/03_categorize.md

 denoising autoencoder architecture applied to the number and co-occurrence of
 clinical test events, though not the results of those tests, constructed
 features that were more useful for disease prediction than other existing
-feature construction methods [@doi:10.1038/srep26094]. While each of these
-touched on clinical tests, neither considered full text records. Taken together,


Deep Patient does NLP extraction to create features.

brettbj · 2017-01-04T01:34:19Z

sections/03_categorize.md

-reduce the error rate to under 1%. In this area, these algorithms may be ready
-to incorporate into existing tools to aid pathologists. The authors' work
-suggests that this could reduce the false negative rate of such evaluations.
-`TODO: Incorporate #71 via @brettbj who has covered in journal club and has


Tried to keep rough chronological order

agitter · 2017-01-04T12:57:11Z

Let me know if/when you want me to review anything. Will the wide data section go in Categorize or Discussion? Some of that could span both Study and Categorize.

brettbj · 2017-01-10T04:06:01Z

@agitter I completed the other challenge sections. I'm open to your thoughts on where to put the wide data section.

cgreene

I have a few elements of feedback. There are some clear areas of complementarity here with #194. I'd like to see the formatting of this get cleaned up before a merge. I'd also like us to get #194 and this merged so that you guys can start to work together to refine these areas.

cgreene · 2017-01-18T20:45:20Z

sections/03_categorize.md

-reduce the error rate to under 1%. In this area, these algorithms may be ready
-to incorporate into existing tools to aid pathologists. The authors' work
-suggests that this could reduce the false negative rate of such evaluations.
-`TODO: Incorporate #71 via @brettbj who has covered in journal club and has


this looks to be an improvement to me 👍

cgreene · 2017-01-18T20:46:25Z

sections/03_categorize.md

-* https://github.com/greenelab/deep-review/issues/82
-* https://github.com/greenelab/deep-review/issues/152
-* https://github.com/greenelab/deep-review/issues/155
+(@sw1 + maybe @traversc). These include: * https://github.com/greenelab/deep-


Looks like the reflow here broke the formatting in weird ways.

cgreene · 2017-01-18T20:47:00Z

sections/03_categorize.md

 denoising autoencoder architecture applied to the number and co-occurrence of
 clinical test events, though not the results of those tests, constructed
 features that were more useful for disease prediction than other existing
-feature construction methods [@doi:10.1038/srep26094]. While each of these
-touched on clinical tests, neither considered full text records. Taken together,


cgreene · 2017-01-18T20:47:13Z

sections/03_categorize.md

-* Semi-supervised methods to take advantage of large number of unlabeled
-    examples
-* Transfer learning.
+working around) * Semi-supervised methods to take advantage of large number of


Again - reflow and lists seem to have disagreed.
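For what it's worth, a list-aware reflow could avoid this kind of breakage. The sketch below is only an illustration, not the tool that was used on this PR (which isn't stated here). The function name `reflow` is hypothetical. It rewraps plain paragraphs to 80 characters, leaves list items, their indented continuation lines, and headings untouched, and tells `textwrap` not to split on hyphens so URLs stay whole.

```python
import textwrap


def reflow(markdown: str, width: int = 80) -> str:
    """Rewrap paragraphs to `width`; pass list items and headings through."""
    out, para = [], []

    def flush():
        # Emit the buffered paragraph as wrapped lines, then reset the buffer.
        if para:
            out.extend(
                textwrap.wrap(
                    " ".join(para),
                    width=width,
                    break_long_words=False,  # never cut a URL or citation tag
                    break_on_hyphens=False,  # avoids splits like "deep-" / "review"
                )
            )
            para.clear()

    in_list = False
    for line in markdown.splitlines():
        stripped = line.lstrip()
        if not stripped:
            # Blank line: paragraph boundary.
            flush()
            out.append("")
            in_list = False
        elif stripped.startswith(("* ", "- ", "+ ", "#")):
            # List item or heading: keep the line exactly as written.
            flush()
            out.append(line)
            in_list = stripped[0] in "*-+"
        elif in_list and line.startswith(" "):
            # Indented continuation of the current list item.
            out.append(line)
        else:
            in_list = False
            para.append(stripped)
    flush()
    return "\n".join(out)
```

Unindented ("lazy") continuation lines of a list item are treated as ordinary paragraph text here, so this is a simplification rather than a full Markdown parser.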

cgreene · 2017-01-18T20:48:10Z

sections/03_categorize.md

+meaning research is at most a tertiary priority. This presents significant
+challenges to EHR based research in general, and particularly to data
+intensive deep learning research. EHRs are used differently even within the same
+health care system (@pmid:PMC3797550, @pmid:PMC3041534). Individual users have unique usage patterns, and different departments have different priorities which introduce missing data in a non-random fashion. Just et al. demonstrated that even the most


Parentheses instead of square brackets here. I think this will break @dhimmel's formatting script.

Also, the line length here is irregular; you might want to reflow this bit.
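A quick check along these lines might catch the bracket problem before review. This is a rough sketch and not @dhimmel's script: the function name `unbracketed_citations` and the tag pattern are assumptions based only on the `@doi:`, `@pmid:`, and `@arXiv:` prefixes visible in this diff.

```python
import re

# Citation tags as they appear in this diff (assumed pattern, case-insensitive).
TAG = re.compile(r"@(?:doi|pmid|arxiv):\S+", re.IGNORECASE)
# A square-bracketed span with no nested brackets.
BRACKETED = re.compile(r"\[[^\[\]]*\]")


def unbracketed_citations(text: str) -> list[str]:
    """Return citation tags that are not enclosed in square brackets."""
    # Blank out every [...] span; any tag still visible sits outside brackets.
    outside = BRACKETED.sub(lambda m: " " * len(m.group()), text)
    # Trim trailing punctuation that \S+ swallowed along with the identifier.
    return [m.group().rstrip(",.;)") for m in TAG.finditer(outside)]
```

Running it over the added lines of `sections/03_categorize.md` would list the tags that need to move into `[...]`.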

cgreene · 2017-01-18T20:49:26Z

sections/03_categorize.md

+[@pmid:27134610]. This is before considering challenges caused by system
+migrations and health care system expansions through acquisitions. Replication
+between hospital systems requires controlling for both these systematic biases as
+well as for population and demographic effects.  Historically, rules-based algorithms have been popular but because these are developed at a single institution and trained with a specific patient population they do not transfer easily to other populations [@doi:10.1136/amiajnl-2013-001935 ]. Wiley et al. [@doi:10.1142/9789813207813_0050] showed that warfarin dosing algorithms often under perform in African Americans, illustrating that some of these issues are unsolved even at a treatment best practices level. 


I like where this discusses rules-based approaches. Can you tie it to ML-based methods? Is there a parallel here with regard to something like translation where rules-based methods are now being superseded by deep learning methods?

cgreene · 2017-01-18T20:49:59Z

sections/03_categorize.md


-###### Standardization/integration
+###### Temporal Patient Trajectories
+


Can you wrap this at 80 characters per line? Otherwise we can only comment at the full-paragraph level. It also looks like you have @doi and then a PMID.
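The doubled prefix is easy to spot mechanically. A minimal sketch, assuming the same three tag prefixes used elsewhere in this diff; `mixed_prefix_tags` is a hypothetical helper, not part of the repository's tooling:

```python
import re

# Matches a tag whose source prefix is immediately followed by a second prefix,
# e.g. "@doi:pmid:...". The prefix list is an assumption taken from this diff.
DOUBLE_PREFIX = re.compile(
    r"@(?:doi|pmid|arxiv):(?:doi|pmid|arxiv):\S+", re.IGNORECASE
)


def mixed_prefix_tags(text: str) -> list[str]:
    """Return citation tags that carry two source prefixes in a row."""
    # Trim trailing punctuation that \S+ swallowed along with the identifier.
    return [m.group().rstrip(",.;]") for m in DOUBLE_PREFIX.finditer(text)]
```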

cgreene · 2017-01-18T20:50:46Z

sections/03_categorize.md

+
+Traditionally, physician training programs justified long training hours by citing increased continuity of care and learning by following the progression of a disease over time, despite the known consequences of decreased mental and quality of life [@doi:pmid:15047076, @pmid:12691951, @pmid:2321788, @pmid:9089922]. Yet, a common practice in EHR-based research is to take a point in time snapshot and convert patient data to a traditional vector for machine learning and statistical analysis. This results in significant signal losses as timing and order of events provide insight into a patient's disease and treatment. Efforts to account for the order of events have shown promise [@doi:10.1038/ncomms5022] but require exceedingly large patient sizes due to discrete combinatorial bucketing. 
+
+Lasko et al. [@pmid:23826094] used autoencoders on longitudinal sequences of serum urine acid measurements to identify population subtypes. More recently, deep learning has shown promise working with both sequences (Convolutional Neural Networks) [@arXiv:1607.07519] and the incorporation of past and current state (Recurrent Neural Networks, Long Short Term Memory Networks)[@arXiv:602.00357v1]. 


Where there are DOIs for these PMIDs, can you use the DOI so that we can automatically extract reference information? Also, can you create an issue for the ones that are deep learning if they don't already exist?

cgreene · 2017-01-18T20:52:51Z

sections/03_categorize.md

+
+Lasko et al. [@pmid:23826094] used autoencoders on longitudinal sequences of serum urine acid measurements to identify population subtypes. More recently, deep learning has shown promise working with both sequences (Convolutional Neural Networks) [@arXiv:1607.07519] and the incorporation of past and current state (Recurrent Neural Networks, Long Short Term Memory Networks)[@arXiv:602.00357v1]. 
+
+###### Data sharing and privacy


This touches a bit on a few topics (available data) that @XieConnect also raised in #194. After merge, it'll be important to check the ordering and make sure that the flow best takes advantage of these two complementary discussions.

@XieConnect nicely raises the challenge that physicians are expensive. @brettbj nicely raises the challenge that some data can't really be shared.

cgreene · 2017-01-18T20:53:50Z

sections/03_categorize.md

+
+###### Biomedical data is often "Wide"
+
+*Biomedical studies typically deal with relatively small sample sizes but each


Agree this is important. This is definitely along the lines of what @XieConnect is getting at in #194 (limited standards - here you have limited examples - and maybe particularly labeled examples). Suggest again that you guys discuss and integrate this in a subsequent PR.

brettbj · 2017-01-19T03:11:06Z

Thanks @cgreene, probably won't get to this until the weekend (after chalk talk)

cgreene

This gets us to a nice draft on these sections. Going to merge and we will continue to revise in the context of the other sections 👍

agitter · 2017-01-28T12:52:36Z

Thanks for reviewing this @cgreene since I never did.

brettbj added 2 commits January 3, 2017 15:29

categorize

46254f2

77+78 added

8dafa29

agitter mentioned this pull request Jan 8, 2017

Current Section Status #188

Closed

standardization, temporal trajectories, and datasharing challenges

af523f7

brettbj added the categorize label Jan 10, 2017

brettbj requested a review from agitter January 10, 2017 04:07

cgreene requested changes Jan 18, 2017

brettbj and others added 3 commits January 25, 2017 17:39

formatting corrections

13ff7d3

addressing comments'

6444009

quick fix of some reflow issues that i noticed

69160de

cgreene approved these changes Jan 26, 2017

cgreene merged commit 36d86f8 into master Jan 26, 2017

cgreene deleted the categorize_study_section branch January 26, 2017 14:40
