diff --git a/sections/03_categorize.md b/sections/03_categorize.md
index bb6abe90..9e922103 100755
--- a/sections/03_categorize.md
+++ b/sections/03_categorize.md
@@ -142,6 +142,22 @@ This indicates a potential strength of deep methods. It may be possible to
 repurpose features from task to task, improving overall predictions as the field
 tackles new challenges.
 
+Several authors have created reusable feature sets for medical terminologies using
+neural embeddings, as popularized by word2vec [@tag:Word2Vec]. This approach
+was first used on free-text medical notes by De Vine et al.
+[@doi:10.1145/2661829.2661974], with results on par with or better than traditional methods.
+Y. Choi et al. [@doi:10.1145/2567948.2577348] built embeddings of standardized
+terminologies, such as ICD and NDC, used in widely available administrative
+claims data. By learning terminologies for different entities in the same
+vector space, they can potentially find relationships between different
+domains (e.g. drugs and the diseases they treat). Medical claims data does not
+have the natural document structure of clinical notes, and this issue was
+addressed by E. Choi et al. [@doi:10.1145/2939672.2939823], who built
+embeddings using a multi-layer network architecture which mimics the structure
+of claims data. While these approaches are promising, difficulties in evaluating the
+quality of these kinds of features and variations in clinical coding practices remain
+challenges to using them in practice.
+
 Identifying consistent subgroups of individuals and individual health
 trajectories from clinical tests is also an active area of research. Approaches
 inspired by deep learning have been used for both unsupervised feature
@@ -157,9 +173,10 @@ scale analysis of an electronic health records system found that a deep
 denoising autoencoder architecture applied to the number and co-occurrence of
 clinical test events, though not the results of those tests, constructed
 features that were more useful for disease prediction than other existing
-feature construction methods [@doi:10.1038/srep26094].  Taken together, these
-results support the potential of unsupervised feature construction in this
-domain. However, numerous challenges including data integration (patient
+feature construction methods [@doi:10.1038/srep26094].  Razavian et al.
+[@arxiv:1608.00647] used a set of 18 common lab tests to predict disease onset
+using both CNN and LSTM architectures and demonstrated an improvement over baseline
+regression models. However, numerous challenges including data integration (patient
 demographics, family history, laboratory tests, text-based patient records,
 image analysis, genomic data) and better handling of streaming temporal data
 with many features, will need to be overcome before we can fully assess the
@@ -236,7 +253,9 @@ making methodological choices that either reduce the need for labeled examples
 or that use transformations to training data to increase the number of times it
 can be used before overfitting occurs. For example, the unsupervised and
 semi-supervised methods that we've discussed reduce the need for labeled
-examples [@doi:10.1016/j.jbi.2016.10.007]. The adversarial training example
+examples [@doi:10.1016/j.jbi.2016.10.007]. The anchor and learn framework
+[@doi:10.1093/jamia/ocw011] uses expert knowledge to identify high confidence
+observations from which labels can be inferred. The adversarial training example
 strategies that we've mentioned can reduce overfitting, if transformations are
 available that preserve the meaningful content of the data while transforming
 irrelevant features [@doi:10.1101/095786]. While adversarial training examples
diff --git a/sections/05_treat.md b/sections/05_treat.md
index b2a40c29..66f09a8d 100644
--- a/sections/05_treat.md
+++ b/sections/05_treat.md
@@ -34,6 +34,16 @@ interpretability of deep learning models, fitting deep models to limited and
 heterogeneous data, and integrating complex predictive models into a dynamic
 clinical environment.
 
+A critical challenge in moving from prediction to treatment recommendations
+is the necessity to establish a causal relationship for a recommendation.
+Causal inference is often framed in terms of counterfactual questions
+[@doi:10.1037/h0037350]. Johansson et al. [@arxiv:1605.03661] use deep neural
+networks to create representation models for covariates that capture nonlinear
+effects and show significant performance improvements over existing models. In
+a less formal approach, Kale et al. [@pmid:26958203] first create a deep neural
+network to model clinical time series and then analyze the relationship of the
+hidden features to the output using a causal approach.
+
 #### Applications
 
 ##### Trajectory Prediction for Treatment