Add intro from personalized model survey #1

Merged · 7 commits · Aug 16, 2024
101 changes: 101 additions & 0 deletions content/02.introduction.md
@@ -0,0 +1,101 @@
## Introduction
Personalization aims to solve the problem of _parameter heterogeneity_, where model parameters are _sample-specific_.
$$X_i \sim P(X; \theta_i)$$
From $N$ observations, personalized modeling methods aim to recover $N$ parameter estimates $\widehat{\theta}_1, ..., \widehat{\theta}_N$.
Without further assumptions this problem is ill-posed: with only one observation per parameter vector, the estimators have far too much variance to be useful.
We can begin to make this problem tractable by imposing assumptions on the topology of $\theta$, or the relationship between $\theta$ and contextual variables.
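As a running illustration, here is a minimal simulation of this generative setting; it assumes per-sample Gaussian linear models whose parameters vary smoothly with a one-dimensional context, and all variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 500, 2  # samples, features

# Each sample i has its own parameter vector theta_i (parameter
# heterogeneity). Here theta varies smoothly with a 1-D context c_i,
# but the estimators discussed below never see this ground truth.
C = rng.uniform(-1, 1, size=(N, 1))
theta_true = np.hstack([np.sin(np.pi * C), np.cos(np.pi * C)])  # (N, P)

X = rng.normal(size=(N, P))
y = np.einsum("np,np->n", X, theta_true) + 0.1 * rng.normal(size=N)
```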

### Population Models
The fundamental assumption of most models is that samples are independent and identically distributed.
However, if samples are identically distributed they must also have identical parameters.
To account for parameter heterogeneity and build more realistic models we must relax this assumption, but it is so fundamental to many methods that alternatives are rarely explored.
Compounding the problem, traditional models often produce a seemingly acceptable fit to their data even when the underlying data-generating process is heterogeneous.
Here, we explore the consequences of applying homogeneous modeling approaches to heterogeneous data, and discuss how subtle but meaningful effects are often lost to the strength of the identically distributed assumption.

Failure modes of population models can be identified by their error distributions.

__Mode collapse__:
If one population is much larger than another, the smaller population will be underrepresented in the model, which collapses toward the dominant mode.

__Outliers__:
Small populations of outliers can have an enormous effect on OLS models in the parameter-averaging regime.

__Phantom Populations__:
If several populations are present and equally represented, the optimal traditional model will lie between them and represent none of them.

__Lemma:__ An OLS linear model fit to data pooled from heterogeneous subpopulations estimates an average of the subpopulation models (weighted by subpopulation size when the covariate distributions are shared).
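Both the phantom-population failure mode and the lemma are easy to demonstrate; a minimal sketch, assuming two equally represented subpopulations with opposite slopes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two equally represented subpopulations with slopes +1 and -1.
x = rng.normal(size=2 * n)
beta = np.repeat([1.0, -1.0], n)
y = beta * x + 0.1 * rng.normal(size=2 * n)

# Pooled OLS slope collapses to the average of the two models.
beta_hat = (x @ y) / (x @ x)
print(round(beta_hat, 3))  # ~0.0 -- a "phantom" model that fits neither group
```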

### Context-informed models

##### Conditional and Cluster Models
While conditional and cluster models are not truly personalized models, the spirit is the same.
These models assume that all samples within a single conditional or cluster group are homogeneous; more commonly, this is stated as each group of observations being generated by a single model.
While the assumption results in fewer than $N$ models, it allows the use of generic plug-in estimators.
Conditional or cluster estimators take the form
$$ \widehat{\theta}_0, ..., \widehat{\theta}_C = \arg\max_{\theta_0, ..., \theta_C} \sum_{c \in \mathcal{C}} \ell(X_c; \theta_c) $$
where $\ell(X; \theta)$ is the log-likelihood of $\theta$ on $X$, and $c$ indexes the covariate group each sample is assigned to, usually by conditioning or clustering on covariates thought to affect the distribution of observations.
Notably, this method produces fewer than $N$ distinct models for $N$ samples and will fail to recover per-sample parameter variation.
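For concreteness, a minimal plug-in version for Gaussian linear models, assuming groups are formed by k-means clustering on the covariates (the function name and cluster count are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_estimator(X, y, C, n_clusters=5):
    """One OLS fit per covariate cluster; a generic plug-in estimator.
    Returns per-sample estimates, but only n_clusters are distinct."""
    groups = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(C)
    theta_hat = np.zeros((len(y), X.shape[1]))
    for g in np.unique(groups):
        m = groups == g
        sol, *_ = np.linalg.lstsq(X[m], y[m], rcond=None)
        theta_hat[m] = sol  # every sample in the cluster shares one model
    return theta_hat
```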


##### Distance-regularized Models
Distance-regularized models assume that models with similar covariates have similar parameters and encode this assumption as a regularization term.
$$ \widehat{\theta}_0, ..., \widehat{\theta}_N = \arg\max_{\theta_0, ..., \theta_N} \sum_i \left[ \ell(x_i; \theta_i) \right] - \sum_{i, j} \frac{\| \theta_i - \theta_j \|}{D(c_i, c_j)} $$
The second term is a regularizer that penalizes divergence between $\theta_i$ and $\theta_j$ when their covariates are similar, since the penalty is inversely proportional to the covariate distance $D(c_i, c_j)$.
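A sketch of this objective for Gaussian linear models; the `lam` weight and the `eps` guard against zero covariate distance are illustrative additions, not part of the formulation above:

```python
import numpy as np

def distance_regularized_objective(theta, X, y, C, lam=1.0, eps=1e-3):
    """Log-likelihood minus a pairwise penalty that grows as the
    covariate distance D(c_i, c_j) shrinks."""
    resid = y - np.einsum("np,np->n", X, theta)
    log_lik = -0.5 * np.sum(resid ** 2)
    D = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=-1) + eps
    diffs = np.linalg.norm(theta[:, None, :] - theta[None, :, :], axis=-1)
    return log_lik - lam * np.sum(diffs / D)
```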


##### Parametric Varying-coefficient models
Original paper (based on a smoothing spline function): @doi:10.1111/j.2517-6161.1993.tb01939.x
Markov networks: @doi:10.1080/01621459.2021.2000866
Linear varying-coefficient models assume that parameters vary linearly with covariates. This is a much stronger assumption than the classic varying-coefficient model makes, but the conceptual leap gives us an explicit form for the relationship between the parameters and covariates.
$$\widehat{\theta}_0, ..., \widehat{\theta}_N = \widehat{A} C^T$$
$$ \widehat{A} = \arg\max_A \sum_i \ell(x_i; A c_i) $$
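In the Gaussian linear case $\widehat{A}$ has a closed-form least-squares solution, because the model is linear in $\mathrm{vec}(A)$; a sketch using this standard trick (not code from the cited papers):

```python
import numpy as np

def fit_linear_varying_coefficients(X, y, C):
    """Estimate A in theta_i = A @ c_i for y_i = x_i' A c_i + noise.
    The model is linear in vec(A), with per-sample features kron(c_i, x_i)."""
    N, P = X.shape
    K = C.shape[1]
    Z = np.einsum("nk,np->nkp", C, X).reshape(N, K * P)  # Kronecker features
    a, *_ = np.linalg.lstsq(Z, y, rcond=None)
    A_hat = a.reshape(K, P).T          # (P, K), so theta_i = A_hat @ C[i]
    return A_hat
```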


##### Semi-parametric varying-coefficient Models
Original paper: @doi:10.1214/aos/1017939139
2-step estimation with RBF kernels: @arxiv:2103.00315

Classic varying-coefficient models assume that models with similar covariates have similar parameters, or -- more formally -- that changes in parameters are smooth over the covariate space.
This assumption is encoded as a sample weighting, often using a kernel, where the relevance of a sample to a model is equivalent to its kernel similarity over the covariate space.
$$\widehat{\theta}_0, ..., \widehat{\theta}_N = \arg\max_{\theta_0, ..., \theta_N} \sum_{i, j} \frac{K(c_i, c_j)}{\sum_{k} K(c_i, c_k)} \ell(x_j; \theta_i)$$
This is the simplest estimator that recovers $N$ unique parameter estimates.
However, its smoothness assumption directly contradicts the assumption behind the partition model estimator below.
When the relationship between covariates and parameters is discontinuous or abrupt, this estimator will fail.
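A sketch of this estimator for Gaussian linear models, assuming an RBF kernel with an illustrative bandwidth:

```python
import numpy as np

def kernel_varying_coefficients(X, y, C, bandwidth=0.2):
    """One weighted least-squares fit per sample, with RBF kernel
    weights over the covariate space; recovers N distinct estimates."""
    N, P = X.shape
    sq_dist = np.sum((C[:, None, :] - C[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dist / (2 * bandwidth ** 2))
    W = K / K.sum(axis=1, keepdims=True)   # row-normalized kernel weights
    theta_hat = np.zeros((N, P))
    for i in range(N):
        XtW = X.T * W[i]                   # scale each sample by its weight
        theta_hat[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return theta_hat
```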

##### Contextualized Models
Seminal work @doi:10.48550/arXiv.1705.10301
Contextualized ML generalization and applications: @doi:10.48550/arXiv.2310.11340, @doi:10.48550/arXiv.2111.01104, @doi:10.21105/joss.06469, @doi:10.48550/arXiv.2310.07918, @doi:10.1016/j.jbi.2022.104086, @doi:10.1101/2020.06.25.20140053, @doi:10.1101/2023.12.01.569658, @doi:10.48550/arXiv.2312.14254

Contextualized models make the assumption that parameters are some function of context, but make no assumption on the form of that function.
In this regime, we estimate the function itself, often with a deep learner, provided we have a differentiable proxy for the likelihood:
$$ \widehat{f} = \arg \max_{f \in \mathcal{F}} \sum_i \ell(x_i; f(c_i)) $$
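A minimal sketch, assuming a Gaussian linear likelihood (so maximum likelihood reduces to mean squared error) and an illustrative two-layer network; this is not the architecture of any cited work:

```python
import torch
import torch.nn as nn

class ContextualizedLinear(nn.Module):
    """An MLP f maps context c_i to per-sample regression parameters
    theta_i = f(c_i); the whole map is trained by maximum likelihood."""
    def __init__(self, context_dim, n_features, hidden=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(context_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, n_features))

    def forward(self, c, x):
        theta = self.f(c)                  # (N, P) per-sample parameters
        return (theta * x).sum(-1), theta  # predictions y_hat, parameters

def fit(model, x, y, c, epochs=500, lr=1e-2):
    # x, y, c are torch tensors; MSE is the Gaussian neg. log-likelihood
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        y_hat, _ = model(c, x)
        loss = ((y_hat - y) ** 2).mean()
        loss.backward()
        opt.step()
    return model
```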

### Latent-structure Models

##### Partition Models
Markov networks: @doi:10.1214/09-AOAS308
Partition models also assume that parameters can be partitioned into homogeneous groups over the covariate space, but make no assumption about where these partitions occur.
This allows information from different groups to be used when estimating the model at each covariate value.
Partition model estimators are most often utilized to infer abrupt model changes over time and take the form
$$ \widehat{\theta}_0, ..., \widehat{\theta}_N = \arg\max_{\theta_0, ..., \theta_N} \sum_i \ell(x_i; \theta_i) - \sum_{i = 2}^N \text{TV}(\theta_i, \theta_{i-1})$$
where the regularization term might take the form
$$\text{TV}(\theta_i, \theta_{i - 1}) = |\theta_i - \theta_{i-1}|$$
This still fails to recover a unique parameter estimate for each sample, but gets closer to the spirit of personalized modeling by putting the model likelihood and partition regularizer in competition to find the optimal partitions.
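A sketch of the resulting fused-lasso-style objective for time-ordered Gaussian linear models; the `lam` trade-off knob is an illustrative addition to the formulation above:

```python
import numpy as np

def partition_objective(theta, X, y, lam=1.0):
    """Likelihood minus a total-variation penalty on consecutive
    parameters; maximizing trades data fit against parameter jumps."""
    resid = y - np.einsum("np,np->n", X, theta)
    log_lik = -0.5 * np.sum(resid ** 2)
    tv = np.sum(np.abs(np.diff(theta, axis=0)))  # sum |theta_i - theta_{i-1}|
    return log_lik - lam * tv
```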


### Fine-tuned Models and Transfer Learning
Review: @doi:10.48550/arXiv.2206.02058
Noted in foundational literature for linear varying coefficient models @doi:10.1214/aos/1017939139

Fine-tuning approaches estimate a population model, freeze its parameters $\gamma$, and then estimate a smaller set of personalized parameters on each subpopulation.
$$ \widehat{\gamma} = \arg\max_{\gamma} \ell(\gamma; X) $$
$$ \widehat{\theta}_c = \arg\max_{\theta_c} \ell(\theta_c; \widehat{\gamma}, X_c) $$
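A two-step sketch for Gaussian linear models, treating the personalized parameters as additive offsets on the frozen population model (the offset parameterization is an illustrative choice):

```python
import numpy as np

def finetune(X, y, groups):
    """Fit a population model gamma on all data, freeze it, then fit a
    per-group offset on each subpopulation's residuals."""
    gamma, *_ = np.linalg.lstsq(X, y, rcond=None)   # population model
    offsets = {}
    for g in np.unique(groups):
        m = groups == g
        resid = y[m] - X[m] @ gamma                 # gamma stays frozen
        offsets[g], *_ = np.linalg.lstsq(X[m], resid, rcond=None)
    return gamma, offsets                           # theta_c = gamma + offsets[c]
```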



### Context-informed and Latent-structure models
Seminal paper: @doi:10.48550/arXiv.1910.06939

Key idea: negative information sharing. Different models should be pushed apart.
$$ \widehat{\theta}_0, ..., \widehat{\theta}_N = \arg\max_{\theta_0, ..., \theta_N, D} \sum_{i=0}^N \prod_{\substack{j = 0 \\ D(c_i, c_j) < d}}^{N} P(x_j; \theta_i)\, P(\theta_i; \theta_j) $$
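Under Gaussian assumptions the log of each product term decomposes into a within-neighborhood likelihood plus a parameter-coupling term; a sketch with a fixed (rather than learned) distance $D$ and illustrative `d` and `lam`:

```python
import numpy as np

def neighborhood_objective(theta, X, y, C, d=0.5, lam=1.0):
    """Each theta_i is scored on the samples within covariate distance
    d of c_i and coupled to those neighbors' parameters; models outside
    the neighborhood share no information."""
    dist = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=-1)
    nbr = dist < d                                  # neighborhood mask
    total = 0.0
    for i in range(len(y)):
        resid = y[nbr[i]] - X[nbr[i]] @ theta[i]
        total += -0.5 * np.sum(resid ** 2)          # log P(x_j; theta_i)
        total += -0.5 * lam * np.sum((theta[nbr[i]] - theta[i]) ** 2)  # log P(theta_i; theta_j)
    return total
```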
19 changes: 11 additions & 8 deletions content/metadata.yaml

```diff
@@ -19,12 +19,15 @@ authors:
     - Department of Statistics, University of Wisconsin-Madison
   funders:
     -
-  - github: janeroe
-    name: Jane Roe
-    initials: JR
-    orcid: XXXX-XXXX-XXXX-XXXX
-    email: jane.roe@whatever.edu
+  - github: cnellington
+    name: Caleb N. Ellington
+    initials: CE
+    orcid: 0000-0001-7029-8023
+    twitter: probablybots
+    mastodon:
+    mastodon-server:
+    email: cellingt@cs.cmu.edu
     affiliations:
-      - Department of Something, University of Whatever
-      - Department of Whatever, University of Something
+      - Computational Biology Department, Carnegie Mellon University
+    corresponding: true
   funders:
     -
```