From 16a275e2f2f8f9f2259062ec45dc7c001c2d8103 Mon Sep 17 00:00:00 2001 From: cnellington Date: Fri, 16 Aug 2024 10:18:22 -0400 Subject: [PATCH 1/7] add intro stub --- content/02.introduction.md | 142 +++++++++++++++++++++++++++++++++++++ 1 file changed, 142 insertions(+) create mode 100644 content/02.introduction.md diff --git a/content/02.introduction.md b/content/02.introduction.md new file mode 100644 index 0000000..2ba82e2 --- /dev/null +++ b/content/02.introduction.md @@ -0,0 +1,142 @@ +## Introduction +Personalization aims to solve the problem of __parameter heterogeneity__, where model parameters are __sample-specific__. +$$X_i \sim P(X; \theta_i)$$ +From $N$ observations, personalized modeling methods aim to recover $N$ parameter estimates $\widehat{\theta}_1, ..., \widehat{\theta}_N$. +Without further assumptions this problem is ill-defined, and the estimators have far too much variance to be useful. +We can begin to make this problem tractable by imposing assumptions on the topology of $\theta$, or the relationship between $\theta$ and exogenous (often causal) variables. + + + + + \ No newline at end of file From bdf5ad0ce4a6b6f2edd1c19c08aaabc6e82ac592 Mon Sep 17 00:00:00 2001 From: cnellington Date: Fri, 16 Aug 2024 10:19:15 -0400 Subject: [PATCH 2/7] test gh actions --- content/02.introduction.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/02.introduction.md b/content/02.introduction.md index 2ba82e2..44b8f60 100644 --- a/content/02.introduction.md +++ b/content/02.introduction.md @@ -3,7 +3,7 @@ Personalization aims to solve the problem of __parameter heterogeneity__, where $$X_i \sim P(X; \theta_i)$$ From $N$ observations, personalized modeling methods aim to recover $N$ parameter estimates $\widehat{\theta}_1, ..., \widehat{\theta}_N$. Without further assumptions this problem is ill-defined, and the estimators have far too much variance to be useful. 
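To make the variance problem concrete, the following sketch (synthetic data, all names illustrative) simulates sample-specific parameters and shows that the assumption-free per-sample estimator inherits the full observation noise:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500

# Each sample i gets its own parameter theta_i (here, the mean of a Gaussian),
# generated from a hidden variable to induce parameter heterogeneity.
hidden = rng.uniform(0, 1, size=N)
theta = np.sin(2 * np.pi * hidden)        # true sample-specific parameters
x = rng.normal(loc=theta, scale=1.0)      # one observation per parameter

# Without further assumptions, the MLE of theta_i from a single observation
# is theta_hat_i = x_i, so its error variance equals the full noise variance.
naive_error_var = np.var(x - theta)
print(naive_error_var)                    # close to the noise variance of 1.0
```

With one observation per parameter, every bit of observation noise passes straight into the estimate, which is what the assumptions discussed next are meant to suppress.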
-We can begin to make this problem tractable by imposing assumptions on the topology of $\theta$, or the relationship between $\theta$ and exogenous (often causal) variables. +We can begin to make this problem tractable by imposing assumptions on the topology of $\theta$, or the relationship between $\theta$ and contextual variables. From cb259d3701ce8959586d2927d2ce8f0affdc0da5 Mon Sep 17 00:00:00 2001 From: cnellington Date: Fri, 16 Aug 2024 10:28:09 -0400 Subject: [PATCH 3/7] partial md formatting, test gh actions again --- content/02.introduction.md | 40 +++++++++++++++++--------------------- 1 file changed, 18 insertions(+), 22 deletions(-) diff --git a/content/02.introduction.md b/content/02.introduction.md index 44b8f60..b3f3788 100644 --- a/content/02.introduction.md +++ b/content/02.introduction.md @@ -1,38 +1,34 @@ ## Introduction -Personalization aims to solve the problem of __parameter heterogeneity__, where model parameters are __sample-specific__. +Personalization aims to solve the problem of _parameter heterogeneity_, where model parameters are _sample-specific_. $$X_i \sim P(X; \theta_i)$$ From $N$ observations, personalized modeling methods aim to recover $N$ parameter estimates $\widehat{\theta}_1, ..., \widehat{\theta}_N$. Without further assumptions this problem is ill-defined, and the estimators have far too much variance to be useful. We can begin to make this problem tractable by imposing assumptions on the topology of $\theta$, or the relationship between $\theta$ and contextual variables. - - - - \ No newline at end of file +\cite{kim_tree-guided_2012} show that the smoothing proximal gradient method is an efficient solver for the tree lasso model.
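The proximal gradient machinery behind such solvers can be illustrated on the plain lasso, whose proximal operator is simple soft-thresholding; the tree-structured penalty and its smoothing are omitted here, and all data and names in this sketch are synthetic:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(X, y, lam, n_iter=500):
    """Proximal gradient (ISTA) for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - grad / L, lam / L)   # gradient step, then prox
    return b

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
true_b = np.zeros(10)
true_b[:3] = [2.0, -1.0, 0.5]
y = X @ true_b + 0.1 * rng.normal(size=100)
b_hat = ista_lasso(X, y, lam=5.0)
print(b_hat)   # large coefficients on the first three features, the rest near 0
```

Structured penalties like the tree lasso replace the soft-thresholding step with a more involved (smoothed) proximal map, but the gradient-then-prox loop is the same.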
--> --> \ No newline at end of file From c69d958fe02ee84a163e33d83f4fe97c1e0b9430 Mon Sep 17 00:00:00 2001 From: cnellington Date: Fri, 16 Aug 2024 10:32:49 -0400 Subject: [PATCH 4/7] partial md formatting, test gh actions again --- content/02.introduction.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/content/02.introduction.md b/content/02.introduction.md index b3f3788..de500ce 100644 --- a/content/02.introduction.md +++ b/content/02.introduction.md @@ -12,8 +12,7 @@ To account for parameter heterogeneity and create more realistic models we must Additionally, many traditional models may produce a seemingly acceptable fit to their data, even when the underlying model is heterogeneous. Here, we explore the consequences of applying homogeneous modeling approaches to heterogeneous data, and discuss how subtle but meaningful effects are often lost to the strength of the identically distributed assumption. -#### Failure Modes: -Failure modes can be identified by their error distributions. +Failure modes of population models can be identified by their error distributions. __Mode collapse__: If one population is much larger than another, the smaller population will be underrepresented in the model. @@ -28,7 +27,7 @@ __Lemma:__ A traditional OLS linear model will be the average of heterogeneous m ### Context-informed models -### Conditional and Cluster Models +#### Conditional and Cluster Models While conditional and cluster models are not truly personalized models, the spirit is the same. These models make the assumption that models in a single conditional or cluster group are homogeneous. More commonly this is written as a group of observations being generated by a single model. @@ -39,13 +38,13 @@ where $\ell(X; \theta)$ is the log-likelihood of $\theta$ on $X$ and $c$ specifi Notably, this method produces fewer than $N$ distinct models for $N$ samples and will fail to recover per-sample parameter variation.
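A minimal sketch of this grouped estimator on synthetic data (group labels and names are illustrative): samples are pooled by a discrete context and one maximum-likelihood estimate is fit per group, so the $N$ samples share only a handful of distinct parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete context c_i in {0, 1, 2}; each group has its own true mean.
true_means = {0: -2.0, 1: 0.0, 2: 3.0}
c = rng.integers(0, 3, size=300)
x = np.array([rng.normal(true_means[ci]) for ci in c])

# Cluster estimator: one MLE per context group (here, the group sample mean).
theta_hat = {g: x[c == g].mean() for g in np.unique(c)}

# Every sample in a group receives the same parameter estimate.
per_sample = np.array([theta_hat[ci] for ci in c])
n_distinct = len(set(per_sample))
print(n_distinct)   # 3 distinct models, far fewer than N = 300
```

The within-group variation in the true parameters (none here by construction, but present in real heterogeneous data) is exactly what this estimator cannot recover.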
-#### Distance-regularized Models +##### Distance-regularized Models Distance-regularized models assume that models with similar covariates have similar parameters and encode this assumption as a regularization term. $$ \widehat{\theta}_0, ..., \widehat{\theta}_N = \arg\max_{\theta_0, ..., \theta_N} \sum_i \left[ \ell(x_i; \theta_i) \right] - \sum_{i, j} \frac{\| \theta_i - \theta_j \|}{D(c_i, c_j)} $$ The second term is a regularizer that penalizes divergence of $\theta$'s with similar $c$. -#### Parametric Varying-coefficient models +##### Parametric Varying-coefficient models Original paper (based on a smoothing spline function): @doi:10.1111/j.2517-6161.1993.tb01939.x Markov networks: @doi:10.1080/01621459.2021.2000866 Linear varying-coefficient models assume that parameters vary linearly with covariates, a much stronger assumption than the classic varying-coefficient model, but one that gives an explicit form for the relationship between the parameters and covariates. From 943f4a11fba4ea22252e73d47c3e640e02a9f93c Mon Sep 17 00:00:00 2001 From: cnellington Date: Fri, 16 Aug 2024 10:47:52 -0400 Subject: [PATCH 5/7] first draft of introduction, all formatted --- content/02.introduction.md | 70 +++++++++----------------------------- 1 file changed, 17 insertions(+), 53 deletions(-) diff --git a/content/02.introduction.md b/content/02.introduction.md index de500ce..c3a7c82 100644 --- a/content/02.introduction.md +++ b/content/02.introduction.md @@ -27,7 +27,7 @@ __Lemma:__ A traditional OLS linear model will be the average of heterogeneous m ### Context-informed models -#### Conditional and Cluster Models +##### Conditional and Cluster Models While conditional and cluster models are not truly personalized models, the spirit is the same. These models make the assumption that models in a single conditional or cluster group are homogeneous. More commonly this is written as a group of observations being generated by a single model.
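A sketch of the linear varying-coefficient assumption on synthetic data (all names illustrative): with $\theta_i = A c_i$, the model $y_i = x_i^\top A c_i$ is linear in $\mathrm{vec}(A)$, so $A$ can be recovered by ordinary least squares on Kronecker-product features:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, k = 1000, 3, 2

# True parameter map: theta_i = A c_i varies linearly with the context c_i.
A = rng.normal(size=(p, k))
C = np.column_stack([np.ones(N), rng.uniform(-1, 1, N)])  # contexts, with intercept
X = rng.normal(size=(N, p))
theta = C @ A.T                                 # (N, p) sample-specific parameters
y = np.sum(X * theta, axis=1) + 0.1 * rng.normal(size=N)

# y_i = x_i^T A c_i is linear in vec(A), so stack the pairwise products
# C[i, k'] * X[i, p'] as features and solve one least-squares problem.
Z = np.einsum('ik,ip->ikp', C, X).reshape(N, k * p)
vecA_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
A_hat = vecA_hat.reshape(k, p).T
print(np.max(np.abs(A_hat - A)))                # small recovery error
```

A single shared matrix $A$ turns $N$ ill-posed per-sample problems into one well-posed regression, at the cost of the linearity assumption.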
@@ -52,11 +52,9 @@ $$\widehat{\theta}_0, ..., \widehat{\theta}_N = \widehat{A} C^T$$ $$ \widehat{A} = \arg\max_A \sum_i \ell(x_i; A c_i) $$ - --> \ No newline at end of file +Key idea: negative information sharing. Different models should be pushed apart. +$$ \widehat{\theta}_0, ..., \widehat{\theta}_N = \arg\max_{\theta_0, ..., \theta_N, D} \sum_{i=0}^{N} \prod_{\substack{j = 0 \\ D(c_i, c_j) < d}}^{N} P(x_j; \theta_i) P(\theta_i; \theta_j) $$ From 44dee2a664d98d2a1c62de7a04fe8a5a8f6fdc49 Mon Sep 17 00:00:00 2001 From: cnellington Date: Fri, 16 Aug 2024 10:54:06 -0400 Subject: [PATCH 6/7] add transfer learning header --- content/02.introduction.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/02.introduction.md b/content/02.introduction.md index c3a7c82..9df3e15 100644 --- a/content/02.introduction.md +++ b/content/02.introduction.md @@ -84,7 +84,7 @@ $$\text{TV}(\theta_i, \theta_{i - 1}) = |\theta_i - \theta_{i-1}|$$ This still fails to recover a unique parameter estimate for each sample, but gets closer to the spirit of personalized modeling by putting the model likelihood and partition regularizer in competition to find the optimal partitions.
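The competition between likelihood and partition penalty can be sketched with an exact dynamic program for the one-dimensional piecewise-constant case (synthetic data, illustrative names): each contiguous segment gets one mean parameter, and each additional segment costs a fixed penalty:

```python
import numpy as np

def optimal_partition(x, penalty):
    """Minimize within-segment squared error plus one penalty per segment
    (equivalent, up to a constant, to penalizing each breakpoint)."""
    n = len(x)
    cs = np.concatenate([[0.0], np.cumsum(x)])
    cs2 = np.concatenate([[0.0], np.cumsum(x ** 2)])

    def seg_cost(a, b):   # SSE of fitting one mean to x[a:b]
        s = cs[b] - cs[a]
        return (cs2[b] - cs2[a]) - s * s / (b - a)

    best = np.zeros(n + 1)              # best[b]: optimal cost of x[:b]
    back = np.zeros(n + 1, dtype=int)
    for b in range(1, n + 1):
        best[b], back[b] = min(
            (best[a] + seg_cost(a, b) + penalty, a) for a in range(b)
        )
    cuts, b = [], n                     # trace back the segment boundaries
    while b > 0:
        cuts.append(back[b])
        b = back[b]
    return sorted(c for c in cuts if c > 0)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.3, 50), rng.normal(2.0, 0.3, 50)])
breaks = optimal_partition(x, penalty=5.0)
print(breaks)   # a single breakpoint near index 50, the true change
```

A larger penalty yields fewer, coarser partitions; as the penalty shrinks, the solution approaches one parameter per sample, mirroring the trade-off described above.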
-### Fine-tuned Models +### Fine-tuned Models and Transfer Learning Review: @doi:10.48550/arXiv.2206.02058 Noted in foundational literature for linear varying coefficient models @doi:10.1214/aos/1017939139 From 32656fc5384ab2f2f8d7e265a07b3b724b3a9672 Mon Sep 17 00:00:00 2001 From: cnellington Date: Fri, 16 Aug 2024 11:02:32 -0400 Subject: [PATCH 7/7] add author --- content/metadata.yaml | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/content/metadata.yaml b/content/metadata.yaml index 3abd386..d88d461 100644 --- a/content/metadata.yaml +++ b/content/metadata.yaml @@ -19,12 +19,15 @@ authors: - Department of Statistics, University of Wisconsin-Madison funders: - - - github: janeroe - name: Jane Roe - initials: JR - orcid: XXXX-XXXX-XXXX-XXXX - email: jane.roe@whatever.edu + - github: cnellington + name: Caleb N. Ellington + initials: CE + orcid: 0000-0001-7029-8023 + twitter: probablybots + mastodon: + mastodon-server: + email: cellingt@cs.cmu.edu affiliations: - - Department of Something, University of Whatever - - Department of Whatever, University of Something - corresponding: true + - Computational Biology Department, Carnegie Mellon University + funders: + - \ No newline at end of file