
$\beta_m$ Ridge Regularization and $\beta_0$ Initialization Value #133

Closed · jgallowa07 opened this issue Jan 30, 2024 · 3 comments
Labels: documentation (Improvements or additions to documentation)

@jgallowa07 (Member) commented:

This issue outlines how and why we should change the model initialization defaults for the latent offset $\beta_0$, and maybe more importantly, the ridge coefficient parameter for regularizing the set of mutation effect parameters $\beta_m$.

Problem

By default, we initialize the latent offset parameter to $\beta_0 = 0$. As seen in our simulation work, this can leave the model stuck in a terrible local minimum when fitting, as shown below.

[Screenshot from 2024-01-30 12-07-18]

Here, we're fitting to a simulation where the true latent wildtype phenotype has a value of $5$. The problem seems to be that the $\beta_m$ values blow up (reach very high values) as the model attempts to fit a (usually positive) wildtype latent phenotype; the model then uses the latent offset to try to correct for this behavior and gets stuck. The table below shows a collection of these same models fit at different initialization values for $\beta_0$.

[Screenshot from 2024-01-30 12-23-04]

We can see that there is some threshold of the $\beta_0$ initial value ("init_beta_naught") above which the model begins to fit correctly (avoiding that local minimum) -- 0.6 in this case.
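To make the setup concrete, here's a minimal NumPy sketch of the latent model being described (function and variable names are hypothetical, not the package's actual API): the observed phenotype is a sigmoid of $\beta_0$ plus the summed effects of a variant's mutations, so the wildtype prediction depends only on $\beta_0$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(beta_0, beta_m, X):
    """Global-epistasis prediction: X is a (variants x mutations) binary
    indicator matrix, beta_m the vector of mutation effects."""
    latent = beta_0 + X @ beta_m
    return sigmoid(latent)

# The wildtype carries no mutations, so its predicted phenotype is g(beta_0).
X_wt = np.zeros((1, 3))
beta_m = np.zeros(3)
print(predict(0.0, beta_m, X_wt))  # g(0) = 0.5
print(predict(5.0, beta_m, X_wt))  # g(5) ~ 0.993, near the simulated truth
```

With $\beta_0$ initialized at $0$ the wildtype prediction starts at $g(0) = 0.5$ instead of $g(5) \approx 0.993$, so early in fitting the mutation effects are pushed to absorb that offset.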

Proposed solution

Because our sigmoid is centered at $0$, we often expect the latent phenotype of the wildtype to be greater than $0$, so it seems reasonable to initialize the latent offset to something greater than $0$ (5?). However, this is a lazy fix, and we would like model fitting to be more robust to the initial parameter values. As it turns out, a ridge ($L_2$) penalty on the set of mutation effect parameters $\beta_m$ also helps.

Here's the same table, but with models fit to include a non-zero ridge penalty (scaling coefficient $=1e-6$), as opposed to the default of effectively no ridge penalty (scaling coefficient $=0$).

[Screenshot from 2024-01-30 12-34-21]

Here, we see the model correctly infers the latent offset no matter what initial value we choose. Since there are no other adverse effects AFAICT, I propose we set a non-zero ridge coefficient by default.
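For clarity, the proposal is just an $L_2$ term on the $\beta_m$ (leaving $\beta_0$ unpenalized) added to the data loss, scaled by a small coefficient like the $1 \times 10^{-6}$ used above. A minimal sketch, with hypothetical names and mean squared error standing in for the actual data loss:

```python
import numpy as np

def penalized_loss(y_obs, y_pred, beta_m, ridge_coeff=1e-6):
    """Data loss (here MSE, as a stand-in) plus a ridge penalty on the
    mutation-effect parameters beta_m; beta_0 is left unpenalized."""
    data_loss = np.mean((y_obs - y_pred) ** 2)
    ridge = ridge_coeff * np.sum(beta_m ** 2)
    return data_loss + ridge

# With ridge_coeff = 0 (the old default) the penalty vanishes; at 1e-6 it
# is negligible for modest betas but bites when they blow up.
beta_m = np.full(10, 100.0)                  # wildly inflated effects
y = np.array([0.5]); yhat = np.array([0.5])
print(penalized_loss(y, yhat, beta_m, 0.0))  # 0.0
print(penalized_loss(y, yhat, beta_m, 1e-6)) # 0.1
```

The point of the tiny coefficient is exactly this asymmetry: it discourages the runaway-beta fits described above while leaving reasonable-scale effects essentially untouched.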

( cc @jbloom @wsdewitt @Haddox )

@Haddox (Contributor) commented Jan 30, 2024:

Thanks for raising this issue @jgallowa07.

As you know from our previous discussions, I agree this is an important problem, though I am reluctant to set a non-zero ridge weight on the betas by default. My concern is that no single value would be one-size-fits-all: we expect many betas to have large negative values, and the ridge penalty needs to be weak enough not to penalize those too strongly. How weak it needs to be could depend a lot on the dataset in question. The default we choose could be too strong in some cases, and if users don't know the ridge is there, it might impact their results without them knowing it.

What about instead including a section in the documentation about ways to troubleshoot model fitting or improve convergence where we talk about this?

@jgallowa07 added the documentation label Jan 31, 2024

@jgallowa07 (Member, Author) commented Jan 31, 2024:

That sounds fine to me. Moving this to a documentation issue and will close once included.

In that spirit, noting that the ridge also seems to stabilize fitting and slightly improve model performance, at least in these simulations:

W/O ridge penalty

[Screenshot from 2024-01-31 15-55-27]

W/ ridge penalty

[Screenshot from 2024-01-31 15-52-36]

@jgallowa07 (Member, Author) commented:

Closing this as #148 encapsulates the action we decided to take here.
