
$\beta_m$ Ridge Regularization and $\beta_0$ Initialization Value #133

Closed · jgallowa07 opened this issue Jan 30, 2024 · 3 comments
Labels: documentation (Improvements or additions to documentation)

@jgallowa07 (Member) commented:

This issue outlines how and why we should change the model initialization defaults for the latent offset $\beta_0$, and maybe more importantly, the ridge coefficient parameter for regularizing the set of mutation effect parameters $\beta_m$.

Problem

By default, we initialize the latent offset parameter to $\beta_0 = 0$. As seen in our simulation work, this can leave the model stuck in a terrible local minimum when fitting, as shown below.

[Screenshot from 2024-01-30 12-07-18]

Here, we're fitting to a simulation where the true latent wildtype phenotype has a value of $5$. The problem seems to be that the $\beta_m$ values blow up (reach very high values) as the model attempts to fit a (usually positive) wildtype latent phenotype; the model then uses the latent offset to try to correct for this behavior and gets stuck. The table below shows a collection of these same models fit at different initialization values for $\beta_0$.

[Screenshot from 2024-01-30 12-23-04]

We can see that there is some threshold of the $\beta_0$ initial value ("init_beta_naught") above which the model begins to fit correctly (avoiding that local minimum) -- 0.6 in this case.
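To make the setup concrete, here's a minimal NumPy sketch of the latent model being described (function and variable names are hypothetical, not the package's actual API): the observed phenotype is a sigmoid of $\beta_0$ plus the summed effects of a variant's mutations, so the wildtype prediction depends only on $\beta_0$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(beta_0, beta_m, X):
    """Global-epistasis prediction: X is a (variants x mutations) binary
    indicator matrix, beta_m the vector of mutation effects."""
    latent = beta_0 + X @ beta_m
    return sigmoid(latent)

# The wildtype carries no mutations, so its predicted phenotype is g(beta_0).
X_wt = np.zeros((1, 3))
beta_m = np.zeros(3)
print(predict(0.0, beta_m, X_wt))  # g(0) = 0.5
print(predict(5.0, beta_m, X_wt))  # g(5) ~ 0.993, near the simulated truth
```

With $\beta_0$ initialized at $0$ the wildtype prediction starts at $g(0) = 0.5$ instead of $g(5) \approx 0.993$, so early in fitting the mutation effects are pushed to absorb that offset.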

Proposed solution

Because our sigmoid is centered at $0$, we often expect the latent phenotype of the wildtype to be greater than $0$, so it seems reasonable to initialize the latent offset to something greater than $0$ (5?). However, this is a lazy fix, and we would like model fitting to be more robust to the initial parameter values. As it turns out, a ridge ($L_2$) penalty on the set of mutation effect parameters $\beta_m$ also helps.

Here's the same table, but with models fit to include a non-zero ridge penalty (scaling coefficient $=1e-6$), as opposed to the default of effectively no ridge penalty (scaling coefficient $=0$).

[Screenshot from 2024-01-30 12-34-21]

Here, we see the model correctly infers the latent offset no matter what initial value we choose. Since there are no other adverse effects AFAICT, I propose we set a non-zero ridge coefficient by default.
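For clarity, the proposal is just an $L_2$ term on the $\beta_m$ (leaving $\beta_0$ unpenalized) added to the data loss, scaled by a small coefficient like the $1 \times 10^{-6}$ used above. A minimal sketch, with hypothetical names and mean squared error standing in for the actual data loss:

```python
import numpy as np

def penalized_loss(y_obs, y_pred, beta_m, ridge_coeff=1e-6):
    """Data loss (here MSE, as a stand-in) plus a ridge penalty on the
    mutation-effect parameters beta_m; beta_0 is left unpenalized."""
    data_loss = np.mean((y_obs - y_pred) ** 2)
    ridge = ridge_coeff * np.sum(beta_m ** 2)
    return data_loss + ridge

# With ridge_coeff = 0 (the old default) the penalty vanishes; at 1e-6 it
# is negligible for modest betas but bites when they blow up.
beta_m = np.full(10, 100.0)                  # wildly inflated effects
y = np.array([0.5]); yhat = np.array([0.5])
print(penalized_loss(y, yhat, beta_m, 0.0))  # 0.0
print(penalized_loss(y, yhat, beta_m, 1e-6)) # 0.1
```

The point of the tiny coefficient is exactly this asymmetry: it discourages the runaway-beta fits described above while leaving reasonable-scale effects essentially untouched.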

( cc @jbloom @wsdewitt @Haddox )

@Haddox (Contributor) commented Jan 30, 2024:

Thanks for raising this issue @jgallowa07.

As you know from our previous discussions, I agree this is an important problem, though I am reluctant to set a non-zero ridge weight on the betas by default. My concern is that no single value would be one-size-fits-all: we expect many betas to have large negative values, and the ridge penalty needs to be weak enough not to penalize those too strongly. How weak it needs to be could depend a lot on the dataset in question. The default we choose could be too strong in some cases, and if users don't know the ridge is there, it might impact their results without them knowing it.

What about instead including a section in the documentation about ways to troubleshoot model fitting or improve convergence where we talk about this?

@jgallowa07 added the documentation label Jan 31, 2024

@jgallowa07 (Member, Author) commented Jan 31, 2024:

That sounds fine to me. Moving this to a documentation issue and will close once included.

In that spirit, noting that the ridge also seems to stabilize fitting and slightly improve model performance, at least in these simulations:

W/O ridge penalty

[Screenshot from 2024-01-31 15-55-27]

W/ ridge penalty

[Screenshot from 2024-01-31 15-52-36]

@jgallowa07 (Member, Author) commented:

Closing this as #148 encapsulates the action we decided to take here.
