
BIC/AIC issues #57

Open
schmidtfa opened this issue Aug 21, 2024 · 2 comments
Labels
help wanted Extra attention is needed

Comments

@schmidtfa
Owner

BIC and AIC seem to often favor knee over fixed models, even if the spectrum looks pretty "linear" in loglog

@schmidtfa
Copy link
Owner Author

So the problem is as follows:
I have implemented the Akaike (AIC) and Bayesian (BIC) information criteria, but they always tend to favor the more complex model (even when I know from simulation that it's the wrong model), so I have the feeling something is going wrong. The general formula for an information criterion is quite simple:

IC = -2ll + An * p

where:
IC = generalized information criterion
An = penalty weight (e.g. log(n) in BIC and 2 in AIC)
p = number of parameters (i.e. predictors)
-2ll = two times the negative log-likelihood, which for linear regression reduces (up to an additive constant) to nsamples * log(mean squared error)
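As a minimal sketch, the standard formula could be implemented like this (function names are mine, not from the codebase):

```python
import numpy as np

def information_criterion(mse, n_samples, n_params, penalty):
    """Generic IC = -2ll + An * p, where for linear regression the
    -2ll term reduces (up to a constant) to n_samples * log(MSE)."""
    neg2_ll = n_samples * np.log(mse)
    return neg2_ll + penalty * n_params

def aic(mse, n_samples, n_params):
    # AIC uses a constant penalty weight of 2
    return information_criterion(mse, n_samples, n_params, penalty=2)

def bic(mse, n_samples, n_params):
    # BIC penalizes parameters by log(n), so it grows with sample size
    return information_criterion(mse, n_samples, n_params, penalty=np.log(n_samples))
```

The model with the lower IC value would then be preferred.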

see also https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7299313/#ref6a

I also looked at some other resources and I am pretty sure this formula is correct (e.g. https://robjhyndman.com/hyndsight/lm_aic.html).

So the weird thing is that when I slightly adjust the way I compute "-2ll", everything works as expected: most information criteria I specify suggest the correct model. I really simulated a lot of different settings (different exponents, different frequency resolutions, different knees, etc.) and it basically always works out and correctly tells me whether there is a knee in my data or not. That's cool, but it also drifts apart from how it's usually done. What I am changing is essentially that instead of doing:

nsamples * log(mean squared error) [where samples refers to the frequency bins]

I am doing:

log(nsamples) * log(mean squared error) [where samples refers to the frequency bins]
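For comparison, a sketch of that modified goodness-of-fit term (the naming is hypothetical, not from the codebase):

```python
import numpy as np

def modified_ic(mse, n_samples, n_params, penalty):
    """Variant described in this issue: log(n) * log(MSE) instead of
    n * log(MSE) as the goodness-of-fit term."""
    fit_term = np.log(n_samples) * np.log(mse)
    return fit_term + penalty * n_params
```

With this variant the fit term grows only logarithmically in the number of frequency bins, so the parameter penalty carries relatively more weight than in the classical formula, which may explain why it selects the simpler model more readily.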

@schmidtfa schmidtfa added the help wanted (Extra attention is needed) label and removed the bug (Something isn't working) label on Oct 4, 2024
@schmidtfa
Owner Author

I am removing the bug label, as everything works as expected right now (at least based on the simulations I ran and on visual inspection of real data). However, I want to keep the issue open for now, in case a user ends up having a problem with their data and/or someone has an idea why the classical formula fails.
