Issue with collinearity detection on very simple linear model and dataset #553

Closed
mestinso opened this issue Mar 27, 2024 · 2 comments

@mestinso
I ran into a surprising issue today when using GLM to fit a basic cubic polynomial (see my script and plot below). For my dataset, fitting a quadratic polynomial works fine, but when I move up to a cubic polynomial the intercept term is dropped, apparently because it is detected as collinear. The good news is that setting dropcollinear=false works around the problem, but with such a basic dataset I was very surprised to hit this at all. As another check, I ran the same dataset through R's lm function, and it handled it without any problem.

using Plots
using DataFrames
using GLM

x=[0.0,16.54252507,36.85132953,58.06647333,85.62460607,123.8759051,174.8138864,238.2577034,312.5741294,385.2595299,451.7838571,523.0189254,575.7680507,621.6300705,677.641035]
y=[2.802571697,2.607979564,2.403339032,2.202060006,2.010878422,1.813030653,1.60853479,1.400739363,1.209780073,1.012102908,0.800961457,0.603279074,0.405503624,0.204338282,0.0]
data = DataFrame(x=x, y=y)

plt = scatter(x,y; label="data")

ols1 = lm(@formula(y ~ x + x^2), data)
plot!(plt, x, predict(ols1); label="quadratic fit")

ols2 = lm(@formula(y ~ x + x^2 + x^3), data)
plot!(plt, x, predict(ols2); label="cubic fit")

ols2fixed = lm(@formula(y ~ x + x^2 + x^3), data; dropcollinear=false)
plot!(plt, x, predict(ols2fixed); label="fixed cubic fit")

[Image: plot of the data with the quadratic, cubic, and fixed cubic fits]

ols2 results:
[Image: ols2 output]

ols2fixed results:
[Image: ols2fixed output]

Potentially related issues: #426 and #420.
While this may be related to those issues, I felt it warranted a new issue given how severe the problem is for such a simple, standard use case.

@mestinso
Author

mestinso commented Mar 27, 2024

OK, after reviewing #426 further, I think this is indeed a duplicate. Sorry about that!

I also see that the dev version offers a QR-based solver as an alternative to Cholesky. I tried the following, and it does seem to resolve the issue without having to disable the collinearity check:

ols2fixedtest = lm(@formula(y ~ x + x^2 + x^3), data; method=:qr)

Given the severity of the issue, it may make sense to make QR the default, to turn off dropcollinear by default, or to do something else entirely so that the out-of-the-box behavior lines up with other software (R, Python, Excel, etc.). My reasoning is that the current defaults are too conservative: it is apparently too easy to get false positives from the collinearity check.
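A likely explanation for the false positive can be sketched with a condition-number check. This assumes, as is usual for Cholesky-based least squares, that the Cholesky path factors the normal-equations matrix X'X, while a QR-based fit works on the design matrix X directly. Since cond(X'X) is roughly cond(X)^2, a raw cubic in x with values up to about 678 can push X'X past what double precision resolves even when X itself is full rank. The sketch uses only the `LinearAlgebra` standard library; the `design` helper and the rescaled comparison are hypothetical illustrations, not anything from GLM or from this thread.

```julia
using LinearAlgebra

# Predictor values from the issue
x = [0.0, 16.54252507, 36.85132953, 58.06647333, 85.62460607, 123.8759051,
     174.8138864, 238.2577034, 312.5741294, 385.2595299, 451.7838571,
     523.0189254, 575.7680507, 621.6300705, 677.641035]

# Hypothetical helper: cubic design matrix with an intercept column, [1 t t^2 t^3]
design(t) = hcat(ones(length(t)), t, t .^ 2, t .^ 3)

X = design(x)

# A QR-based solver works with X itself; a Cholesky-based solver
# (assumed here) factors X'X, whose condition number is about cond(X)^2.
condX = cond(X)
condXtX = cond(X' * X)

# For comparison only: the same cubic after rescaling x to [0, 1]
condScaled = cond(design(x ./ maximum(x)))

println("cond(X)        = ", condX)
println("cond(X'X)      = ", condXtX)
println("1 / eps()      = ", 1 / eps())
println("cond(X scaled) = ", condScaled)
```

If cond(X'X) lands near or above 1 / eps() (about 4.5e15) while cond(X) stays well below it, that would be consistent with a Cholesky-based rank check flagging a column that a QR-based fit handles without trouble.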

@andreasnoack
Member

Closing as a dup
