Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of dropcollinear feature in GeneralizedLinearModel #488

Merged
merged 88 commits into from
Oct 21, 2022

Conversation

mousum-github
Copy link
Collaborator

@mousum-github mousum-github commented Jul 15, 2022

This pull request supersedes #340

We are looking for the dropcollinear feature in GLM and found #340. The PR, which opened in Oct'2019, looks very close to complete.

Apart from the changes covered in PR 340, we have added

  • A test case to compare LM with equivalent GLM
  • Two test cases to match output with R
  • Another implementation of dof for CholeskyPivoted since the dof will be calculated based on rank, instead of the number of coefficients.
  • update API documentation: Added dropcollinear in the list of keyword arguments.

We have performed logistic regression with 10 independent variables, 10,000 rows and dropcollinear feature true and false; the dataset does not have a multicollinearity issue. The machine configuration is as follows: -
OS – Linux,
CPU - AMD Ryzen 5 3600 6-Core Processor
RAM – 64 GB

With dropcollinear = true: average time 4.381 ms
With dropcollinear = false: average time 4.396 ms

DilumAluthge and others added 30 commits March 25, 2022 23:26
Adding PowerLink link function in GLM
Test was failing in Julia Nightly as: 

1) GLM.linkinv(InverseLink(), 10) was 0.01 while GLM.linkinv(PowerLink(-1), 10) was 0.010000000000000002
2) GLM.linkinv(InverseSquareLink(), 10) was -10.01 while GLM.linkinv(PowerLink(-2), 10) was -0.010000000000000002

Rounding off to 2 digits should solve this. 

Note: These tests were passing in other versions of Julia.
Correcting order of PowerLink, ProbitLink, SqrtLink.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Using ≈ instead of isapprox.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Using alphabetical order in references.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Writing the example description in plain text.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Removing extra space to be consistent with the style.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Using a better variable name for 1 by lambda.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Using inline function.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Making the doctest code concise.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Removing the R code.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Removing test of hashes.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
…few more metrics to test, also replaced all `=` by `≈` for real numbers
Since we are storing the link - the `GLM.Link` function can be defined uniformly for all link functions

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@mousum-github
Copy link
Collaborator Author

Thanks @ayushpatnaikgit. Not sure why GitHub doesn't support that, but thanks to the access to the xKDR repo I could push my merge commit. Tests still appear to fail though.

Usually we only give commit access to people who have made several PRs over a relatively long period, but given that xKDR is invested in improving the stats ecosystem I guess we can accelerate that process. I've just sent an invitation to @mousum-github. Help reviewing other PRs would of course be welcome!

Thanks @nalimilan

@mousum-github
Copy link
Collaborator Author

Hi @nalimilan,
We have updated the code and documentation.

We have updated the newly added nulldeviance and nullloglikelihood functions in glm. Passed the dropcollinear argument value to fit function for null models.

We also added a test case to check the PosDefException exception when dropcollinear is false for a rank-deficient model. Unfortunately, the test case was not passed in a few systems. So, we have commented the test case for timing.

@nalimilan
Copy link
Member

We also added a test case to check the PosDefException exception when dropcollinear is false for a rank-deficient model. Unfortunately, the test case was not passed in a few systems. So, we have commented the test case for timing.

Maybe another example would pass tests more reliably? We really need to check that path. How about trying with categorical variables, like in the "rankdeficient" testset for linear models?

@mousum-github
Copy link
Collaborator Author

mousum-github commented Oct 17, 2022

Okay. Let me check.

@mousum-github
Copy link
Collaborator Author

Okay. Let me check.

Added two test cases similar to rankdeficient.

test/runtests.jl Outdated Show resolved Hide resolved
Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems almost ready! Just a few stylistic comments.

src/glmfit.jl Outdated Show resolved Hide resolved
src/glmfit.jl Outdated Show resolved Hide resolved
src/linpred.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
src/linpred.jl Outdated Show resolved Hide resolved
mousum-github and others added 10 commits October 20, 2022 14:41
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
src/glmfit.jl Outdated Show resolved Hide resolved
Co-authored-by: Bogumił Kamiński <bkamins@sgh.waw.pl>
Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants