Sparse root VJP for Lasso penalty #274

Draft · wants to merge 12 commits into main

@tristandeleu commented Jul 14, 2022

@QB3 and I have been working on a rework of #17 that adds support functions to enable implicit differentiation with sparsity-inducing penalties. The main difference is that we mask the linear system solved in root_vjp based on the support of the solution, instead of restricting the system to that support, so that it stays jit-compatible. For now, this functionality has only been added to ProximalGradient, but we can add it to other solvers as well if we agree on the API.
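To make the masking versus restricting distinction concrete, here is a minimal sketch on a small dense system. It is not the code in this PR: `restricted_solve` and `masked_solve` are hypothetical helpers, the matrix `A` and the `support` vector are made up for illustration, and the dense `jnp.linalg.solve` stands in for whatever linear solver is actually used.

```python
import jax
import jax.numpy as jnp


def restricted_solve(A, b, support):
    # Restricting: keep only the rows/columns in the support.
    # The shape of `idx` depends on the data, so this fails under jax.jit
    # unless a static size is provided.
    idx = jnp.flatnonzero(support)
    x_s = jnp.linalg.solve(A[jnp.ix_(idx, idx)], b[idx])
    return jnp.zeros_like(b).at[idx].set(x_s)


@jax.jit
def masked_solve(A, b, support):
    # Masking: keep the full shape, zero out off-support rows/columns of A,
    # and put ones on the off-support diagonal so the system stays invertible.
    m = support.astype(A.dtype)
    A_masked = m[:, None] * A * m[None, :] + jnp.diag(1.0 - m)
    return jnp.linalg.solve(A_masked, m * b)


# Illustrative, well-conditioned symmetric system (assumed data).
key = jax.random.PRNGKey(0)
B = jax.random.normal(key, (6, 6))
A = B @ B.T + 6.0 * jnp.eye(6)
b = jnp.arange(1.0, 7.0)
support = jnp.array([True, False, True, False, False, True])

x_restricted = restricted_solve(A, b, support)
x_masked = masked_solve(A, b, support)
```

Both calls should return the same vector: the masked system reduces to the restricted one on the support and forces the off-support entries to zero, while keeping fixed shapes that `jax.jit` can trace.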

Specifying the support explicitly ensures that the Jacobian is non-zero only for coordinates in the support of the solution. In the case of Lasso, it also lets us solve the linear system with CG instead of Normal-CG, since the matrix to invert is then guaranteed to be symmetric. We have written a benchmark to showcase the advantages of masking the linear system to the support in the Lasso case:

Number of samples: 100
Number of features: 1000
Size of the support of the solution: 5
Numerical Jacobian: [-0.68067934 -0.97320587 -0.87455234 -0.65993433 -1.13817755]
Jacobian w/o support, CG (15.620 sec.): [ 2.5179840e+09 -1.3495498e+09  1.2744818e+07  1.1016900e+10 -1.7797000e+09] (size of the support: 5)
Jacobian w/o support, normal CG (2.249 sec.): [-0.68141943 -0.973256   -0.8743981  -0.6596938  -1.1382269 ] (size of the support: 1000)
Jacobian w/ restricted support (1.473 sec.): [-0.6814198  -0.973256   -0.8743989  -0.65969384 -1.1382272 ] (size of the support: 5)
Jacobian w/ masked support (1.503 sec.): [-0.6814196  -0.97325593 -0.87439895 -0.6596938  -1.1382273 ] (size of the support: 5)
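As a rough illustration of why the masked Lasso system can be handed to CG, the sketch below solves it with `jax.scipy.sparse.linalg.cg`. Everything here is an assumption made for the example rather than the benchmark or the PR code: the objective is taken to be `1/(2n) * ||Xw - y||^2 + lam * ||w||_1`, the `support` and `signs` of the solution are given by hand instead of coming from a solver, and `lasso_jacobian_wrt_lam` is a hypothetical helper. Under those assumptions, the Jacobian of the solution with respect to `lam` satisfies `(X_S^T X_S / n) J_S = -sign(w_S)` on the support and is zero elsewhere.

```python
import jax
import jax.numpy as jnp
from jax.scipy.sparse.linalg import cg


def lasso_jacobian_wrt_lam(X, signs, support):
    n = X.shape[0]
    m = support.astype(X.dtype)

    def masked_matvec(v):
        # M (X^T X / n) M v + (I - M) v: symmetric, and positive definite
        # when the columns of X in the support are linearly independent.
        hv = X.T @ (X @ (m * v)) / n
        return m * hv + (1.0 - m) * v

    rhs = -m * signs
    jac, _ = cg(masked_matvec, rhs, maxiter=100)
    return jac


# Assumed data: random design, hand-picked support and signs.
X = jax.random.normal(jax.random.PRNGKey(0), (50, 20))
idx = jnp.array([1, 7, 12])
support = jnp.zeros(20, dtype=bool).at[idx].set(True)
signs = jnp.zeros(20).at[idx].set(jnp.array([1.0, -1.0, 1.0]))

# The masked system has a fixed shape, so the whole function can be jitted.
jac = jax.jit(lasso_jacobian_wrt_lam)(X, signs, support)

# Dense reference computed on the restricted system.
X_s = X[:, idx]
ref = -jnp.linalg.solve(X_s.T @ X_s / X.shape[0], signs[idx])
```

On the support, `jac[idx]` should agree with `ref` up to the CG tolerance, and the off-support entries stay at zero because CG starts from zero and the masked operator never mixes the two blocks.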

(more details about the PR to be added)

@mblondel
Collaborator

Thanks @tristandeleu and @QB3 for this work, this is very interesting! If I understand your main point correctly, masking is almost as fast as restricting, but it is jit-friendly and allows using CG in the case of the Lasso?

I would be curious to know the benchmark results on GPU.

By the way, I see the PR is currently marked as draft. Let me know when you would like me to review :)
