Inconsistency with the glmnet model? #1124

Closed
ZWael opened this issue May 24, 2024 · 4 comments


ZWael commented May 24, 2024

I was trying to use glmnet for feature selection.

I used the code below, inspired by Julia Silge's blog post (many thanks to her): https://juliasilge.com/blog/lasso-the-office/
I was selecting variables with two different values of lambda.
If I am not mistaken, the variables selected with the higher penalty should be included among the variables selected with the lower penalty. That isn't the case in my data, and it is reproduced in the toy data below.

Shouldn't the final model from finalize_workflow() no longer vary?

library(dplyr)
library(tidymodels)
library(vip)
library(glmnet) # for the toy dataset
data(QuickStartExample) # glmnet toy data
set.seed(1234)
# we add some vars and noise to simulate data with p>>>n
x <- replicate(8, {
  QuickStartExample$x + rnorm(2000)
})
QuickStartExample$x <- cbind(QuickStartExample$x,
                             x[, , 1], x[, , 2], x[, , 3], x[, , 4],
                             x[, , 5], x[, , 6], x[, , 7], x[, , 8])
data <- data.frame(y = QuickStartExample$y, QuickStartExample$x)

rm(QuickStartExample)
set.seed(1234)
split_init <- initial_split(data, strata = y)
train_d <- training(split_init)
test_d <- testing(split_init)

cv <- vfold_cv(train_d, v = 10, repeats = 3)
rec <- recipe(y ~ ., data = train_d)

lasso_spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(lasso_spec)
res <- tune_grid(
  wf,
  resamples = cv,
  grid = 100
)
autoplot(res)

L1 <- res %>%
  select_by_pct_loss(metric = "rmse", limit = 10, desc(penalty))
L2 <- res %>%
  select_by_one_std_err(metric = "rmse", desc(penalty))
L1$penalty
#> [1] 0.2018086
L2$penalty
#> [1] 0.1004087
final_mod1 <- finalize_workflow(wf, L1) %>%
  fit(train_d)
final_mod2 <- finalize_workflow(wf, L2) %>%
  fit(train_d)
vars1 <- vip::vi(final_mod1 %>% extract_fit_parsnip(), lambda = L1$penalty)
vars2 <- vip::vi(final_mod2 %>% extract_fit_parsnip(), lambda = L2$penalty)
L1$penalty > L2$penalty  ## lambda L2 is less stringent, so we expect the vars selected by L1 to be included among those selected by L2
#> [1] TRUE
v1 <- vars1 %>% filter(Importance != 0) %>% pull(Variable)
v2 <- vars2 %>% filter(Importance != 0) %>% pull(Variable)

table(v1 %in% v2)
#> 
#> FALSE  TRUE 
#>     2    25

v1
#>  [1] "X1"   "X14"  "X5"   "X6"   "X3"   "X40"  "X20"  "X143" "X160" "X140"
#> [11] "X111" "X26"  "X8"   "X75"  "X76"  "X163" "X120" "X65"  "X72"  "X146"
#> [21] "X171" "X29"  "X60"  "X148" "X74"  "X165" "X35"
v2
#>  [1] "X1"   "X14"  "X5"   "X40"  "X6"   "X3"   "X143" "X140" "X26"  "X111"
#> [11] "X8"   "X160" "X75"  "X163" "X65"  "X72"  "X120" "X76"  "X134" "X66" 
#> [21] "X35"  "X165" "X60"  "X177" "X56"  "X73"  "X59"  "X29"  "X146" "X24" 
#> [31] "X170" "X64"  "X159" "X148" "X141" "X54"  "X74"  "X67"

Created on 2024-05-24 with reprex v2.1.0
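
To pinpoint the discrepancy, a quick check (a sketch reusing v1 and v2 from the reprex above) lists the variables selected at the stronger penalty but not at the weaker one:

# Variables active at L1's (higher) penalty but inactive at L2's (lower) penalty
setdiff(v1, v2)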


ZWael commented May 24, 2024

Same result with a code modification intended to fit the model only once:

final_mod <- finalize_workflow(wf, L1) %>%
  fit(train_d)
vars1 <- vip::vi(final_mod %>% extract_fit_parsnip(), lambda = L1$penalty)
vars2 <- vip::vi(final_mod %>% extract_fit_parsnip(), lambda = L2$penalty)

v1 <- vars1 %>% filter(Importance != 0) %>% pull(Variable)
v2 <- vars2 %>% filter(Importance != 0) %>% pull(Variable)

table(v1 %in% v2)
#> 
#> FALSE  TRUE 
#>     2    25

Created on 2024-05-24 with reprex v2.1.0
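
This works because the parsnip fit stores the entire glmnet path, so coefficients at any penalty can be read from a single fitted workflow; a minimal sketch, assuming the objects from the reprex above:

# The engine object is the full glmnet path; coef() interpolates to any penalty
path_fit <- extract_fit_engine(final_mod)
coef(path_fit, s = L2$penalty)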

simonpcouch (Contributor) commented May 24, 2024

> If I am not mistaken, the variables selected with the higher penalty should be included among the variables selected with the lower penalty.

This is a reasonable hypothesis, but it turns out to be untrue! The lasso path is not strictly nested: a variable included at a higher penalty is not guaranteed to persist at a lower one. The active set of non-zero coefficients changes with the interplay of penalty strength, correlation among predictors, and the optimization algorithm.

Here's an example with glmnet itself, no tidymodels involved:

library(dplyr)
library(vip)
library(glmnet) # for the toy dataset
#> Loading required package: Matrix
#> 
#> Attaching package: 'Matrix'
#> The following objects are masked from 'package:tidyr':
#> 
#>     expand, pack, unpack
#> Loaded glmnet 4.1-8
data(QuickStartExample) # glmnet toy data
set.seed(1234)
# we add some vars and noise to simulate data with p>>>n
x <- replicate(8, {
  QuickStartExample$x + rnorm(2000)
})
QuickStartExample$x <- cbind(QuickStartExample$x,
                             x[, , 1], x[, , 2], x[, , 3], x[, , 4],
                             x[, , 5], x[, , 6], x[, , 7], x[, , 8])


glmnet_fit <- glmnet(QuickStartExample$x, QuickStartExample$y, alpha = 1)

vars1 <- vip::vi(glmnet_fit, lambda = 0.2) %>% filter(Importance != 0) %>% pull(Variable)
vars2 <- vip::vi(glmnet_fit, lambda = 0.1) %>% filter(Importance != 0) %>% pull(Variable)

table(vars1 %in% vars2)
#> 
#> FALSE  TRUE 
#>     2    15

Created on 2024-05-24 with reprex v2.1.0
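
To see exactly which predictors drop out of the active set between the two penalties, the fitted path can be queried directly; a minimal sketch, assuming glmnet_fit from the example above (coef() interpolates along the fitted path and returns one column per requested value of s):

# Compare the active sets at the two penalties
betas <- coef(glmnet_fit, s = c(0.2, 0.1))
active_high <- rownames(betas)[betas[, 1] != 0]  # non-zero at lambda = 0.2
active_low  <- rownames(betas)[betas[, 2] != 0]  # non-zero at lambda = 0.1
setdiff(active_high, active_low)  # selected at 0.2 but dropped at 0.1
plot(glmnet_fit, xvar = "lambda")  # coefficient paths; some cross zero on the way down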


ZWael commented May 25, 2024

Thank you for the feedback.


github-actions bot commented Jun 9, 2024

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators Jun 9, 2024