
predict() on an mlp with nnet double-names the output with .pred_ #174

Closed
mouli3c3 opened this issue May 1, 2019 · 4 comments · Fixed by #225
Labels
bug an unexpected problem or unintended behavior

Comments

@mouli3c3 commented May 1, 2019

This problem is similar to an already-closed issue (#107), but it occurs with mlp() using the nnet engine.


library(tidymodels)
#> -- Attaching packages ------------------------------------------------- tidymodels 0.0.2 --
#> v broom     0.5.1       v purrr     0.3.2  
#> v dials     0.0.2       v recipes   0.1.5  
#> v dplyr     0.8.0.1     v rsample   0.0.4  
#> v ggplot2   3.1.0       v tibble    2.1.1  
#> v infer     0.4.0       v yardstick 0.0.3  
#> v parsnip   0.0.2
#> -- Conflicts ---------------------------------------------------- tidymodels_conflicts() --
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x recipes::step()  masks stats::step()
data(credit_data)

set.seed(7075)
data_split <- initial_split(credit_data, strata = "Status", prop = 0.75)

credit_train <- training(data_split)
credit_test  <- testing(data_split)
credit_rec <- 
  recipe(Status ~ ., data = credit_train) %>%
  step_knnimpute(Home, Job, Marital, Income, Assets, Debt) %>%
  step_dummy(all_nominal(), -Status) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors()) %>%
  prep(training = credit_train, retain = TRUE)

test_normalized <- bake(credit_rec, new_data = credit_test, all_predictors())

set.seed(57974)
nnet_fit <- set_engine(mlp("classification", hidden_units = 10), "nnet") %>%
  fit(Status ~ ., data = juice(credit_rec))

glm_fit <- set_engine(logistic_reg(), "glm") %>%
  fit(Status ~ ., data = juice(credit_rec))

# Issue with predict() on the nnet fit: the column names are double-prefixed
glimpse(predict(nnet_fit, new_data = test_normalized, type = "prob"))
#> Observations: 1,113
#> Variables: 2
#> $ .pred_.pred_bad  <dbl> 0.5608545, 0.7023505, 0.3303682, 0.4221877, 0...
#> $ .pred_.pred_good <dbl> 0.4391455, 0.2976495, 0.6696318, 0.5778123, 0...

# predict() on the glm fit is normal (no issue)
glimpse(predict(glm_fit, new_data = test_normalized, type = "prob"))
#> Observations: 1,113
#> Variables: 2
#> $ .pred_bad  <dbl> 0.04675355, 0.94317298, 0.24316454, 0.06970005, 0.0...
#> $ .pred_good <dbl> 0.95324645, 0.05682702, 0.75683546, 0.93029995, 0.9...
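
Until the fix lands, a possible workaround (a minimal sketch, not from the thread; it assumes the only defect is the duplicated prefix):

# Hypothetical workaround: strip the duplicated ".pred_" prefix from the
# nnet predictions so the columns match the glm naming.
nnet_probs <- predict(nnet_fit, new_data = test_normalized, type = "prob")
names(nnet_probs) <- sub("^\\.pred_\\.pred_", ".pred_", names(nnet_probs))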
@topepo added the bug label May 1, 2019
@topepo (Member) commented May 1, 2019

SIDM (same issue, different model)

@patr1ckm (Contributor) commented Oct 29, 2019

I can confirm that this can be closed. Running

library(tidymodels)

data(credit_data)
nnet_fit <- set_engine(mlp("classification", hidden_units = 10), "nnet") %>%
  fit(Status ~ ., data = credit_data)

glm_fit <- set_engine(logistic_reg(), "glm") %>%
  fit(Status ~ ., data = credit_data)

Produces:

> glimpse(predict(nnet_fit, new_data = credit_data, type = "prob"))
Observations: 4,454
Variables: 2
$ .pred_V1 <dbl> 0.3419620, 0.3419620, 0.3392285, 0.3387520, 0.4335137, 0.2995662, 0.2995662, 0.3010878, 0.4102205, 0.5224852, 0.33…
$ .pred_V2 <dbl> 0.6580380, 0.6580380, 0.6607715, 0.6612480, 0.5664863, 0.7004338, 0.7004338, 0.6989122, 0.5897795, 0.4775148, 0.66…

> glimpse(predict(glm_fit, new_data = credit_data, type = "prob"))
Observations: 4,454
Variables: 2
$ .pred_bad  <dbl> 0.24860098, 0.11323173, 0.56131606, 0.21922027, 0.14454134, 0.03888827, 0.04857814, 0.03515797, 0.23389520, 0.80…
$ .pred_good <dbl> 0.75139902, 0.88676827, 0.43868394, 0.78077973, 0.85545866, 0.96111173, 0.95142186, 0.96484203, 0.76610480, 0.19…

However, note that the columns are named differently depending on the model type in this case: .pred_V1/.pred_V2 for the nnet fit versus .pred_bad/.pred_good for the glm fit.
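
One way to recover the level-based names (a hedged sketch, assuming the probability columns come back in the same order as the outcome levels stored in the parsnip fit):

# Assumption: the nnet probability columns are ordered like the outcome
# levels stored in the parsnip model object (nnet_fit$lvl).
nnet_probs <- predict(nnet_fit, new_data = credit_data, type = "prob")
names(nnet_probs) <- paste0(".pred_", nnet_fit$lvl)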

@mouli3c3 (Author) commented

I see that nnet_fit$lvl and glm_fit$lvl both indicate the target levels as "bad" and "good". I'm not sure whether it is intended behavior for predict() to produce different column names for different models.
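
For reference, the check behind that observation (levels as described above):

nnet_fit$lvl
#> [1] "bad"  "good"
glm_fit$lvl
#> [1] "bad"  "good"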

@github-actions bot commented Mar 8, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators Mar 8, 2021