Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in example? cbind() with predicted() #2

Open
strengejacke opened this issue May 31, 2023 · 12 comments
Open

Bug in example? cbind() with predicted() #2

strengejacke opened this issue May 31, 2023 · 12 comments

Comments

@strengejacke
Copy link

I tried to adopt an example, where predict() was used in combination with cbind() and the newdata-data frame. But, e.g. looking at rows 1, 7 and 13, you see that these refer to the same combination hincome = 20, children = absent and response = not.work, but all rows have different predicted probabilities.

That raises the question if I use as.data.frame(predict()), how do I find out which row corresponds to which combination of factor levels and response level?

library(nestedLogit)

data("Womenlf", package = "carData")
m <- nestedLogit(partic ~ hincome + children,
  logits(
    work = dichotomy("not.work", c("parttime", "fulltime")),
    full = dichotomy("parttime", "fulltime")
  ),
  data = Womenlf
)
new <- expand.grid(
  hincome = c(20, 30, 40),
  children = c("absent", "present")
)
cbind(new, predict(m, newdata = new))
#>    hincome children response           p        se.p       logit  se.logit
#> 1       20   absent not.work 0.379973389 0.061556512 -0.48966118 0.2612826
#> 2       30   absent parttime 0.129436733 0.047148453 -1.90594822 0.4184172
#> 3       40   absent fulltime 0.490589878 0.066559091 -0.03764493 0.2663307
#> 4       20  present not.work 0.483361927 0.095465380 -0.06657687 0.3822848
#> 5       30  present parttime 0.224958781 0.095638728 -1.23699902 0.5485371
#> 6       40  present fulltime 0.291679292 0.101605781 -0.88724209 0.4917936
#> 7       20   absent not.work 0.588194721 0.133170256  0.35650744 0.5497867
#> 8       30   absent parttime 0.285272275 0.128164048 -0.91845759 0.6285879
#> 9       40   absent fulltime 0.126533004 0.097928271 -1.93196717 0.8860491
#> 10      20  present not.work 0.747625342 0.037830221  1.08598725 0.2004976
#> 11      30  present parttime 0.199129118 0.034884708 -1.39174629 0.2187448
#> 12      40  present fulltime 0.053245540 0.019740560 -2.87812574 0.3915966
#> 13      20   absent not.work 0.818923571 0.052610678  1.50907156 0.3547877
#> 14      30   absent parttime 0.165901061 0.049355986 -1.61496044 0.3566754
#> 15      40   absent fulltime 0.015175368 0.011491854 -4.17278999 0.7689391
#> 16      20  present not.work 0.873487849 0.059489224  1.93215586 0.5383308
#> 17      30  present parttime 0.122673467 0.057838540 -1.96735317 0.5374096
#> 18      40  present fulltime 0.003838684 0.004592451 -5.55877971 1.2009709

Created on 2023-05-31 with reprex v2.0.2

@friendly
Copy link
Owner

Hi Daniel
Thank you for testing this for use in the ggeffects package.
It is surprising to me that cbind() works at all in this context, where predict() returns a list of 4 data frames.

Our as.data.frame.predictNestedLogit() function is a little quirky, because to get the factor levels you need to
also supply the newdata argument:

> pred <- predict(m, newdata = new)
> as.data.frame(pred, newdata=new)
   hincome children response           p        se.p       logit  se.logit
1       20   absent not.work 0.379973389 0.061556512 -0.48966118 0.2612826
2       20   absent parttime 0.129436733 0.047148453 -1.90594822 0.4184172
3       20   absent fulltime 0.490589878 0.066559091 -0.03764493 0.2663307
4       30   absent not.work 0.483361927 0.095465380 -0.06657687 0.3822848
5       30   absent parttime 0.224958781 0.095638728 -1.23699902 0.5485371
6       30   absent fulltime 0.291679292 0.101605781 -0.88724209 0.4917936
7       40   absent not.work 0.588194721 0.133170256  0.35650744 0.5497867
8       40   absent parttime 0.285272275 0.128164048 -0.91845759 0.6285879
9       40   absent fulltime 0.126533004 0.097928271 -1.93196717 0.8860491
10      20  present not.work 0.747625342 0.037830221  1.08598725 0.2004976
11      20  present parttime 0.199129118 0.034884708 -1.39174629 0.2187448
12      20  present fulltime 0.053245540 0.019740560 -2.87812574 0.3915966
13      30  present not.work 0.818923571 0.052610678  1.50907156 0.3547877
14      30  present parttime 0.165901061 0.049355986 -1.61496044 0.3566754
15      30  present fulltime 0.015175368 0.011491854 -4.17278999 0.7689391
16      40  present not.work 0.873487849 0.059489224  1.93215586 0.5383308
17      40  present parttime 0.122673467 0.057838540 -1.96735317 0.5374096
18      40  present fulltime 0.003838684 0.004592451 -5.55877971 1.2009709

Perhaps we should issue a warning or think of a better way to avoid this surprise; e.g., I suppose we could attach newdata as an attribute of the predict object.

@friendly
Copy link
Owner

So, would it make things easier if newdata were attached to the result of predict()?

@strengejacke
Copy link
Author

Ok, thanks! That seems to work!

See the discrepancies here:

library(nestedLogit)
data(Womenlf, package = "carData")
Womenlf$partic <- with(
  Womenlf,
  factor(partic, levels = c("not.work", "parttime", "fulltime"))
)

m <- nestedLogit::nestedLogit(partic ~ hincome + children,
  dichotomies = nestedLogit::logits(
    work = nestedLogit::dichotomy("not.work", working = c("parttime", "fulltime")),
    full = nestedLogit::dichotomy("parttime", "fulltime")
  ),
  data = Womenlf
)

new <- expand.grid(
  hincome = c(30, 40),
  children = c("absent", "present")
)

cbind(new, predict(m, newdata = new))
#>    hincome children response           p        se.p       logit  se.logit
#> 1       30   absent not.work 0.483361927 0.095465380 -0.06657687 0.3822848
#> 2       40   absent parttime 0.224958781 0.095638728 -1.23699902 0.5485371
#> 3       30  present fulltime 0.291679292 0.101605781 -0.88724209 0.4917936
#> 4       40  present not.work 0.588194721 0.133170256  0.35650744 0.5497867
#> 5       30   absent parttime 0.285272275 0.128164048 -0.91845759 0.6285879
#> 6       40   absent fulltime 0.126533004 0.097928271 -1.93196717 0.8860491
#> 7       30  present not.work 0.818923571 0.052610678  1.50907156 0.3547877
#> 8       40  present parttime 0.165901061 0.049355986 -1.61496044 0.3566754
#> 9       30   absent fulltime 0.015175368 0.011491854 -4.17278999 0.7689391
#> 10      40   absent not.work 0.873487849 0.059489224  1.93215586 0.5383308
#> 11      30  present parttime 0.122673467 0.057838540 -1.96735317 0.5374096
#> 12      40  present fulltime 0.003838684 0.004592451 -5.55877971 1.2009709

as.data.frame(predict(m, newdata = new), newdata = new)
#>    hincome children response           p        se.p       logit  se.logit
#> 1       30   absent not.work 0.483361927 0.095465380 -0.06657687 0.3822848
#> 2       30   absent parttime 0.224958781 0.095638728 -1.23699902 0.5485371
#> 3       30   absent fulltime 0.291679292 0.101605781 -0.88724209 0.4917936
#> 4       40   absent not.work 0.588194721 0.133170256  0.35650744 0.5497867
#> 5       40   absent parttime 0.285272275 0.128164048 -0.91845759 0.6285879
#> 6       40   absent fulltime 0.126533004 0.097928271 -1.93196717 0.8860491
#> 7       30  present not.work 0.818923571 0.052610678  1.50907156 0.3547877
#> 8       30  present parttime 0.165901061 0.049355986 -1.61496044 0.3566754
#> 9       30  present fulltime 0.015175368 0.011491854 -4.17278999 0.7689391
#> 10      40  present not.work 0.873487849 0.059489224  1.93215586 0.5383308
#> 11      40  present parttime 0.122673467 0.057838540 -1.96735317 0.5374096
#> 12      40  present fulltime 0.003838684 0.004592451 -5.55877971 1.2009709

Created on 2023-06-01 with reprex v2.0.2

I don't think newdata needs to be attached to the results of predict(), but you should maybe update the example from the help-page? That's where I had the cbind() approach from.

https://friendly.github.io/nestedLogit/reference/nestedMethods.html

@strengejacke
Copy link
Author

Now the plots make much more sense :-)

image

image

@friendly
Copy link
Owner

friendly commented Jun 1, 2023

IThanks; I'll fix the help page example.

@friendly friendly closed this as completed Jun 1, 2023
@friendly friendly reopened this Jun 1, 2023
@friendly
Copy link
Owner

friendly commented Jun 1, 2023

Can you please paste the code you used for your ggpredict figures?

@strengejacke
Copy link
Author

strengejacke commented Jun 1, 2023

sure (requires latest GitHub version from https://github.com/strengejacke/ggeffects)

library(ggeffects)
library(nestedLogit)

data("Womenlf", package = "carData")
m <- nestedLogit(partic ~ hincome + children,
  logits(
    work = dichotomy("not.work", c("parttime", "fulltime")),
    full = dichotomy("parttime", "fulltime")
  ),
  data = Womenlf
)

ggpredict(m, c("hincome", "children")) |> plot()
#> Data were 'prettified'. Consider using `terms="hincome [all]"` to get
#>   smooth plots.

Created on 2023-06-01 with reprex v2.0.2

@strengejacke
Copy link
Author

(r-universe binaries are available, you can now simply install ggeffects running this code: install.packages("ggeffects", repos = "https://strengejacke.r-universe.dev"))

@friendly
Copy link
Owner

friendly commented Jun 1, 2023

We made a change in our as.data.frame.predictNestedLogit, so the newdata is no longer required, as the predict() method includes the newdata in it's result as a .data component.

Consequently, ggpredict() now gives an error:

data("Womenlf", package = "carData")
m <- nestedLogit(partic ~ hincome + children,
                 logits(
                   work = dichotomy("not.work", c("parttime", "fulltime")),
                   full = dichotomy("parttime", "fulltime")
                 ),
                 data = Womenlf
)

ggpredict(m, c("hincome", "children")) |> plot()

# Data were 'prettified'. Consider using `terms="hincome [all]"` to get smooth plots.
# Error: At least one term specified in `terms` is no valid model term.

The fix for this is to remove the use of newdata in line 13 of get_predictions_nestedLogit.R, the last line shown below

predictions <- as.data.frame(stats::predict(
    model,
    newdata = data_grid,
    ...
  ), newdata = data_grid)

But, don't youj want to call nestedLogit::predict rather than stats::predict here?

@john-d-fox
Copy link
Collaborator

john-d-fox commented Jun 1, 2023 via email

@strengejacke
Copy link
Author

Consequently, ggpredict() now gives an error:

Do you have session conflicts? I just tested with the latest nestedLogit package version, and it still works fine. The newdata argument passed to as.data.frame() is silently ignored, which is good, because the code then works for both nestedLogit 0.3.0 and 0.3.1.

library(ggeffects)
library(nestedLogit)
data("Womenlf", package = "carData")
m <- nestedLogit(partic ~ hincome + children,
                 logits(
                   work = dichotomy("not.work", c("parttime", "fulltime")),
                   full = dichotomy("parttime", "fulltime")
                 ),
                 data = Womenlf
)

ggpredict(m, c("hincome", "children")) |> plot()
#> Data were 'prettified'. Consider using `terms="hincome [all]"` to get
#>   smooth plots.

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.0 (2023-04-21 ucrt)
#>  os       Windows 10 x64 (build 19045)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  German_Germany.utf8
#>  ctype    German_Germany.utf8
#>  tz       Europe/Berlin
#>  date     2023-06-02
#>  pandoc   3.1.1 @ C:/Users/mail/AppData/Local/Pandoc/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version  date (UTC) lib source
#>  abind         1.4-5    2016-07-21 [1] CRAN (R 4.3.0)
#>  backports     1.4.1    2021-12-13 [1] CRAN (R 4.3.0)
#>  broom         1.0.4    2023-03-11 [1] CRAN (R 4.3.0)
#>  car           3.1-2    2023-03-30 [1] CRAN (R 4.3.0)
#>  carData       3.0-5    2022-01-06 [1] CRAN (R 4.3.0)
#>  cli           3.6.1    2023-03-23 [1] CRAN (R 4.3.0)
#>  colorspace    2.1-0    2023-01-23 [1] CRAN (R 4.3.0)
#>  digest        0.6.31   2022-12-11 [1] CRAN (R 4.3.0)
#>  dplyr         1.1.2    2023-04-20 [1] CRAN (R 4.3.0)
#>  evaluate      0.21     2023-05-05 [1] CRAN (R 4.3.0)
#>  fansi         1.0.4    2023-01-22 [1] CRAN (R 4.3.0)
#>  farver        2.1.1    2022-07-06 [1] CRAN (R 4.3.0)
#>  fastmap       1.1.1    2023-02-24 [1] CRAN (R 4.3.0)
#>  forcats       1.0.0    2023-01-29 [1] CRAN (R 4.3.0)
#>  fs            1.6.2    2023-04-25 [1] CRAN (R 4.3.0)
#>  generics      0.1.3    2022-07-05 [1] CRAN (R 4.3.0)
#>  ggeffects   * 1.2.2.13 2023-06-02 [1] local
#>  ggplot2       3.4.2    2023-04-03 [1] CRAN (R 4.3.0)
#>  glue          1.6.2    2022-02-24 [1] CRAN (R 4.3.0)
#>  gtable        0.3.3    2023-03-21 [1] CRAN (R 4.3.0)
#>  haven         2.5.2    2023-02-28 [1] CRAN (R 4.3.0)
#>  hms           1.1.3    2023-03-21 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.5    2023-03-23 [1] CRAN (R 4.3.0)
#>  insight       0.19.2   2023-05-23 [1] https://easystats.r-universe.dev (R 4.3.0)
#>  knitr         1.43     2023-05-25 [1] CRAN (R 4.3.0)
#>  labeling      0.4.2    2020-10-20 [1] CRAN (R 4.3.0)
#>  lifecycle     1.0.3    2022-10-07 [1] CRAN (R 4.3.0)
#>  magrittr      2.0.3    2022-03-30 [1] CRAN (R 4.3.0)
#>  munsell       0.5.0    2018-06-12 [1] CRAN (R 4.3.0)
#>  nestedLogit * 0.3.1    2023-06-02 [1] Github (friendly/nestedLogit@99c17d2)
#>  pillar        1.9.0    2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig     2.0.3    2019-09-22 [1] CRAN (R 4.3.0)
#>  purrr         1.0.1    2023-01-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0   2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2    2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.25.0   2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils       2.12.2   2022-11-11 [1] CRAN (R 4.3.0)
#>  R6            2.5.1    2021-08-19 [1] CRAN (R 4.3.0)
#>  reprex        2.0.2    2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang         1.1.1    2023-04-28 [1] CRAN (R 4.3.0)
#>  rmarkdown     2.21     2023-03-26 [1] CRAN (R 4.3.0)
#>  scales        1.2.1    2022-08-20 [1] CRAN (R 4.3.0)
#>  sessioninfo   1.2.2    2021-12-06 [1] CRAN (R 4.3.0)
#>  sjlabelled    1.2.0    2022-04-10 [1] CRAN (R 4.3.0)
#>  snakecase     0.11.0   2019-05-25 [1] CRAN (R 4.3.0)
#>  stringi       1.7.12   2023-01-11 [1] CRAN (R 4.3.0)
#>  stringr       1.5.0    2022-12-02 [1] CRAN (R 4.3.0)
#>  styler        1.10.0   2023-05-24 [1] CRAN (R 4.3.0)
#>  tibble        3.2.1    2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyr         1.3.0    2023-01-24 [1] CRAN (R 4.3.0)
#>  tidyselect    1.2.0    2022-10-10 [1] CRAN (R 4.3.0)
#>  utf8          1.2.3    2023-01-31 [1] CRAN (R 4.3.0)
#>  vctrs         0.6.2    2023-04-19 [1] CRAN (R 4.3.0)
#>  withr         2.5.0    2022-03-03 [1] CRAN (R 4.3.0)
#>  xfun          0.39     2023-04-20 [1] CRAN (R 4.3.0)
#>  yaml          2.3.7    2023-01-23 [1] CRAN (R 4.3.0)
#> 
#>  [1] C:/Users/mail/AppData/Local/R/win-library/4.3
#>  [2] C:/Program Files/R/R-4.3.0/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2023-06-02 with reprex v2.0.2

@john-d-fox
Copy link
Collaborator

john-d-fox commented Jun 2, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants