Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some (?) interactions between select and window functions result in error (dbplyr 2.3.0) #1104

Closed
lschneiderbauer opened this issue Jan 19, 2023 · 4 comments · Fixed by #1105

Comments

@lschneiderbauer
Copy link

Hi,
I think I found another regression in dbplyr 2.3.0.

It is quite an artifical example, but nevertheless. An explicit selection after a summarizing operation seems to somehow result in problems when using window functions afterwards. (In this example we see row_number(), but I also have the problem with slice_*()).

library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

table <- tbl_lazy(tibble(key = 1), dbplyr::simulate_hana())

table %>%
  group_by(key) %>%
  summarize(
    test = "1"
  ) %>%
  select(key) %>%
  group_by(key) %>%
  filter(row_number() == 1) %>%
  ungroup()
#> Error in `row_number()`:
#> ! `row_number()`` is only available in a windowed (`mutate()`) context

#> Backtrace:
#>      ▆
#>   1. ├─base::tryCatch(...)
#>   2. │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>   3. │   ├─base (local) tryCatchOne(...)
#>   4. │   │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>   5. │   └─base (local) tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
#>   6. │     └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>   7. │       └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>   8. ├─base::withCallingHandlers(...)
#>   9. ├─base::saveRDS(...)
#>  10. ├─base::do.call(...)
#>  11. ├─base (local) `<fn>`(...)
#>  12. ├─global `<fn>`(input = base::quote("plump-imago_reprex.R"))
#>  13. │ └─rmarkdown::render(input, quiet = TRUE, envir = globalenv(), encoding = "UTF-8")
#>  14. │   └─knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
#>  15. │     └─knitr:::process_file(text, output)
#>  16. │       ├─base::withCallingHandlers(...)
#>  17. │       ├─knitr:::process_group(group)
#>  18. │       └─knitr:::process_group.block(group)
#>  19. │         └─knitr:::call_block(x)
#>  20. │           └─knitr:::block_exec(params)
#>  21. │             └─knitr:::eng_r(options)
#>  22. │               ├─knitr:::in_input_dir(...)
#>  23. │               │ └─knitr:::in_dir(input_dir(), expr)
#>  24. │               └─knitr (local) evaluate(...)
#>  25. │                 └─evaluate::evaluate(...)
#>  26. │                   └─evaluate:::evaluate_call(...)
#>  27. │                     ├─evaluate (local) handle(...)
#>  28. │                     │ └─base::try(f, silent = TRUE)
#>  29. │                     │   └─base::tryCatch(...)
#>  30. │                     │     └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  31. │                     │       └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  32. │                     │         └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>  33. │                     ├─base::withCallingHandlers(...)
#>  34. │                     ├─base::withVisible(value_fun(ev$value, ev$visible))
#>  35. │                     └─knitr (local) value_fun(ev$value, ev$visible)
#>  36. │                       └─knitr (local) fun(x, options = options)
#>  37. │                         ├─base::withVisible(knit_print(x, ...))
#>  38. │                         ├─knitr::knit_print(x, ...)
#>  39. │                         └─knitr:::knit_print.default(x, ...)
#>  40. │                           └─evaluate (local) normal_print(x)
#>  41. │                             ├─base::print(x)
#>  42. │                             └─dbplyr:::print.tbl_lazy(x)
#>  43. │                               ├─dplyr::show_query(x)
#>  44. │                               └─dbplyr:::show_query.tbl_lazy(x)
#>  45. │                                 └─dbplyr::remote_query(x, cte = cte)
#>  46. │                                   ├─dbplyr::db_sql_render(remote_con(x), x, cte = cte)
#>  47. │                                   └─dbplyr:::db_sql_render.DBIConnection(remote_con(x), x, cte = cte)
#>  48. │                                     ├─dbplyr::sql_render(sql, con = con, ..., cte = cte)
#>  49. │                                     └─dbplyr:::sql_render.tbl_lazy(sql, con = con, ..., cte = cte)
#>  50. │                                       ├─dbplyr::sql_render(...)
#>  51. │                                       └─dbplyr:::sql_render.lazy_query(...)
#>  52. │                                         ├─dbplyr::sql_build(query, con = con, ...)
#>  53. │                                         └─dbplyr:::sql_build.lazy_select_query(query, con = con, ...)
#>  54. │                                           ├─dbplyr::select_query(...)
#>  55. │                                           │ └─base::structure(...)
#>  56. │                                           ├─dbplyr::sql_build(op$x, con)
#>  57. │                                           └─dbplyr:::sql_build.lazy_select_query(op$x, con)
#>  58. │                                             └─dbplyr:::get_select_sql(...)
#>  59. │                                               └─dbplyr::translate_sql_(select_expr, con, window = FALSE, context = list(clause = "SELECT"))
#>  60. │                                                 └─base::lapply(...)
#>  61. │                                                   └─dbplyr (local) FUN(X[[i]], ...)
#>  62. │                                                     ├─dbplyr::escape(eval_tidy(x, mask), con = con)
#>  63. │                                                     └─rlang::eval_tidy(x, mask)
#>  64. └─dbplyr (local) row_number()
#>  65.   └─cli::cli_abort("`{f}()`` is only available in a windowed ({.fun mutate}) context")
#>  66.     └─rlang::abort(...)

Created on 2023-01-19 with reprex v2.0.2

Removing the intermediate select statement makes it work correctly. (Of course in this example the select is superfluous anyways, but in my actual query it is not.)

library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

table <- tbl_lazy(tibble(key = 1), dbplyr::simulate_hana())

table %>%
  group_by(key) %>%
  summarize(
    test = "1"
  ) %>%
#  select(key) %>%
  group_by(key) %>%
  filter(row_number() == 1) %>%
  ungroup()
#> <SQL>
#> SELECT `key`, `test`
#> FROM (
#>   SELECT *, ROW_NUMBER() OVER (PARTITION BY `key`) AS `q01`
#>   FROM (
#>     SELECT `key`, '1' AS `test`
#>     FROM `df`
#>     GROUP BY `key`
#>   ) `q01`
#> ) `q02`
#> WHERE (`q01` = 1.0)

Created on 2023-01-19 with reprex v2.0.2

@mgirlich
Copy link
Collaborator

Thanks for the reprex!
You do seem to have some complex queries in your workflow! 😄
Can you confirm the PR fixes your issues (devtools::install_github("tidyverse/dbplyr#1105"))?

By the way you should use window_order() before using a window function like row_number(). As this only works in some databases and doesn't have well defined behaviour we'll probably turn this into an error soon.

@lschneiderbauer
Copy link
Author

Yeah, by now we maintain our own R package, which dynamically creates dplyr queries, and in the end using your cool dbplyr package to generate and send off a final SQL to gather data for some reporting purposes. So the queries can become quite massive and convoluted at times.

Unfortunately I will not be able to test this until next Monday, but then I will provide feedback.
Will the devtools::install_github("tidyverse/dbplyr#1105") already include the "4 table join bug" fix?

Thanks for the hint! I was actually using window_order() but I removed it for that reprex.

@mgirlich
Copy link
Collaborator

Ah, that sounds interesting. So you might benefit a lot from the 2.3 version as the reduced number of subqueries for joins can lead to big performance gains (at least I had a couple of cases already).

Yes, the PR includes the fix for the 4 table join bug.

@lschneiderbauer
Copy link
Author

That why I was eager to test the new version. :)

I tested the PR and I can confirm that it fixes the issue, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants