Generating a temporary column (using a window function) and then filtering on it causes an error in dbplyr version 2.3.1.
dbplyr version 2.3.0 does not have this issue.
The dbplyr 2.3.1 changelog mentions a breaking change that, given its description, does not seem applicable here, but the error could be a side effect of the fix for issue #1103.
The reprex below uses dbplyr version 2.3.0, where the query is constructed without issues.
library(ggplot2)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(dbplyr)
#>
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#>
#>     ident, sql
library(sparklyr)
#>
#> Attaching package: 'sparklyr'
#> The following object is masked from 'package:stats':
#>
#>     filter

spark_disconnect_all()
#> [1] 0
sc <- spark_connect("local")

# Copy mpg dataset to spark
mpgs <- mpg %>%
  copy_to(sc, ., "mpg", TRUE)

# Generating temporary column for filtering
mpgs %>%
  group_by(manufacturer, model) %>%
  arrange(desc(drv)) %>%
  mutate(is1999 = first(year) == 1999) %>%
  filter(is1999) %>%
  select(-is1999) %>%
  show_query
#> Warning: ORDER BY is ignored in subqueries without LIMIT
#> ℹ Do you need to move arrange() later in the pipeline or use window_order() instead?
#> <SQL>
#> SELECT
#>   `manufacturer`,
#>   `model`,
#>   `displ`,
#>   `year`,
#>   `cyl`,
#>   `trans`,
#>   `drv`,
#>   `cty`,
#>   `hwy`,
#>   `fl`,
#>   `class`
#> FROM (
#>   SELECT
#>     *,
#>     FIRST_VALUE(`year`) OVER (PARTITION BY `manufacturer`, `model` ORDER BY `drv` DESC) = 1999.0 AS `is1999`
#>   FROM `mpg`
#> ) `q01`
#> WHERE (`is1999`)

# Not generating temporary column for filtering
mpgs %>%
  group_by(manufacturer, model) %>%
  arrange(desc(drv)) %>%
  filter(first(year) == 1999) %>%
  show_query
#> Warning: ORDER BY is ignored in subqueries without LIMIT
#> ℹ Do you need to move arrange() later in the pipeline or use window_order() instead?
#> <SQL>
#> SELECT
#>   `manufacturer`,
#>   `model`,
#>   `displ`,
#>   `year`,
#>   `cyl`,
#>   `trans`,
#>   `drv`,
#>   `cty`,
#>   `hwy`,
#>   `fl`,
#>   `class`
#> FROM (
#>   SELECT
#>     *,
#>     FIRST_VALUE(`year`) OVER (PARTITION BY `manufacturer`, `model` ORDER BY `drv` DESC) AS `q02`
#>   FROM `mpg`
#> ) `q01`
#> WHERE (`q02` = 1999.0)
The reprex below uses dbplyr version 2.3.1, where query construction fails for the example that uses the temporary column.
It also includes an example without the temporary column, for which query construction succeeds. In that case, the table alias and the column alias are both `q01`. Is that acceptable practice? Wouldn't it make the generated SQL harder to understand for someone reading it? For the same example, dbplyr version 2.3.0 uses `q01` for the table alias and `q02` for the column alias.
library(ggplot2)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(dbplyr)
#>
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#>
#>     ident, sql
library(sparklyr)
#>
#> Attaching package: 'sparklyr'
#> The following object is masked from 'package:stats':
#>
#>     filter

spark_disconnect_all()
#> [1] 0
sc <- spark_connect("local")

# Copy mpg dataset to spark
mpgs <- mpg %>%
  copy_to(sc, ., "mpg", TRUE)

# Generating temporary column for filtering
mpgs %>%
  group_by(manufacturer, model) %>%
  arrange(desc(drv)) %>%
  mutate(is1999 = first(year) == 1999) %>%
  filter(is1999) %>%
  select(-is1999) %>%
  show_query
#> Error in `purrr::map_chr()`:
#> ℹ In index: 1.
#> Caused by error in `as_string()`:
#> ! Can't convert a call to a string.
#> Backtrace:
#>      ▆
#>   1. ├─... %>% show_query
#>   2. ├─dplyr::show_query(.)
#>   3. ├─dplyr::select(., -is1999)
#>   4. ├─sparklyr:::select.tbl_spark(., -is1999)
#>   5. ├─base::NextMethod()
#>   6. └─dbplyr:::select.tbl_lazy(., -is1999)
#>   7.   └─dbplyr:::add_select(.data, new_vars)
#>   8.     └─dbplyr:::rename_order(lazy_query, vars)
#>   9.       └─purrr::map_chr(order, as_name)
#>  10.         └─purrr:::map_("character", .x, .f, ..., .progress = .progress)
#>  11.           ├─purrr:::with_indexed_errors(...)
#>  12.           │ └─base::withCallingHandlers(...)
#>  13.           ├─purrr:::call_with_cleanup(...)
#>  14.           └─rlang (local) .f(.x[[i]], ...)
#>  15.             └─rlang::as_string(x)
#>  16.               └─rlang:::abort_coercion(x, "a string")
#>  17.                 └─rlang::abort(msg, call = call)

# Not generating temporary column for filtering
mpgs %>%
  group_by(manufacturer, model) %>%
  arrange(desc(drv)) %>%
  filter(first(year) == 1999) %>%
  show_query
#> Warning: ORDER BY is ignored in subqueries without LIMIT
#> ℹ Do you need to move arrange() later in the pipeline or use window_order() instead?
#> <SQL>
#> SELECT
#>   `manufacturer`,
#>   `model`,
#>   `displ`,
#>   `year`,
#>   `cyl`,
#>   `trans`,
#>   `drv`,
#>   `cty`,
#>   `hwy`,
#>   `fl`,
#>   `class`
#> FROM (
#>   SELECT
#>     *,
#>     FIRST_VALUE(`year`) OVER (PARTITION BY `manufacturer`, `model` ORDER BY `drv` DESC) AS `q01`
#>   FROM `mpg`
#> ) `q01`
#> WHERE (`q01` = 1999.0)
Created on 2023-04-12 with reprex v2.0.2
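The backtrace points at `dbplyr:::select.tbl_lazy()`, so the failure may not depend on sparklyr at all. The following is a minimal sketch for checking that without a Spark connection. It assumes dbplyr's `lazy_frame()` as a stand-in for the Spark table; the one-row frame and its values are hypothetical and not part of the reprex, and it has not been confirmed that this reproduces the same error.

```r
library(dplyr)
library(dbplyr)

# Report the installed dbplyr version, since behaviour differs between 2.3.0 and 2.3.1
print(packageVersion("dbplyr"))

# Hypothetical stand-in for the Spark table: a simulated lazy table
# containing only the columns the pipeline touches
lf <- lazy_frame(
  manufacturer = "audi", model = "a4", year = 1999L, drv = "f"
)

# Same pipeline as the failing example; tryCatch() captures the error
# message instead of stopping, so the script runs on either version
result <- tryCatch(
  lf %>%
    group_by(manufacturer, model) %>%
    arrange(desc(drv)) %>%
    mutate(is1999 = first(year) == 1999) %>%
    filter(is1999) %>%
    select(-is1999) %>%
    sql_render(),
  error = function(e) conditionMessage(e)
)

# Either the rendered SQL or the captured error message
print(result)
```

If `result` holds an error message here as well, the problem would sit in dbplyr's handling of the ordering expression rather than in the Spark backend.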