-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update .proto files to the same version that Arrow is using #181
Conversation
It looks like it's on its way, but we need apache/arrow#13914 to merge before we can use Arrow's substrait compiler to pass any DuckDB's substrait compiler now requires strong typing (previously it was perfectly happy to accept |
…ow for now because of the Emit clause thing
Ok, the CMD check is clean! There are a lot of files in the diff because I updated the internal copy of substrait, but the main behaviour change is # needs install.packages("duckdb") to get the latest version (0.5.0)
library(substrait, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
ggplot2::mpg |>
duckdb_substrait_compiler() |>
filter(hwy > 30) |>
arrange(cty) |>
collect()
#> # A tibble: 22 × 11
#> manufacturer model displ year cyl trans drv cty hwy fl class
#> <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
#> 1 audi a4 2 2008 4 manu… f 20 31 p comp…
#> 2 hyundai sonata 2.4 2008 4 manu… f 21 31 r mids…
#> 3 toyota camry 2.4 2008 4 manu… f 21 31 r mids…
#> 4 toyota camry 2.4 2008 4 auto… f 21 31 r mids…
#> 5 toyota camry sol… 2.4 2008 4 manu… f 21 31 r comp…
#> 6 toyota camry sol… 2.4 2008 4 auto… f 22 31 r comp…
#> 7 nissan altima 2.5 2008 4 auto… f 23 31 r mids…
#> 8 nissan altima 2.5 2008 4 manu… f 23 32 r mids…
#> 9 honda civic 1.6 1999 4 auto… f 24 32 r subc…
#> 10 honda civic 1.6 1999 4 auto… f 24 32 r subc…
#> # … with 12 more rows Created on 2022-09-08 by the reprex package (v2.0.1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your approach here of keeping both substrait_select()
and substrait_project()
. Just one small comment about variable naming.
R/project-rel.R
Outdated
# fields created by earlier arguments are accessible from later arguments | ||
quos <- rlang::enquos(..., .named = TRUE) | ||
expressions <- list() | ||
types <- list() | ||
|
||
# Keep a list of columns that we need to drop as part of this relation | ||
drop_columns <- character() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having an argument named .drop_columns
and then this variable names drop_columns
is a little confusing - mind renaming this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an excellent point! I updated the name and the comments to reflect why they're different variables in the way that I wrote it.
Currently fails spectacularly because:
DUCKDB_R_EXTENSIONS=substrait
fails withModuleNotFoundError: No module named 'substrait_config'
; theINSTALL 'substrait'; LOAD 'substrait';
SQL command also fails (IO Error: Failed to download extension "substrait" at URL "http://extensions.duckdb.org/deadbeeff/osx_arm64/substrait.duckdb_extension.gz"
)ProjectRel
to mean "add these columns" (i.e.,mutate()
) rather than "clear the column list and create new columns" (i.e.,transmute()
). I haven't looked into how to drop columns (something involvingsubstrait::substrait$RelCommon$Emit$create()
)