Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-14575 [R] Allow functions with {{pkg::}} prefixes - second approach #13513

Conversation

dragosmg
Copy link
Contributor

@dragosmg dragosmg commented Jul 5, 2022

Same goal as #13160, namely to allow the use of namespacing with bindings:

library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)

test_df <- tibble(
  date = as.Date(c("2022-03-22", "2021-07-30", NA))
)

test_df %>%
  mutate(ddate = lubridate::as_datetime(date)) %>%
  collect()
#> # A tibble: 3 × 2
#>   date       ddate              
#>   <date>     <dttm>             
#> 1 2022-03-22 2022-03-22 00:00:00
#> 2 2021-07-30 2021-07-30 00:00:00
#> 3 NA         NA

test_df %>%
  arrow_table() %>% 
  mutate(ddate = lubridate::as_datetime(date)) %>%
  collect()
#> # A tibble: 3 × 2
#>   date       ddate              
#>   <date>     <dttm>             
#> 1 2022-03-22 2022-03-22 00:00:00
#> 2 2021-07-30 2021-07-30 00:00:00
#> 3 NA         NA

Created on 2022-05-14 by the reprex package (v2.0.1)

The approach (option 2):

  • each binding is registered only once, as pkg::fun() and we change the way we look up a binding

Steps:

  • add functionality to allow binding registration with the pkg::fun() name;
    • register_binding() registers a single, prefixed copy of fun, pkg::fun.
    • Add a binding for the :: operator, which helps with retrieving bindings from the function registry.
    • Add generic unit tests for the pkg::fun functionality.
    • Throw a classed error ("arrow-binding-error") if fun is not found and look it up again, this time as ::fun. All of:
      • mutate()
      • filter(), and
      • summarise() should able to handle this new error class
  • register nse_funcs requiring indirect mapping
    • register each binding with the corresponding pkg:: prefix.
    • add / update unit tests for the nse_funcs bindings to include at least one pkg::fun() call for each binding
  • register nse_funcs requiring direct mapping (unary and binary bindings)
    • register unary bindings
    • register binary bindings
    • add / update unit tests for the nse_funcs bindings to include at least one pkg::fun() call for each binding
  • document changes in the Writing bindings documentation
    • going forward we should be using pkg::fun when defining a binding.

Bindings that will not be registered with a pkg:: prefix:

  • type casting, such as cast() or dictionary_encode()
  • operators (e.g. "!", "==", "!=", ">", ">=", "<", "<=", "&", etc.)
  • aggregating functions(e.g. sum, any, all, mean, sd, var, etc)
    • something fails when extracting all function calls in an expression with all_funs(). For example, in dplyr::n(), only :: is identified as a function call by all_funs() and {arrow}'s is_function().

@github-actions
Copy link

github-actions bot commented Jul 5, 2022

@dragosmg dragosmg closed this Jul 18, 2022
@dragosmg dragosmg deleted the allow_binding_registration_with_pkg_prefix branch July 18, 2022 08:23
@dragosmg dragosmg restored the allow_binding_registration_with_pkg_prefix branch July 18, 2022 13:10
@dragosmg dragosmg reopened this Jul 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant