Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-15010: [R] Create a function registry for our NSE funcs #11904

Closed
wants to merge 49 commits into from

Conversation

paleolimbot
Copy link
Member

@paleolimbot paleolimbot commented Dec 8, 2021

This is PR implementing that admittedly does a few things. I'd be happy to split up with a suggestion on the desired steps if that makes it easier to review. In fact, I'm totally game to use reviews on this PR to collect ideas about the changes that collectively we agree on and just implement that subset in a new PR.

This PR:

  • Defines register_translation() and register_translation_agg() instead of direct assignment into nse_funcs and agg_funcs. This enables attaching a package name when the function is being registered (e.g., register_translation("stringr::str_dup", function(x, ...) ...)) and makes it possible in the future to allow other packages to define translations and/or for us to change how we evaluate translated expressions without changing many function assignments.
  • Moves the registration of translations to package load time rather than package build time. This enables splitting up translations into multiple files, adds the usual CMD check that normal functions undergo (e.g., for use of missing variables), and opens up the possibility of defining different translations for different versions of packages (or omitting translations if a package isn't installed).
  • Splits up the definition of translations into dplyr-funcs-type.R, dplyr-funcs-conditional.R, dplyr-funcs-string.R, dplyr-funcs-datetime.R, and dplyr-funcs-math.R. This matches where the translations are tested (the test files were named test-dplyr-funcs-string.R, etc.). Some translations were moved to dplyr-summarise.R because the translations were being tested in test-dplyr-summarise.R. This makes it easier for parallel PRs defining translations and makes it easier for new contributors to figure out where tests should go.
  • Consolidates internal documentation on how to write translations into one .Rd file rather than scattered around as comments in dplyr-functions.R as they were before.
  • Removes direct references to nse_funcs and agg_funcs except where used to implement evaluation

This PR does not:

  • Change any test filenames or remove any tests.
  • Change how translations are stored in the package namespace
  • Change anything about how evaluation works

Reprex with the gist of how the registration works:

# remotes::install_github("apache/arrow/r#11904")
withr::with_namespace("arrow", {
  
  # translations get defined in function wrappers so that they can get called
  # at package load (so there's no need to consider collate order)
  register_file_translations <- function() {
    
    register_translation("some_func", function() {
      Expression$scalar(1L)
    })
    
  }
  
  # the .onLoad() hook for registering translations lives in dplyr-funcs.R
  register_all_translations <- function() {
    register_file_translations()
    # ...and other translation functions that might live in other files
  }
  
  # ...and gets called in .onLoad()
  register_all_translations()
  
  # if you need to call a translation, use call_translation() (to keep references
  # to nse_funcs and/or agg_funcs constrained to the eval implementation)
  call_translation("some_func")
  
  # ...the same machinery exists for agg_funcs
  register_translation_agg("some_agg_func", function() {
    Expression$scalar(2L)
  })
  
  call_translation_agg("some_agg_func")
})
#> Expression
#> 2

Created on 2021-12-08 by the reprex package (v2.0.1)

@github-actions
Copy link

github-actions bot commented Dec 8, 2021

@github-actions
Copy link

github-actions bot commented Dec 8, 2021

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@ianmcook
Copy link
Member

I think this is best saved for a follow-up, but when I read this comment from Jon...

I've reviewed a lot of code in other places and have seen a number of styles that include either always using ::, or using it liberally to avoid confusion/mistakes.

It got me thinking: wouldn't it be nice if users could call any registered function by its namespace-qualified name or by its unqualified name? For example a user could call paste() or base::paste() and both would work the same. Or they could call str_c() or stringr::str_c() and both would work the same.

To achieve that, maybe register_binding() could take the function name and the namespace name in separate arguments, and register each function in both the unqualified and qualified forms?

@paleolimbot
Copy link
Member Author

It got me thinking: wouldn't it be nice if users could call any registered function by its namespace-qualified name or by its unqualified name?

There's a JIRA on this one (ARROW-14575)! In the PR for that JIRA we'd add the package names to all our currently registered functions, implement a slightly different registry that keeps track of packages instead of discarding it, and implement a binding for :: that uses some registry of package functions (but the unqualified versions would still work).

I chose to keep the namespace with the function in the same string because that's what vctrs::s3_register() does (i.e., you call vctrs::s3_regsiter("pkg::fun", ...)). The namespace could certainly be separate (vctrs::s3_register() used to be implemented this way)...maybe that's a good implementation detail to hash out in the PR for ARROW-14575? I didn't add any of the package names to register_binding() calls internally because that seemed easier to review as a separate PR.

@ianmcook
Copy link
Member

It got me thinking: wouldn't it be nice if users could call any registered function by its namespace-qualified name or by its unqualified name?

There's a JIRA on this one (ARROW-14575)! In the PR for that JIRA we'd add the package names to all our currently registered functions, implement a slightly different registry that keeps track of packages instead of discarding it, and implement a binding for :: that uses some registry of package functions (but the unqualified versions would still work).

I chose to keep the namespace with the function in the same string because that's what vctrs::s3_register() does (i.e., you call vctrs::s3_regsiter("pkg::fun", ...)). The namespace could certainly be separate (vctrs::s3_register() used to be implemented this way)...maybe that's a good implementation detail to hash out in the PR for ARROW-14575? I didn't add any of the package names to register_binding() calls internally because that seemed easier to review as a separate PR.

All sounds good to me 👍

Looks like some of the CI jobs were failing because of 404 errors from RStudio Package Manager. I restarted them.

@ianmcook
Copy link
Member

@paleolimbot is there anything else you think needs to be done in this PR before it gets merged?

@paleolimbot the docs added in ARROW-13834 need to be updated to reflect the changes in this PR. Do you want to do that after this merges? If so can you please create a Jira for that? Thanks

@jonkeane @nealrichardson are you satisfied with the responses to your review comments above?

@paleolimbot
Copy link
Member Author

I think this PR is ready to be merged from my perpspective (pending any changes requested by Neal, Ian, or Jon). I created ARROW-15193 for doc updates...I'll do it after this is merged.

Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good to me, thanks for them (and thanks to Ian and Neal for the comments).

The suggestions here are only possible whitespace cleanups. I'm happy to merge this, but will wait until early next week to make sure people have a chance to come back and object before I do.

r/R/dplyr-funcs-conditional.R Outdated Show resolved Hide resolved
r/R/dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/R/dplyr-funcs-math.R Outdated Show resolved Hide resolved
r/R/dplyr-funcs-type.R Show resolved Hide resolved
r/R/dplyr-funcs-type.R Outdated Show resolved Hide resolved
r/R/dplyr-summarize.R Outdated Show resolved Hide resolved
paleolimbot and others added 6 commits December 30, 2021 17:39
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
options = list(field_names = names(args))
)
})
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}

This is extraneous, no? And it looks like the lines above this are also indented?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no, my bad, this actually needed register_bindings_type_cast <- function() { above it

@paleolimbot paleolimbot force-pushed the r-register-translation branch from 29fe0d6 to 54286eb Compare January 4, 2022 14:05
@jonkeane jonkeane closed this in 314d8bf Jan 4, 2022
thisisnic pushed a commit that referenced this pull request Jan 5, 2022
Now that #11904 is merged (ARROW-15010), we have slightly different syntax for defining bindings between compute C++ and R functions. @thisisnic wrote some excellent documentation for creating bindings which needs a few tweaks to reflect how this is now done!

Closes #12075 from paleolimbot/r-binding-docs

Authored-by: Dewey Dunnington <dewey@fishandwhistle.net>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
jonkeane pushed a commit that referenced this pull request Jan 7, 2022
This PR is to fix a valgrind failure that started occurring after #11904 (ARROW-15010) was merged.

Closes #12090 from paleolimbot/r-tests-re

Authored-by: Dewey Dunnington <dewey@fishandwhistle.net>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
@paleolimbot paleolimbot deleted the r-register-translation branch January 20, 2022 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants