Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce r_wrap_or_clone() and r_wrap_or_copy() #1599

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

DavisVaughan
Copy link
Member

@DavisVaughan DavisVaughan commented Mar 10, 2023

These are aliases for R_shallow_duplicate_attr() and R_duplicate_attr() respectively

Added in 2018, with R 3.6.0, so we can use them without issues in all R versions we support except for 3.5, which we are dropping soon anyways
wch/r-source@bbcf686

I've chosen these function names because they either "wrap" the object into an ALTREP wrapper if possible (i.e. if the type is supported and it is "large enough"), or they call their respective r_clone/copy() equivalents.

There are 2 places I use these helpers, shown below. They are only useful if you are tweaking attributes and not the main object. Otherwise you should just r_clone().


We now use r_wrap_or_clone_shared() in the re-encoding helper. If we re-encode an attribute, but not the main object, then we can "wrap" the main object before re-assigning the attributes to it.

x <- 1:1e6 + 0L
attr(x, "foo") <- `Encoding<-`("fa\xE7ile", "latin1")

vctrs:::is_altrep(x)
#> [1] TRUE

bench::mark(encode = .Call(rlang:::ffi_test_obj_encode_utf8, x))

# Main
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 encode        356µs    606µs     1186.    3.81MB     149.

# This PR
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 encode        1.1µs   1.53µs   499840.        0B        0


out <- .Call(rlang:::ffi_test_obj_encode_utf8, x)
vctrs:::is_altrep(out)

# Main
#> [1] FALSE

# This PR - now an altrep wrapper
#> [1] TRUE

We now also use r_wrap_or_clone() in set_names() to avoid having to call names<-, which is what we were doing before when I thought that was the only way to get access to the ALTREP wrappers. We now only call names<- if x is an "object" that might have an S3 method.

This gives us a little performance boost, but names<- was already very fast to call (typically in the 350ns range) so it doesn't look huge. It is relatively large percentage wise though, I guess.

I don't show any cases where x is an S3 object, that is typically in the 3.5us range because we have to do a length() dispatch call and a names<- dispatch call in case there is an S3 method registered, and those cases are completely separate from this PR. We could probably improve on this a little by actually checking to see if there is a length() or names<- method registered in the base/global env and only dispatching if we see one, but I didn't do that here.

library(rlang)

x <- 1:1e6 + 0L
names <- as.character(x)

# no names before, no names after
bench::mark(set_names(x, NULL), iterations = 1000000)
# before
#> # A tibble: 1 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 set_names(x, NULL)   1.12µs   1.38µs   695965.    3.26KB     16.7

# after
#> # A tibble: 1 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 set_names(x, NULL)    868ns   1.11µs   856415.     7.9KB     15.4

# no names before, names after
bench::mark(set_names(x, names), iterations = 1000000)

# before
#> # A tibble: 1 × 6
#>   expression               min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 set_names(x, names)   1.51µs   1.81µs   527374.        0B     18.5

# after
#> # A tibble: 1 × 6
#>   expression               min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 set_names(x, names)    1.2µs   1.52µs   636811.        0B     17.2

# this doesn't make an altrep wrapper, but `names<-` does
attr(x, "names") <- names

# names before, no names after
bench::mark(set_names(x, NULL), iterations = 1000000)

# before
#> # A tibble: 1 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 set_names(x, NULL)   1.32µs   1.61µs   598498.        0B     19.8

# after
#> # A tibble: 1 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 set_names(x, NULL)   1.03µs   1.34µs   720275.        0B     18.0

# names before, names after
bench::mark(set_names(x, names), iterations = 1000000)

# before
#> # A tibble: 1 × 6
#>   expression               min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 set_names(x, names)   1.59µs   1.92µs   501267.        0B     17.5

# after
#> # A tibble: 1 × 6
#>   expression               min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 set_names(x, names)   1.33µs   1.65µs   584423.        0B     15.8

Comment on lines -64 to -66
int n_kept = 0;

r_obj* dots = KEEP_N(rlang_dots(env), &n_kept);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved these after the early exit for nm == NULL because they aren't needed, just for maximal performance and code clarity

Comment on lines 92 to 104
nm = KEEP_N(rlang_as_function(nm, env), &n_kept);
nm = KEEP_N(eval_fn_dots(nm, mold, dots, env), &n_kept);
} else {
if (r_length(dots) > 0) {
nm = KEEP_N(eval_fn_dots(c_fn, nm, dots, env), &n_kept);
}
} else if (r_length(dots) > 0) {
nm = KEEP_N(eval_fn_dots(c_fn, nm, dots, env), &n_kept);
}

if (r_typeof(nm) != R_TYPE_character || r_is_object(nm)) {
nm = KEEP_N(eval_as_character(nm, env), &n_kept);

if (r_typeof(nm) != R_TYPE_character) {
r_abort("`nm` must be `NULL` or a character vector.");
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tweaked this section a little

  • The nm == function case didn't call as.character() on the result of the function, but I felt like it was reasonable to do so since we call as.character() on the result of set_names(x, 1, 2) when ... are combined with c(), and on set_names(x, 1) when nm just isn't a character vector.

  • We also had some redundant "if not character, then error" checks below that I've removed in favor of just this single character check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant