Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Base R generic 'sort_by' for data.tables #6662

Open
rikivillalba opened this issue Dec 13, 2024 · 1 comment
Open

Base R generic 'sort_by' for data.tables #6662

rikivillalba opened this issue Dec 13, 2024 · 1 comment
Milestone

Comments

@rikivillalba
Copy link
Contributor

I wonder whether it is worth to implement sort_by.data.table (R 4.4.0) with internal forder instead of base order, and possibly an 'in_place' argument

Some improvement can be seen in my little test:

library(data.table)
set.seed(1234)
DT <- data.table(x=runif(1e7),y=runif(1e7))

# sort_by.data.frame with "forder" instead of "order"
sort_by2 <- function (x, y, ...) 
{
  if (inherits(y, "formula")) 
    y <- .formula2varlist(y, x)
  if (!is.list(y)) 
    y <- list(y)
  o <- do.call(data.table:::forder, c(unname(y), list(...)))
  x[o, , drop = FALSE]
}

microbenchmark::microbenchmark(times = 1, 
  sort_by2(DT, ~ x + y),
  sort_by(DT, ~ x + y)
)
#> Unit: seconds
#>                  expr      min       lq     mean   median       uq      max
#>  sort_by2(DT, ~x + y) 1.068101 1.068101 1.068101 1.068101 1.068101 1.068101
#>   sort_by(DT, ~x + y) 2.317193 2.317193 2.317193 2.317193 2.317193 2.317193

Created on 2024-12-12 with reprex v2.1.1

@rikivillalba rikivillalba changed the title base R generic sort_by for data.tables base R generic 'sort_by' for data.tables Dec 13, 2024
@rikivillalba rikivillalba changed the title base R generic 'sort_by' for data.tables Base R generic 'sort_by' for data.tables Dec 13, 2024
@MichaelChirico
Copy link
Member

Good idea. I wouldn't offer in_place (you can use setorder() for that, with a slightly different API).

It also looks like .formula2varlist() is key for allowing much more complicated formulas to WAI:

mtcars |> sort_by(~I(am + mpg))

It will be a lot harder to try and translate generic formulas into, e.g., a setorderv() call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants