by causes column names to be repeated #5329

Open
rlderi opened this issue Feb 11, 2022 · 1 comment

rlderi commented Feb 11, 2022

When aggregating, if no by is specified then the output columns have the function names prepended ("mean", "median"). This is helpful.

> dt<-as.data.table(mtcars)
> dt[, c(mean = lapply(.SD, mean), median = lapply(.SD, median)), .SDcols = c("mpg", "cyl")]
   mean.mpg mean.cyl median.mpg median.cyl
1: 20.09062   6.1875       19.2          6

However, if a by is specified then the column names are simply repeated rather than prepended with the function names. It would seem preferable to keep prepending the names regardless of whether there is a by.

> dt[, c(mean = lapply(.SD, mean), median = lapply(.SD, median)), .SDcols = c("mpg", "cyl"), by = "am"]
   am      mpg      cyl  mpg cyl
1:  1 24.39231 5.076923 22.8   4
2:  0 17.14737 6.947368 17.3   8

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.2

avimallu (Contributor) commented Feb 11, 2022

I think #4883 should solve this once it is merged. If it's a priority for you, consider adding a reaction to that pull request.

If you absolutely need it now, there's a somewhat esoteric approach in #4970 that's not particularly pleasant to the eye, quoted below:

DT[, unlist(lapply(.SD, function(x) c(max=max(x), min=min(x)))), by=group]
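
For reference, here is a minimal sketch of how that idea might be adapted to the `mtcars` example from the original report. Two things here are assumptions and not part of the quoted snippet: the `cols` and `res` names are purely illustrative, and the result is wrapped in `as.list()`, because a grouped `j` that returns a plain named vector is stacked into a single column instead of being spread across one column per statistic.

```r
library(data.table)

dt <- as.data.table(mtcars)
cols <- c("mpg", "cyl")  # illustrative name, not from the issue

# For each column in .SD, compute a named vector of statistics.
# unlist() joins the names as "<column>.<statistic>", and as.list()
# turns each element into its own output column within every group.
res <- dt[,
  as.list(unlist(lapply(.SD, function(x) c(mean = mean(x), median = median(x))))),
  by = am,
  .SDcols = cols
]

print(names(res))
```

Note that the names come out as `mpg.mean`, `mpg.median`, and so on (column first, statistic second), which is the reverse of the `mean.mpg` order produced without `by`. If the order matters, the columns could be renamed afterwards with `setnames()`.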
