by causes column names to be repeated #5329

rlderi · 2022-02-11T14:24:14Z

When aggregating, if no by is specified then the output columns have the function names prepended ("mean", "median"). This is helpful.

> dt<-as.data.table(mtcars)
> dt[, c(mean = lapply(.SD, mean), median = lapply(.SD, median)), .SDcols = c("mpg", "cyl")]
   mean.mpg mean.cyl median.mpg median.cyl
1: 20.09062   6.1875       19.2          6

However, if a by is specified then the column names are simply repeated rather than prepended with the function names. It would seem preferable to keep prepending the names regardless of whether there is a by.

> dt[, c(mean = lapply(.SD, mean), median = lapply(.SD, median)), .SDcols = c("mpg", "cyl"), by = "am"]
   am      mpg      cyl  mpg cyl
1:  1 24.39231 5.076923 22.8   4
2:  0 17.14737 6.947368 17.3   8

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.2


avimallu · 2022-02-11T23:42:18Z

I think #4883 should solve your issue once it is merged. If this is a priority for you, consider adding a reaction to that pull request.

If you absolutely need it now, there's a somewhat esoteric approach in #4970 that isn't particularly pleasant to look at, quoted below:

DT[, unlist(lapply(.SD, function(x) c(max=max(x), min=min(x)))), by=group]
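Applied to the mtcars example from this issue, the same idea might look like the sketch below. It makes two assumptions that are not part of the quoted snippet. First, the result is wrapped in as.list(), since a bare named vector returned from j with by is stacked into a single column rather than spread across named columns. Second, the names come out as column.function (mpg.mean) instead of function.column (mean.mpg), because unlist() builds them from the outer list names first.

```r
library(data.table)

dt <- as.data.table(mtcars)

# For each group, compute both statistics per column.
# lapply() returns a list of named vectors (one per column); unlist() then
# flattens it into a single named vector with names like "mpg.mean".
# as.list() (an assumption added here) turns that vector into one column
# per name, so each group yields a single row.
res <- dt[,
  as.list(unlist(lapply(.SD, function(x) c(mean = mean(x), median = median(x))))),
  by = "am",
  .SDcols = c("mpg", "cyl")
]

print(names(res))
```

If the function.column order is needed, the columns can be renamed afterwards with setnames().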
