Another example where making the grouping column length 1 might cause problems, spotted by @st-pasha. V2 is all NA because shift(A*2) operates on a scalar A, and shift on a scalar gives NA.
DT= data.table(A=c(1, 2, 1, 1, 2), B=3:7)
DT[, .(A*2, shift(A*2), B*2, shift(B*2)), by=A]
#        A    V1    V2    V3    V4
#    <num> <num> <num> <num> <num>
# 1:     1     2    NA     6    NA
# 2:     1     2    NA    10     6
# 3:     1     2    NA    12    10
# 4:     2     4    NA     8    NA
# 5:     2     4    NA    14     8
While having grouping columns of length 1 within a group is a useful feature, it comes at the cost of consistency. If we imagine a shiny app where the user just chooses the columns to aggregate and group by, it is not difficult to reach such cases. I think it would be useful to optionally provide the grouping column at the full length of the group.
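Until such an option exists, one possible workaround is to expand the grouping value to the group size by hand. The sketch below reuses DT from the example above and assumes that rep(A, .N) is an acceptable stand-in for a full-length grouping column. It is only an illustration, not a proposed API.

```r
library(data.table)

DT <- data.table(A = c(1, 2, 1, 1, 2), B = 3:7)

# Inside j the grouping column A has length 1, so repeat it .N times
# (the group size) before passing it to a vector function like shift().
res <- DT[, .(V1 = A * 2, V2 = shift(rep(A, .N) * 2)), by = A]
print(res)
```

With this, V2 is NA only in the first row of each group, which is what shift would give on a full-length column.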
data.table has a few inconsistencies when grouping and aggregating on the same variables. This can produce an inconsistent pivot table, e.g. cells with wrong values, which is most obvious when row or column totals don't equal the sum of the values in the row or column:
A possible solution, though a significant breaking change, would be to provide scalar values inside .BY but leave the full (repeated) values in regular variables in j.
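For context, here is a small sketch of the current behaviour that this proposal would change. It uses the same DT as above; the column names by_len, a_len and n are only illustrative. Today both .BY$A and A are length 1 inside each group.

```r
library(data.table)

DT <- data.table(A = c(1, 2, 1, 1, 2), B = 3:7)

# .BY is a list holding one length-1 vector per grouping variable.
# Currently the grouping column A itself is also length 1 inside j.
res <- DT[, .(by_len = length(.BY$A), a_len = length(A), n = .N), by = A]
print(res)
```

If I read the proposal correctly, by_len would stay at 1 while a_len would match n, the group size.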
Reported by @geofflazzarini in #3103 (comment)
There should be no difference in the TotalA column in the results. This might be tricky because of "Inside each group, why are the group variables length-1?" in the FAQ.