Wrong totals when aggregating and grouping by same column? #3103
Comments
For a rationale, see the question "Inside each group, why are the group variables length-1?" in the FAQ.
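To make the length-1 behaviour concrete, here is a minimal sketch on made-up data (the table `dt` and its column `x` are assumptions for illustration, not the data from this issue). It reports the length of `x` as seen inside `j` next to the group size `.N`:

```r
library(data.table)

# Hypothetical data: one 1, two 2s, three 3s
dt <- data.table(x = c(1, 2, 2, 3, 3, 3))

# Inside j, the grouping column x is a single value per group,
# while .N still reports how many rows the group contains
res <- dt[, .(len_x = length(x), n_rows = .N, total = sum(x)), by = x]
print(res)
```

Because `x` holds one value per group, `sum(x)` simply returns that value, which is why the total for the group of threes comes out as 3 rather than 9.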
Thank you @franknarf1 and @jangorecki for the reply and pointer to the FAQ. Examples:
Result of the above: TotalA=1
Result of the above: TotalB=3
No result, fails to execute with error:
The last one is a bug...
Not sure if this is a nuance of data.table's grouping/aggregation method, but when grouping and aggregating by a single variable, data.table does not 'factorise' the grouping call, i.e. it counts each number as its own group after the aggregation, so in your case you're left with only 3. A quick and easy fix is to ensure factorisation takes place within the initial grouping call.
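Two other ways to get the per-row total, sketched on made-up data and relying only on the length-1 behaviour described in the FAQ. Neither is taken from this thread, and the column name `x_copy` is hypothetical:

```r
library(data.table)

# Hypothetical data: one 1, two 2s, three 3s
dt <- data.table(x = c(1, 2, 2, 3, 3, 3))

# Option 1: x is length-1 within each group, so scale it by the group size .N
by_n <- dt[, .(Total = x * .N), by = x]

# Option 2: aggregate a copy of the column, which is not a grouping variable
# and therefore keeps its full per-group length inside j
dt[, x_copy := x]
by_copy <- dt[, .(Total = sum(x_copy)), by = x]

print(by_n)
print(by_copy)
```

Both should give totals of 1, 4 and 9 for the groups 1, 2 and 3. Option 1 avoids adding a column; Option 2 keeps the `sum()` call as originally written.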
Hello. I am confused by the behaviour of data.table when aggregating and grouping on the same column. It seems to perform the aggregate (e.g. sum) on the grouped data, rather than the ungrouped data. I am not necessarily saying this is wrong, but it is different to other tools and I was wondering what the explanation is, whether I am doing something wrong, or whether this is possibly a bug. I've included a comparison to dplyr, which performs more like I would expect (and more like SQL). NB: I've tried searching the issues, Stack Overflow, etc., as requested, but the nature of this scenario (grouping and aggregating the same column) is a bit unusual and I've not found any matches.
# Minimal reproducible example
Please compare the Total column in the two examples below. E.g. there are three rows with the value three, so I would expect the Total to be 9, not 3.
data.table
Result (r):
dplyr
Result (r):
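A minimal sketch of the comparison described above, using made-up data (a hypothetical column `x` holding 1, 2, 2, 3, 3, 3) rather than the original example:

```r
library(data.table)
library(dplyr)

# Hypothetical data: three rows hold the value 3
df <- data.frame(x = c(1, 2, 2, 3, 3, 3))

# data.table: x is length-1 inside each group, so sum(x) returns the group value
dt_res <- as.data.table(df)[, .(Total = sum(x)), by = x]

# dplyr: x is the full group vector inside summarise(), so sum(x) adds every row
dp_res <- df %>%
  group_by(x) %>%
  summarise(Total = sum(x))

print(dt_res)
print(dp_res)
```

For the group of threes, the data.table call should report a Total of 3 and the dplyr call a Total of 9, matching the discrepancy reported here.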
# Output of sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.7.6 data.table_1.11.8 openxlsx_4.1.0 bindrcpp_0.2.2 pivottabler_0.4.0.9000
loaded via a namespace (and not attached):
[1] Rcpp_0.12.19 rstudioapi_0.8 bindr_0.1.1 magrittr_1.5 tidyselect_0.2.4 R6_2.3.0 rlang_0.2.2 fansi_0.3.0 tools_3.5.1
[10] utf8_1.1.4 cli_1.0.1 htmltools_0.3.6 yaml_2.2.0 assertthat_0.2.0 digest_0.6.17 tibble_1.4.2 crayon_1.3.4 zip_1.0.0
[19] purrr_0.2.5 htmlwidgets_1.3 glue_1.3.0 compiler_3.5.1 pillar_1.3.0 jsonlite_1.5 pkgconfig_2.0.2