Skip to content

Commit

Permalink
Closes #3079, #1876, #3533 -- catch unsupported GForce/fastmean dispatch
Browse files Browse the repository at this point in the history
  • Loading branch information
Michael Chirico committed May 6, 2019
1 parent 81ecfd0 commit 8f48aac
Show file tree
Hide file tree
Showing 3 changed files with 27 additions and 5 deletions.
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,8 @@
5. `fread()` could crash if `quote=""` (i.e. ignore quotes), the last line is too short, and `fill=TRUE`, [#3524](https://github.com/Rdatatable/data.table/pull/3524). Thanks to Jiucang Hao for the report and reproducible example.
6. `data.table` grouping optimization (`GForce` & `fastmean`) ignored method dispatch and could give incorrect results for non-simple classes, [#3533](https://github.com/Rdatatable/data.table/issues/3533), [#1876](https://github.com/Rdatatable/data.table/issues/1876), and [#3079](https://github.com/Rdatatable/data.table/issues/3079). Thanks to @d-sci, @rossholmberg, and @Henrik-P, respectively, for reporting.
#### NOTES
1. `rbindlist`'s `use.names="check"` now emits its message for automatic column names (`"V[0-9]+"`) too, [#3484](https://github.com/Rdatatable/data.table/pull/3484). See news item 5 of v1.12.2 below.
Expand Down
19 changes: 14 additions & 5 deletions R/data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -1764,13 +1764,24 @@ replace_dot_alias <- function(e) {
is.call(q) && is.symbol(q[[1L]]) && q[[1L]] == "mean" && is_strictly_numeric(eval(q[[2L]], x, parent.frame()))
}
if (jsub[[1L]]=="list") {
fastmean_assigned = FALSE
for (ii in seq_along(jsub)[-1L]) {
this_jsub = jsub[[ii]]
if (dotN(this_jsub)) next; # For #5760
# Addressing #1369, #2949 and #1974. Added is.symbol() check to handle cases where expanded function definition is used insead of function names. #1369 results in (function(x) sum(x)) as jsub[[.]] from dcast.data.table.
if (.fastmean_ok(jsub)) jsub[[ii]] = .optmean(this_jsub)
if (.fastmean_ok(jsub)) {
jsub[[ii]] = .optmean(this_jsub)
if (!fastmean_assigned) {
assign("Cfastmean", Cfastmean, SDenv)
assign("mean", base::mean.default, SDenv)
}
}
}
} else if (.fastmean_ok(jsub)) jsub = .optmean(jsub)
} else if (.fastmean_ok(jsub)) {
jsub = .optmean(jsub)
assign("Cfastmean", Cfastmean, SDenv)
assign("mean", base::mean.default, SDenv)
}
if (nomeanopt) {
warning("Unable to optimize call to mean() and could be very slow. You must name 'na.rm' like that otherwise if you do mean(x,TRUE) the TRUE is taken to mean 'trim' which is the 2nd argument of mean. 'trim' is not yet optimized.",immediate.=TRUE)
}
Expand All @@ -1780,8 +1791,6 @@ replace_dot_alias <- function(e) {
else
cat("Old mean optimization is on, left j unchanged.\n")
}
assign("Cfastmean", Cfastmean, SDenv)
assign("mean", base::mean.default, SDenv)
# Old comments still here for now ...
# Here in case nomeanopt=TRUE or some calls to mean weren't detected somehow. Better but still slow.
# Maybe change to :
Expand Down Expand Up @@ -1892,7 +1901,7 @@ replace_dot_alias <- function(e) {

# variable is numeric & won't attempt to dispatch to non-standard gfuns element
# prevents errors like #3533, #1876, #3079
is_strictly_numeric = function(x) class(x)[1L] %chin% c('integer', 'numeric')
is_strictly_numeric = function(x) class(x)[1L] %chin% c('integer', 'integer64', 'numeric')

.optmean <- function(expr) { # called by optimization of j inside [.data.table only. Outside for a small speed advantage.
if (length(expr)==2L) # no parameters passed to mean, so defaults of trim=0 and na.rm=FALSE
Expand Down
11 changes: 11 additions & 0 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -14370,6 +14370,17 @@ if (test_yaml) { # csvy; #1701
DT_yaml, warning = 'Combining a search.*YAML.*')
}

# #3533, #1876, #3079 -- ensure GForce/fastmean are not applied incorrectly under method dispatch
DT = data.table(ID = 1:2, v = 3:4, d = .Date(1:2))
class(DT$v) = 'mymean'
mean.mymean = function(x) min(x) + 1
test(2033.1, DT[ , mean(v)], 4)
options(datatable.optimize = 0)
trueDT = data.table(ID = 1:2, V1 = c(4, 5))
test(2033.2, DT[ , mean(v), by = ID], trueDT)
options(datatable.optimize = 2)
test(2033.3, DT[ , mean(v), by = ID, verbose = TRUE], trueDT, output = 'GForce FALSE')


###################################
# Add new tests above this line #
Expand Down

0 comments on commit 8f48aac

Please sign in to comment.