Skip to content

Commit

Permalink
Closes #1470 -- streamline loop in GForce j optimization
Browse files Browse the repository at this point in the history
  • Loading branch information
Michael Chirico committed Aug 19, 2019
1 parent a8e926a commit 08b4b44
Show file tree
Hide file tree
Showing 2 changed files with 11 additions and 9 deletions.
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -333,6 +333,8 @@
19. Subsetting does a better job of catching a malformed `data.table` with error rather than segfault. A column may not be NULL, nor may a column be an object which has columns (such as a `data.frame` or `matrix`). Thanks to a comment and reproducible example in [#3369](https://github.com/Rdatatable/data.table/issues/3369) from Drew Abbot which demonstrated the issue which arose from parsing JSON. The next release will enable `as.data.table` to unpack columns which are `data.frame` to support this use case.
20. `data.table` optimization much more efficient for many as-is columns in `j`, [#1470](https://github.com/Rdatatable/data.table/issues/1470). Thanks @Jorges1000 for the report.
#### NOTES
1. When upgrading to 1.12.0 some Windows users might have seen `CdllVersion not found` in some circumstances. We found a way to catch that so the [helpful message](https://twitter.com/MattDowle/status/1084528873549705217) now occurs for those upgrading from versions prior to 1.12.0 too, as well as those upgrading from 1.12.0 to a later version. See item 1 in notes section of 1.12.0 below for more background.
Expand Down
18 changes: 9 additions & 9 deletions R/data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -1461,6 +1461,7 @@ replace_order = function(isub, verbose, env) {
lockBinding(".GRP",SDenv)
lockBinding(".iSD",SDenv)

#browser()
GForce = FALSE
if ( getOption("datatable.optimize")>=1 && (is.call(jsub) || (is.name(jsub) && as.character(jsub)[[1L]] %chin% c(".SD",".N"))) ) { # Ability to turn off if problems or to benchmark the benefit
# Optimization to reduce overhead of calling lapply over and over for each group
Expand Down Expand Up @@ -1612,9 +1613,9 @@ replace_order = function(isub, verbose, env) {
}
if (verbose) {
if (!identical(oldjsub, jsub))
cat("lapply optimization changed j from '",deparse(oldjsub),"' to '",deparse(jsub,width.cutoff=200L),"'\n",sep="")
cat("lapply optimization changed j from '",deparse(oldjsub),"' to '",deparse(jsub,width.cutoff=200L, nlines=1L),"'\n",sep="")
else
cat("lapply optimization is on, j unchanged as '",deparse(jsub,width.cutoff=200L),"'\n",sep="")
cat("lapply optimization is on, j unchanged as '",deparse(jsub,width.cutoff=200L, nlines=1L),"'\n",sep="")
}
dotN = function(x) is.name(x) && x == ".N" # For #5760
# FR #971, GForce kicks in on all subsets, no joins yet. Although joins could work with
Expand Down Expand Up @@ -1644,7 +1645,7 @@ replace_order = function(isub, verbose, env) {
}
if (jsub[[1L]]=="list") {
GForce = TRUE
for (ii in seq_along(jsub)[-1L]) if (!.ok(jsub[[ii]])) GForce = FALSE
for (ii in seq_along(jsub)[-1L]) if (!.ok(jsub[[ii]])) {GForce = FALSE; break}
} else GForce = .ok(jsub)
if (GForce) {
if (jsub[[1L]]=="list")
Expand All @@ -1666,13 +1667,12 @@ replace_order = function(isub, verbose, env) {
nomeanopt=FALSE # to be set by .optmean() using <<- inside it
oldjsub = jsub
if (jsub[[1L]]=="list") {
for (ii in seq_along(jsub)[-1L]) {
this_jsub = jsub[[ii]]
if (dotN(this_jsub)) next; # For #5760
# Addressing #1369, #2949 and #1974. Added is.symbol() check to handle cases where expanded function definition is used instead of function names. #1369 results in (function(x) sum(x)) as jsub[[.]] from dcast.data.table.
# Addressing #1369, #2949 and #1974. Added is.symbol() check to handle cases where expanded function definition is used instead of function names. #1369 results in (function(x) sum(x)) as jsub[[.]] from dcast.data.table.
# earlier, did a for loop and replaced only as necessary but this was slow with huge (e.g. 30K elements) j, #1470
jsub = as.call(c(quote(list), lapply(jsub[-1L], function(this_jsub) {
if (is.call(this_jsub) && is.symbol(this_jsub[[1L]]) && this_jsub[[1L]]=="mean")
jsub[[ii]] = .optmean(this_jsub)
}
.optmean(this_jsub) else this_jsub
})))
} else if (jsub[[1L]]=="mean") {
jsub = .optmean(jsub)
}
Expand Down

0 comments on commit 08b4b44

Please sign in to comment.