Further optimisation of `.SD` in `j` #735
`.SD[1]`, `.SD[1L]` and `head(.SD, 1)` in `j`, alone or along with `c(..)`, are now optimised internally for speed. Fixed #861.
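A minimal sketch of the three spellings named in that fix, assuming a small toy table (`DT` and its columns `id`, `x`, `y`, `z` are illustrative, not from the original). Whatever path data.table takes internally, all three should return the first row of each group:

```r
library(data.table)

# Toy table (an assumption for illustration): two groups of three rows each.
DT <- data.table(id = c(1, 1, 1, 2, 2, 2), x = 1:6, y = 7:12, z = 13:18)

# Three ways of asking for the first row of .SD in every group.
a <- DT[, .SD[1],       by = id]
b <- DT[, .SD[1L],      by = id]
h <- DT[, head(.SD, 1), by = id]

a
```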
A further idea: I ran into this on SO:
Hm, just noticed that the "lapply optimization" strips my
In #370, `.SD` was optimised internally for cases like:

You can see that it's optimised by turning verbose on:
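A sketch of such a case, assuming the same kind of toy table (`DT`, its columns and the output name `total` are illustrative). It mixes a plain aggregate with `lapply(.SD, mean)` inside `c(..)`. Passing `verbose = TRUE` to `[.data.table` (equivalently `options(datatable.verbose = TRUE)`) prints diagnostics that say whether `j` was rewritten. The exact wording of those messages varies between data.table versions, so it is not reproduced here.

```r
library(data.table)

# Toy table (an assumption for illustration).
DT <- data.table(id = c(1, 1, 1, 2, 2, 2), x = 1:6, y = 7:12, z = 13:18)

# j has the shape c(<expr>, lapply(.SD, f)); verbose = TRUE reports
# whether data.table rewrote j before evaluating it per group.
res <- DT[, c(total = sum(x), lapply(.SD, mean)), by = id, verbose = TRUE]
res
```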
However, this expression is not always optimised. For example,
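A sketch of a `j` that combines `.SD[1]` with `lapply(.SD, ...)`, on the same kind of toy table (an assumption). Whether the verbose output reports `j` as rewritten depends on the data.table version in use, since the fix for #861 noted at the top later optimised `.SD[1]` inside `c(..)`. The result itself should be the same either way.

```r
library(data.table)

# Toy table (an assumption for illustration).
DT <- data.table(id = c(1, 1, 1, 2, 2, 2), x = 1:6, y = 7:12, z = 13:18)

# First row of each group followed by the group means.
# The column names x, y, z appear twice in the result; that is expected.
res <- DT[, c(.SD[1], lapply(.SD, mean)), by = id, verbose = TRUE]
res
```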
This is because `.SD` cases are a little trickier to optimise. To begin with, if `.SD` has a `j` as well, then it can't be optimised:

The above expression cannot be changed to `list(..)` (in my understanding).

And even when there's no
`j`, `.SD` can have `i` arguments of type `integer`, `numeric`, `logical`, expressions and even `data.table`s. For example:

If we optimise this as such, it'd turn to:
which is not really efficient, as it evaluates the expression (a vector scan) as many times as there are columns, and that gets slower as the number of columns grows. A better way to do it would be:
which is a little tricky to implement.
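To make the trade-off concrete, here is a hand-written sketch (toy data assumed; this is not output from data.table's optimiser) of `.SD[x > 1 & y > 9]` expanded column by column, next to a version that evaluates the condition once per group and reuses it. All three calls should return the same rows; only the number of times the condition is evaluated differs.

```r
library(data.table)

# Toy table (an assumption for illustration).
DT <- data.table(id = c(1, 1, 1, 2, 2, 2), x = 1:6, y = 7:12, z = 13:18)

# Reference: subset .SD directly.
ref <- DT[, .SD[x > 1 & y > 9], by = id]

# Naive expansion: the condition is re-evaluated once per column.
naive <- DT[, list(x = x[x > 1 & y > 9],
                   y = y[x > 1 & y > 9],
                   z = z[x > 1 & y > 9]), by = id]

# Evaluate the condition once per group, then reuse it for every column.
once <- DT[, {
  keep <- x > 1 & y > 9
  list(x = x[keep], y = y[keep], z = z[keep])
}, by = id]

once
```

Group `id == 1` has no row with `y > 9`, so it contributes no rows to any of the three results.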
If it's a join on `i`, then it must not be optimised either, etc.

Basically,
`.SD` and `.SD[...]` should be optimised one by one, handling each scenario:

Optimise (for possible cases):

- `.SD`
  - `DT[, c(.SD, lapply(.SD, ...)), by=.]`
  - `DT[, c(.SD[1], lapply(.SD, ...)), by=.]`
- `.SD[1L]` # no j
- `.SD[1]`
- `.SD[logical]`
- `.SD[a]` # where `a` is integer
- `.SD[a]` # where `a` is numeric. Ex: `.SD[1,]`
- `.SD[x > 1 & y > 9]`
- `.SD[data.table]` # shouldn't / can't be optimised, IMO
- `.SD[character]` # shouldn't / can't be optimised, IMO
- `.SD[eval(.)]` # might be possible in some cases
- `.SD[i, j]` # shouldn't / can't be optimised, IMO

`DT[, c(list(.), lapply(.SD, ...)), by=.]`
All of these throw an error at the moment:

- `DT[, c(data.table(.), lapply(.SD, ...)), by=.]`
- `DT[, c(as.data.table(.), lapply(.SD, ...)), by=.]`
- `DT[, c(data.frame(.), lapply(.SD, ...)), by=.]`
- `DT[, c(as.data.frame(.), lapply(.SD, ...)), by=.]`
Note that all of these can occur on the right side of `lapply(.SD, ...)` as well.
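The type-based part of the checklist amounts to a dispatch on the evaluated `i`. The sketch below is a hypothetical base-R helper (`sd_i_scenario` and its return labels are illustrative, not part of data.table). It only looks at the value of `i`; cases that depend on the unevaluated call, such as `.SD[i, j]` or `.SD[eval(.)]`, would need inspection of the expression itself and are not covered.

```r
# Hypothetical helper, not part of data.table: map the evaluated `i` of a
# `.SD[i]` call to one of the checklist scenarios. `.SD[i, j]` and
# `.SD[eval(.)]` depend on the unevaluated call and are not handled here.
sd_i_scenario <- function(i) {
  if (missing(i))       return("optimise: plain .SD")
  if (is.data.frame(i)) return("skip: join (data.table i)")  # data.tables are data.frames
  if (is.character(i))  return("skip: character i")
  if (is.logical(i))    return("optimise: logical i (evaluate once, reuse)")
  if (is.integer(i))    return("optimise: integer i")
  if (is.double(i))     return("optimise: numeric i")
  "skip: unsupported i"
}

sd_i_scenario()               # .SD
sd_i_scenario(1L)             # .SD[1L]
sd_i_scenario(1)              # .SD[1] or .SD[1,]
sd_i_scenario(c(TRUE, FALSE)) # .SD[logical], e.g. the result of x > 1 & y > 9
sd_i_scenario("a")            # .SD[character]
```

Note that an expression `i` such as `x > 1 & y > 9` reaches this helper as a logical vector; the point made earlier is that it should be evaluated once per group rather than once per column.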