One feature of data.table that I find myself using quite often is the .SD[1] or .SD[.N] functionality. It can be quite useful (especially in tandem with the .SDcols parameter). However, I have noticed that it is one of the slower operations I use in data.table. Here is an example:
Build some fake data
require(data.table) #data.table_1.9.5
set.seed(1)
data <- matrix(rnorm(50000000),ncol=5)
data <- as.data.table(data)
Create ID
data[, ID := sample(1:2000000, nrow(data), replace = TRUE)]
I much prefer method number 1 (the slow way) for documentation purposes, but it can really slow down my workflow, so I usually use method number 2. I understand this might be unavoidable since .SD[1] offers more flexibility than filtering on the first row. Does anyone else run into this situation?
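The code for the two methods is not shown above. As a rough sketch only, assuming method 1 is `.SD[1]` evaluated per group and method 2 collects row numbers with `.I` and subsets once (the small table `dt` and its sizes are hypothetical, chosen so the example runs quickly), the comparison might look like this:

```r
library(data.table)

set.seed(1)
# Small stand-in for the data built above (hypothetical size, for a quick run)
dt <- as.data.table(matrix(rnorm(5000), ncol = 5))
dt[, ID := sample(1:200, nrow(dt), replace = TRUE)]

# Method 1 (assumed "slow way"): subset .SD separately inside every group
first_sd <- dt[, .SD[1], by = ID]

# Method 2 (assumed "faster way"): collect the row number of each group's
# first row with .I, then subset the whole table once
idx <- dt[, .I[1], by = ID]$V1   # unnamed j result is called V1
first_i <- dt[idx]
setcolorder(first_i, names(first_sd))  # put ID first, as grouping does

# Both approaches should return the same rows
stopifnot(isTRUE(all.equal(first_sd, first_i)))

# Same idea for the last row per group, restricted to two columns.
# Row order differs between the two results, so sort by ID before comparing.
last_sd <- dt[, .SD[.N], by = ID, .SDcols = c("V1", "V2")][order(ID)]
last_i  <- dt[dt[, .I[.N], by = ID]$V1, c("ID", "V1", "V2"), with = FALSE][order(ID)]
stopifnot(isTRUE(all.equal(last_sd, last_i)))
```

Wrapping each call in `system.time()` on the full-size `data` would show how large the gap is on a given machine and data.table version; no timings are claimed here.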
sessionInfo()
#R version 3.0.1 (2013-05-16)
#Platform: x86_64-redhat-linux-gnu (64-bit)
#
# locale:
# [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
# [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C
# [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base
#
#other attached packages:
#[1] data.table_1.9.5
#
#loaded via a namespace (and not attached):
#[1] chron_2.3-44 tools_3.0.1
Why delete? I was pointing more or less to "What should the report contain?" here - familiarise yourself with writing code blocks using github flavoured markdown. Comment the lines that should be commented etc..
Thanks @mgahan. There have been some internal optimisations of .SD, but it seems those can be further improved by using .I. I have linked this to #735 and will close this one (as a near duplicate).