data.table new column := slower than base R (?) #921
Comments
@szilard thanks! R 3.1 doesn't update in place. It shallow copies the other columns and adds the new column. As to why this happens in data.table, just ran a |
Thanks @arunsrinivasan! Yes, I meant that base R >=3.1 does not copy the existing columns. So I wonder if data.table can be made faster. First, I increase the data size a bit:
I now get ~2 sec for base R and ~5 sec for data.table. Then I clean up (
and get this:
Any potential sources for speed-up? |
I've already linked to the other issue, where I've listed what I think is the cause - there's an unwanted |
Awesome, thanks (I've seen the other issue, it just was not obvious to me that it's an easy fix). Also wondering about the speedup, but I guess ~ 2 sec in this case (see profile above). What about the (With those 2 changes now it would be on par with base R ~ 2 sec total.) |
Right. Don't really understand why there's a |
I think the |
Thanks a lot. Will take a look. |
I can also confirm a lot of performance impact by migrating to 1.9.4. Is there a proposed timeframe for this bugfix? I suppose I could revert back to 1.9.2 for the time being. |
@alexcpsec performance impact with respect to just |
It is a very convoluted function I have that makes heavy use of I could try to Rprofile the whole thing with both 1.9.2 and 1.9.4 versions so I can give you a better idea of how the versions compare to each other in a larger piece of code. Would this be useful for you? |
@alexcpsec, yes, that would be incredibly useful, in the absence of a reproducible example. Thanks. |
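For readers who want to try the profiling suggested above, here is a minimal sketch using base R's `Rprof()` and `summaryRprof()`. The `workload()` function is a hypothetical stand-in, not the function discussed in this thread; the idea is to run the same script once under each installed data.table version and compare the summaries.

```r
library(data.table)

# Hypothetical stand-in for the real workload (an assumption, not the
# function from this thread): repeatedly add a column with :=
workload <- function(n = 1e6, reps = 100) {
  dt <- data.table(x = runif(n))
  for (i in seq_len(reps)) dt[, y := 2 * x]
  invisible(dt)
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.01)  # start sampling the call stack
workload()
Rprof(NULL)                        # stop profiling

prof <- summaryRprof(prof_file)
print(packageVersion("data.table"))  # record which version was profiled
print(head(prof$by.total, 10))       # functions ranked by total time
```

Saving the `by.total` table together with the package version for each run makes it easier to see which internal calls account for a difference between versions.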
Now I get this:
# clean session
library(data.table)
dt = data.table(x = runif(100e6))
system.time(dt[,y := 2*x])
# user system elapsed
# 0.384 0.563 0.956
df = data.frame(x = runif(100e6))
system.time( df$y <- 2*df$x )
# user system elapsed
# 0.376 0.554 0.933
The timings are more or less the same, and we can't avoid a copy here. But there are cases where we can delay the copy like R v3.1.0+ does, by shallow copying. That'll be taken care of in #617. |
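One way to check the claim that existing columns are not copied is to compare the memory address of column `x` before and after adding `y`. This is only a sketch: it assumes R >= 3.1 and uses data.table's exported `address()` helper, with a smaller `n` than the 100e6 rows used in the thread. If the existing column is left alone, both comparisons should come out `TRUE`.

```r
library(data.table)

n <- 1e6  # smaller than in the thread, enough to illustrate the point

# base R: adding a column shallow copies the data.frame (R >= 3.1),
# so the existing column vector should keep its address
df <- data.frame(x = runif(n))
df_x_before <- address(df$x)
df$y <- 2 * df$x
df_x_after <- address(df$x)

# data.table: := adds the column by reference
dt <- data.table(x = runif(n))
dt_x_before <- address(dt$x)
dt[, y := 2 * x]
dt_x_after <- address(dt$x)

print(identical(df_x_before, df_x_after))
print(identical(dt_x_before, dt_x_after))
```

In both cases the only large allocation left is the new vector `2 * x` itself, which fits the observation that the two timings end up roughly equal.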
@arunsrinivasan Awesome!! Great fix. |
While data.table new column := used to be >100x faster than base R, base R (>=3.1) now updates data.frames in place and has caught up. I wonder why data.table is slower, for example in this case:
I get:
base R: 0.272 0.300 0.572 (user system elapsed)
data.table: 0.696 0.744 1.444
R 3.1.1, data.table 1.9.4.