Major performance drop of keyed := when index is present #4311
Comments
I ran your code through profvis. It shows that with the index you're de/allocating ~170 GB of memory, as opposed to ~1.6 GB without one. Most of the time is spent in calls to

Now what's going on? First, I haven't dug into it further to figure out the difference. But just to confirm,
setindex(dt, NULL)
setattr(dt, "test", rep(1L, 5e6))
gives the same time and memory footprint. So the effect is not related to

I'm not going to mess with shallow, but comments around that line indicate a vague plan to remove the copy. Maybe a remark in the relevant vignette or in the docs would be nice to have.

And just in case the code wasn't contrived only to produce the effect:
dt[flag_dt, flag := 1L, on=c("symbol", "date>=start_date", "date<=end_date")]
gets the job done in 2%-3% of the time your approach needs, even without an index.
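To make the non-equi join suggestion concrete, here is a minimal sketch on toy data. The tables and their values are made up for illustration and are not the data from the original report; only the column names follow the snippet above.

```r
library(data.table)

# Hypothetical toy data (assumption): two symbols, five dates each.
dt <- data.table(
  symbol = rep(c("A", "B"), each = 5L),
  date   = rep(1:5, times = 2L),
  flag   = 0L
)

# Hypothetical ranges to flag, one row per symbol.
flag_dt <- data.table(
  symbol     = c("A", "B"),
  start_date = c(2L, 4L),
  end_date   = c(3L, 5L)
)

# A single non-equi update join sets flag for every row of dt whose date
# falls inside a matching [start_date, end_date] range for its symbol.
dt[flag_dt, flag := 1L, on = c("symbol", "date>=start_date", "date<=end_date")]

flagged <- dt[flag == 1L, .(symbol, date)]
print(flagged)
```

The point of this form is that all ranges are applied in one call, rather than one keyed := per selector.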
@tlapak thanks for digging into this. I am doing some bmerge rework, so if you are going to touch only shallow, then fine; otherwise, if you also touch bmerge, we will have a conflict to resolve :)
@tlapak Thanks for digging into that. The non-equi join is the perfect way to do that. Some of my practical use cases involve more calculations on each row to get the
I checked that #4440 resolves the speed issue on @renkun-ken's example. The regression was already in 1.12.8, so it is not just a current devel issue.
… data.table before the regression that got reported in Rdatatable#4311 was introduced
The performance of dt[selector, foo := bar] on key could drop significantly when an index is present. The following is my use case and reproducible example:

When dt has no index, the following code, which repeatedly uses a symbol, date selector to modify flag, is fast enough.

However, if an index is created intentionally, or in many cases unintentionally (auto index triggered by dt[flag0 == 1, ...]), the performance of the above code decreases significantly and can be unstable:

I also tried explicitly writing dt[selector, flag := 1L, on = .(symbol, date)], still no luck.

Avoiding creating an index or disabling auto-index avoids this problem, but I'm still curious whether something adds significant overhead to keyed := while there's an index.