I didn't think get() copies, why is fun2() slowest here? #727

mattdowle · 2014-07-10T18:49:18Z

http://stackoverflow.com/a/24668479/403310

arunsrinivasan · 2014-07-15T17:54:07Z

I'm almost sure it's because of this part in [.data.table:

if (is.null(irows)) {
    for (s in seq_along(xcols)) {  # xcols means non-join x columns, since join columns come from i
        target = xcolsAns[s]
        source = xcols[s]
        ans[[target]] = x[[source]]
        if (address(ans[[target]]) == address(x[[source]])) ans[[target]] = copy(ans[[target]])
    }
} else {
    for (s in seq_along(xcols)) {
        target = xcolsAns[s]
        source = xcols[s]
        ans[[target]] = .Call(CsubsetVector,x[[source]],irows)   # i.e. x[[source]][irows], but guaranteed new memory even for singleton logicals from R 3.1
    }
}

get() fetches all columns, and here is.null(irows) is TRUE, so the for-loop under the if-clause runs over every column and copies each one before ans is assigned to .SD later on.

MichaelChirico · 2019-09-02T17:22:13Z

I think that without .SDcols, get() has to copy, so I'm not sure there's any fix for this besides supplying .SDcols. OK to close?
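For illustration, here is a minimal sketch of one way to read the .SDcols suggestion. The toy table and the names DT, g, v, w and col are made up for this example and are not from the original question; it only shows that naming the column via .SDcols gives the same result as get(), not how much faster it is.

```r
library(data.table)

# Hypothetical toy data; DT, g, v and w are illustrative names only.
DT <- data.table(g = rep(1:2, each = 3), v = 1:6, w = 6:1)
col <- "v"

# get() in j: the column name is only known at run time.
res_get <- DT[, sum(get(col)), by = g]

# .SDcols names the needed column up front, so .SD holds only that column.
res_sd <- DT[, lapply(.SD, sum), by = g, .SDcols = col]

# Both approaches give the same per-group sums.
stopifnot(identical(res_get$V1, res_sd[[col]]))
```

On a table with many columns, timing the two forms (for example with system.time()) should show whether the difference matters for a given workload.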

jangorecki · 2020-03-22T14:10:29Z

I think that the solution proposed in #4304 is a proper way to address this issue. It removes the burden of maintaining the get() optimisation. I think it is reasonable to advise using substitution rather than get(), thus closing. The SO answer should be updated once we have it in master.
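As a rough sketch of what "substitution rather than get()" can look like, the example below uses base R's bquote() to splice the column symbol into the call before it is evaluated. This is an assumption for illustration only and is not the interface proposed in #4304; DT, g, v and col are hypothetical names.

```r
library(data.table)

# Hypothetical toy data; DT, g and v are illustrative names only.
DT <- data.table(g = rep(1:2, each = 3), v = 1:6)
col <- "v"

# bquote() replaces .(as.name(col)) with the symbol v before data.table
# sees the call, so j is the plain expression sum(v) with no get() in it.
call_sub <- bquote(DT[, sum(.(as.name(col))), by = g])
res_sub <- eval(call_sub)

# Same result as writing the column name directly.
stopifnot(identical(res_sub$V1, DT[, sum(v), by = g]$V1))
```

Because j then contains an ordinary column reference, data.table can see which columns are used from the expression itself.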

mattdowle added question labels Jul 10, 2014

arunsrinivasan mentioned this issue Oct 28, 2014

data.table new column := slower than base R (?) #921

Closed

jangorecki closed this as completed Mar 22, 2020
