Inconsistent logical indexing behaviour #758

mchen402 · 2014-08-06T13:45:07Z

When the i index contains a mix of T and F, surprisingly the result has no fewer rows than the original data.table:

data.table(x = 1:2)[c(F, T), list(x, y = 3:4)]   # 2 rows returned
##    x y
## 1: 2 3
## 2: 2 4

I would have expected

##    x y
## 1: 2 4

This is at odds with data.frame intuition:

data.frame(x = 1:2)[c(F, T), c("x", "x")]  # 1 row returned
##   x x.1
## 2 2   2

Edge-cases

Moreover, the case where every element of i is F does not even yield a valid result, which makes this an annoying edge case to deal with:

data.table(x = 1:2)[c(F, F), list(x, y = 3:4)]
## Error in if (mn%%n[i] != 0) warning("Item ", i, " is of size ", n[i],  : 
##   missing value where TRUE/FALSE needed

I would have expected

## Empty data.table (0 rows) of 2 cols: x,y

The other edge case where all i = T does work as expected:

data.table(x = 1:2)[c(T, T), list(x, y = 3:4)]
##    x y
## 1: 1 3
## 2: 2 4

Is there any explanation behind this behaviour?


arunsrinivasan · 2014-08-06T13:50:10Z

Your data.frame equivalent is adding columns, not rows, or rather a new column with identical number of rows. They're not equivalent operations. You can use transform to get the approx. equivalent operation, which is more or less identical in behaviour:

transform(data.frame(x=1:2)[c(F,T), , drop=FALSE], y=3:4)
#   x y
# 1 2 3
# 2 2 4
# Warning message:
# In data.frame(list(x = 2L), y = 3:4) :
#   row names were found from a short variable and have been discarded

transform(data.frame(x=1:2)[c(F,F), ], y=3:4)
# Error in data.frame(list(X_data = integer(0)), y = 3:4) : 
#   arguments imply differing number of rows: 0, 2

arunsrinivasan · 2014-08-06T20:10:09Z

@tunaaa,

There are two things here:

The order of operations in DT[i, j]: it evaluates i first, and then j, not the other way around. So, in the first case, after the row subset using c(FALSE,TRUE), it's left with:

#    x
# 1: 2

And then j uses list(x, y=3:4), where the shorter column is automatically recycled to fit the longest column's length.

For the same reason, in the second case, after the subset, x is of length 0 = integer(0), and therefore could be recycled to fit the length of 2, with the value NA. But this resulted in an error because of an invalid condition check. I'll fix this (after checking in with Matt).

data.table always tries to recycle columns automatically to fit the longest column, and warns if the recycling leaves a remainder.
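
To make the two steps concrete, here is a rough sketch in base R of what happens in the first case. It only mimics the order of evaluation (i first, then j with recycling) and is not how data.table is implemented internally; recycle_to_longest is a hypothetical helper written for this illustration.

```r
# Hypothetical helper (not part of data.table): recycle every column
# to the length of the longest one, then bind them into a data.frame.
recycle_to_longest <- function(cols) {
  n <- max(lengths(cols))
  as.data.frame(lapply(cols, rep_len, length.out = n))
}

df <- data.frame(x = 1:2)

# Step 1: evaluate i. Only the second row survives, so x has length 1.
sub <- df[c(FALSE, TRUE), , drop = FALSE]

# Step 2: evaluate j against the subset. x (length 1) is recycled
# to match y (length 2), which is why two rows come back.
res <- recycle_to_longest(list(x = sub$x, y = 3:4))
print(res)
#   x y
# 1 2 3
# 2 2 4
```

If the goal is to filter y by the same index, y has to be a column of the table before the subset is taken, so that i applies to it as well.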

arunsrinivasan · 2014-09-25T17:59:07Z

To fix - only the case:

data.table(x = 1:2)[c(F, F), list(x, y = 3:4)]

which should result in

## Empty data.table (0 rows) of 2 cols: x,y

arunsrinivasan added bug Low labels Sep 25, 2014

arunsrinivasan self-assigned this Sep 25, 2014

arunsrinivasan added this to the v1.9.6 milestone Sep 25, 2014

arunsrinivasan added Medium and removed Low labels Sep 25, 2014

arunsrinivasan closed this as completed in 44ac69b Oct 2, 2014

dpastoor mentioned this issue Feb 7, 2015

fread fails if whitespace before first character #1035

Closed
