When the i index contains a mix of T and F, surprisingly the result has no fewer rows than the original data.table:
data.table(x = 1:2)[c(F, T), list(x, y = 3:4)] # 2 rows returned
## x y
## 1: 2 3
## 2: 2 4
I would have expected
## x y
## 1: 2 4
This is at odds with data.frame intuition:
data.frame(x = 1:2)[c(F, T), c("x", "x")] # 1 row returned
## x x.1
## 2 2 2
Edge-cases
Moreover, the case where i = F does not even yield a valid result, which makes this an annoying edge-case to deal with:
data.table(x = 1:2)[c(F, F), list(x, y = 3:4)]
## Error in if (mn%%n[i] != 0) warning("Item ", i, " is of size ", n[i], :
## missing value where TRUE/FALSE needed
I would have expected
## Empty data.table (0 rows) of 2 cols: x,y
The other edge case where all i = T does work as expected:
data.table(x = 1:2)[c(T, T), list(x, y = 3:4)]
## x y
## 1: 1 3
## 2: 2 4
Is there any explanation behind this behaviour?
Your data.frame equivalent is adding columns, not rows, or rather a new column with identical number of rows. They're not equivalent operations. You can use transform to get the approx. equivalent operation, which is more or less identical in behaviour:
transform(data.frame(x=1:2)[c(F,T), , drop=FALSE], y=3:4)
# x y
# 1 2 3
# 2 2 4
# Warning message:
# In data.frame(list(x = 2L), y = 3:4) :
# row names were found from a short variable and have been discarded
transform(data.frame(x=1:2)[c(F,F), ], y=3:4)
# Error in data.frame(list(X_data = integer(0)), y = 3:4) :
# arguments imply differing number of rows: 0, 2
The explanation is the order of operations in DT[i, j]: i is evaluated first, and then j, not the other way around. So, in the first case, after the row subset using c(FALSE,TRUE), it's left with:
# x
# 1: 2
And then list(x, y=3:4) is evaluated on that subset, where the shorter column is automatically recycled to fit the longest column's length.
For the same reason, in the second case, after the subset, x has length 0 (integer(0)), and therefore could be recycled to fit the length of 2, with the value NA. But this resulted in an error because of an invalid condition check. I'll fix this (after checking in with Matt).
data.table always tries to recycle columns automatically to fit the longest column, and warns if the recycling leaves a remainder.
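To make the i-then-j order concrete, here is a minimal sketch that splits the first case into two explicit steps, followed by one possible way to get the result the original post expected: build the columns on the full table first and filter rows afterwards. The variable names (DT, sub, two_step, one_step, filtered, empty) are only illustrative, and the chained form is a suggested workaround, not something proposed earlier in this thread.

```r
library(data.table)

DT <- data.table(x = 1:2)

# Step 1: i is evaluated first, leaving a single row (x = 2).
sub <- DT[c(FALSE, TRUE)]

# Step 2: j is evaluated on that subset; the length-1 x is recycled
# to match the length-2 y, so two rows come back.
two_step <- sub[, list(x, y = 3:4)]

# The single call from the original post does the same thing.
one_step <- DT[c(FALSE, TRUE), list(x, y = 3:4)]

# Possible workaround: evaluate j on the full table, then subset rows.
# Here y is aligned with the original rows before any filtering.
filtered <- DT[, list(x, y = 3:4)][c(FALSE, TRUE)]
empty    <- DT[, list(x, y = 3:4)][c(FALSE, FALSE)]
```

With the chained form, the row filter applies to a table that already has both columns, so the mixed case keeps one row and the all-FALSE case gives a zero-row table with columns x and y.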