Secondary key (key2, set2key) is interfering with subsetting (e.g. data_table[A == "a", A]) and table() results #1734
Comments
Simplified code:
library(data.table)
set.seed(2016)
data_table <- data.table(A = letters[sample(5,10000,replace = TRUE)],
B = letters[sample(5,10000,replace = TRUE)],
C = ifelse(runif(10000) < 0.05, NA, "ignore"))
setindex(data_table, A) # v1.9.7
data_table_naomit <- na.omit(data_table, cols = "C")
data_table_naomit[A == "a", .N, A]
# A N
#1: a 8
#2: d 13
#3: c 11
#4: b 9
#5: e 15
data_table_naomit[(A == "a"), .N, A] # force vector scan
# A N
#1: a 1855
Using the latest HEAD (it should be available in the devel repo in around 15 minutes, once CI finishes):
library(data.table)
set.seed(2016)
data_table <- data.table(A = letters[sample(5,10000,replace = TRUE)],
B = letters[sample(5,10000,replace = TRUE)],
C = ifelse(runif(10000) < 0.05, NA, "ignore"))
setindex(data_table, A) # v1.9.7
data_table_naomit <- na.omit(data_table, cols = "C")
data_table_naomit[A == "a", .N, A]
# A N
#1: a 1855
Thank you @jangorecki for sorting this out, and apologies for any inconvenience caused by my report and code examples. May I ask what HEAD stands for?
@m-dz no problem, just pointing out good practices. HEAD is the latest change in the git repository, on the master branch in this case. More in "What is HEAD in Git?"
Thank you!
Description
First, I am not sure whether this is the desired behaviour, but if it is, it is a bit surprising and not clearly explained.
While doing some data cleansing I encountered a strange situation where the base::table() function returned a completely unexpected result when used on a data.table subsetted with i. After some (long) time I tracked this issue down to the secondary keys; please see code example 1 below.
It looks like removing missing values with the data.table::na.omit() function preserves the secondary keys, whereas doing so by subsetting with !is.na() sets the secondary key to NULL and preserves only the primary one, and this interferes with the base::table() results. I tried to reproduce the same behaviour without the missing values part by manually setting primary and secondary keys (please see code example 2 below), but this time the base::table() results were as expected, so the problem is (probably) caused by something more hidden.
Code example 1 (with the error)
Code example 2 (everything as expected)
If there is anything else I can provide please let me know.
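One way to see which secondary keys (indices) a table carries after each of the two NA-removal approaches is data.table's indices() function. The following is only a minimal sketch on a small made-up table (the names dt, dt_naomit, dt_subset and the values are illustrative assumptions, not the data from the examples in this issue). What indices() reports may differ between data.table versions, so no expected output is shown.

```r
library(data.table)

# Small illustrative table; names and values are assumptions for this sketch
dt <- data.table(A = c("a", "b", "a", "c"),
                 C = c(NA, "ignore", "ignore", NA))
setindex(dt, A)                       # secondary key (index) on A

indices(dt)                           # index names on the original table

dt_naomit <- na.omit(dt, cols = "C")  # remove NAs via na.omit()
dt_subset <- dt[!is.na(C)]            # remove NAs via subsetting in i

indices(dt_naomit)                    # is the index still attached?
indices(dt_subset)

# Wrapping the condition in parentheses forces a vector scan,
# so the subset does not rely on any index
dt_naomit[(A == "a"), .N, A]
```

If the indexed subset and the forced vector scan disagree, comparing the two indices() results should show whether a retained index is involved.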