When setkey is called on a data.table with columns that have identical names, and then those names are updated, the keys appear not to update.
That means that if you want to do a cross-join of row IDs in a dataset and then update the CJ result with additional attributes from the original data, you have to (a) reset the key with setkey() after renaming, (b) call CJ(..., sorted=F), or (c) use base::merge.data.frame() to get the merge to work (MWE 2).
MWE 1 is a silly example to show the key/name issue. I think that it might be what drives the errors in MWE 2, which is based on the issue I ran into today.
I'm running data.table version 1.13.6
MWE 1
library(data.table)
jnk<- data.table(x=1:3, x=4:6)
setkey(jnk, x, x)
setnames(jnk, c("y","z"))
all(key(jnk) %in% c("y","z")) # key(jnk) = c("z", "x")... but there is no "x" anymore
MWE 2
library(data.table)
nobs= 4
dat= data.table(id=1:nobs, x= runif(nobs))
cj_sort<- with(dat, CJ(id, id, sorted=T)) # don't do fixes on this one
cj_srt2<- with(dat, CJ(id, id, sorted=T)) # works
cj_unst<- with(dat, CJ(id, id, sorted=F)) # works b/c we update keys?
# set colnames to be unique
setnames(cj_sort, c("id_1", "id_2"))
setnames(cj_unst, c("id_1", "id_2"))
setnames(cj_srt2, c("id_1", "id_2"))
# fixes the issue
setkey(cj_unst, id_1, id_2) # key unsorted data to fix
setkey(cj_srt2, id_1, id_2) # re-key sorted data to fix
stopifnot(key(cj_sort) == c("id_1", "id_2")) # broken, keys are c("id_2","id")
stopifnot(key(cj_unst) == c("id_1", "id_2")) # ok
stopifnot(key(cj_srt2) == c("id_1", "id_2")) # ok
stopifnot(cj_sort[i=dat, on= .(id_1=id), .N] ==nobs^2) # ok
stopifnot(cj_sort[i=dat, on= .(id_2=id), .N] ==nobs^2) # broken - won't merge to nobs^2 rows
stopifnot(cj_unst[i=dat, on= .(id_1=id), .N] ==nobs^2) # ok
stopifnot(cj_unst[i=dat, on= .(id_2=id), .N] ==nobs^2) # ok - works b/c of setkey?
stopifnot(cj_srt2[i=dat, on= .(id_1=id), .N] ==nobs^2) # ok
stopifnot(cj_srt2[i=dat, on= .(id_2=id), .N] ==nobs^2) # ok - works b/c of setkey() workaround?
# data.table::merge
stopifnot(nrow(merge.data.table(cj_sort, dat, by.x="id_2", by.y="id")) ==nobs^2) # broken
stopifnot(nrow(merge.data.table(cj_unst, dat, by.x="id_2", by.y="id")) ==nobs^2) # ok
stopifnot(nrow(merge.data.table(cj_srt2, dat, by.x="id_2", by.y="id")) ==nobs^2) # ok
# base::merge works, even though data.table::merge doesn't
stopifnot(nrow(merge.data.frame(cj_sort, dat, by.x="id_2", by.y="id")) ==nobs^2) # workaround: use base::merge
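For what it's worth, workaround (a) can be wrapped in a small helper so the re-keying only happens when the key no longer matches the intended columns. This is only a sketch: rekey_if_stale is a hypothetical name (not part of data.table), and it assumes you already know which columns the key should be.

```r
library(data.table)

# Hypothetical helper, not part of data.table: re-key `dt` on `cols`
# (by reference) whenever its current key is not exactly `cols`.
rekey_if_stale <- function(dt, cols) {
  if (!identical(key(dt), cols)) setkeyv(dt, cols)
  invisible(dt)
}

nobs <- 4
dat <- data.table(id = 1:nobs, x = runif(nobs))

# Sorted cross-join of the IDs, then rename to unique column names
cj <- with(dat, CJ(id, id, sorted = TRUE))
setnames(cj, c("id_1", "id_2"))

# Guard against a stale key before joining
rekey_if_stale(cj, c("id_1", "id_2"))

stopifnot(identical(key(cj), c("id_1", "id_2")))
stopifnot(cj[dat, on = .(id_2 = id), .N] == nobs^2)
```

On a version where setnames() already updates the key correctly, the helper should do nothing, so it ought to be safe to leave in place.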