inconsistent output of add column on join #1166

Closed
jangorecki opened this issue May 30, 2015 · 4 comments

Comments

@jangorecki
Member

See the example and comment below.

library(data.table)
dt1 <- data.table(a = c(1,2), key="a")
dt2 <- data.table(b = 2, v = 1, key="b")
dt1[dt2, v := i.v]
#    a  v
#1: 1 NA
#2: 2  1

That looks OK.

library(data.table)
dt1 <- data.table(a = 1, key="a")
dt2 <- data.table(b = 2, v = 1, key="b")
dt1[dt2, v := i.v]
#    a
#1: 1

That does not look OK.
I would expect to get:

#    a  v
#1: 1 NA

Of course, it should also retain the data type, so the missing value should be NA_real_ in this case.
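
In the meantime, one way to get the expected shape is to create the column with a typed NA before the update join, so that a join with no matches leaves the placeholder in place. This is only a sketch of a workaround, not a fix for the behaviour reported here, and it assumes v should be numeric:

```r
library(data.table)

dt1 <- data.table(a = 1, key = "a")
dt2 <- data.table(b = 2, v = 1, key = "b")

# Pre-create the column with a typed NA so it exists even if nothing matches.
if (!"v" %in% names(dt1)) dt1[, v := NA_real_]

# Update on join: only rows of dt1 that match dt2 are overwritten (none here).
dt1[dt2, v := i.v]

# dt1 now has columns a and v, with v a numeric NA.
```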

R version 3.2.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_DK.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C              LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] anchormodeling_0.3.8 testthat_0.10.0      devtools_1.8.0       R6_2.0.1             data.table_1.9.5    

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.6          magrittr_1.5         roxygen2_4.1.1       MASS_7.3-40          munsell_0.4.2        colorspace_1.2-6     stringr_1.0.0        plyr_1.8.2           tools_3.2.0          grid_3.2.0           gtable_0.1.2         git2r_0.10.1         rversions_1.0.0     
[14] digest_0.6.8         crayon_1.2.1         reshape2_1.4.1       ggplot2_1.0.1        microbenchmark_1.4-2 bitops_1.0-6         RCurl_1.95-4.6       memoise_0.2.1        stringi_0.4-1        scales_0.2.4         XML_3.98-1.1         chron_2.3-45         proto_0.3-10  
@franknarf1
Contributor

Maybe a relevant excerpt from the news/changelog for 1.9.2:

"X[Y, col:=value] when no match exists in the join is now caught early and X is simply returned. Also a message when datatable.verbose is TRUE is provided. In addition, if col is an existing column, since no update actually takes place, the key is now retained." in response to #354

@arunsrinivasan
Member

Possible duplicate of #820. A PR would be great for this. IIRC we have to avoid returning early when i evaluates to all FALSE and j has := (it's all R code, AFAICT).

@jangorecki
Member Author

Another case, more problematic because it cannot easily be fixed with if(!"x" %in% names(d1)) d1[, x := NA]: the column we want to look up from d2 already exists in d1, and we want to overwrite it.

library(data.table)
d1 <- data.table(a="a",b=FALSE,x=1,key=c("a","b"))
d2 <- data.table(a="a",b=TRUE,x=5,key=c("a","b"))
d1[d2, x :=  i.x]
#    a     b x
# 1: a FALSE 1
## while I would expect to get
#    a     b  x
# 1: a FALSE NA
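
For this overwrite case, one possible workaround is to turn the join around: look up d1's rows in d2 and assign the result, so that unmatched rows come back as NA. This is a sketch only, and it assumes d2 has at most one row per key (otherwise the lookup returns more values than d1 has rows):

```r
library(data.table)

d1 <- data.table(a = "a", b = FALSE, x = 1, key = c("a", "b"))
d2 <- data.table(a = "a", b = TRUE,  x = 5, key = c("a", "b"))

# d2[d1, x] returns d2's x for each row of d1, with NA where no key matches.
d1[, x := d2[d1, x]]

# d1$x is now a numeric NA, because (a, FALSE) has no match in d2.
```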

@jangorecki
Member Author

Duplicate of #759, closing
