merge/inner join not symmetric #743
@arcosdium, thanks for the report. It would make our job much easier if we had a minimal example that reproduces this issue, for example like this one. Could you please edit your post to include such an example? Thanks again.
I managed to track the problem down. During the operations on KOut, the key column x2 changed type from int to num. Using
I assume the 0 in Filter2 is not precisely 0, due to some rounding issues. By the way, great package; I really need the speed and the memory efficiency.
Great! Thanks for the example. My hunch is that it's due to the sign bit:
data.table:::binary(0)
# [1] "0 00000000000 000000000000000000000000000000000000 00000000 00000000"
data.table:::binary(-0)
# [1] "1 00000000000 000000000000000000000000000000000000 00000000 00000000"
If this is indeed the problem, then it should be localised to -0 values only, and it should be fixed. Thanks again.
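The sign-bit hypothesis can also be checked in base R, without data.table internals. In this minimal sketch, `writeBin()` is called with `endian = "big"` so that the byte holding the sign bit comes first on any platform. `==` and the default `identical()` treat 0 and -0 as equal, but a bitwise comparison and `1/x` both expose the difference.

```r
neg_zero <- -(11 - 11) / 10   # same expression as in the reproducible example

# Numerically, 0 and -0 compare equal ...
print(neg_zero == 0)                              # TRUE

# ... but their IEEE 754 bit patterns differ in the sign bit (first byte, big-endian)
print(writeBin(0, raw(), endian = "big"))         # 00 00 00 00 00 00 00 00
print(writeBin(neg_zero, raw(), endian = "big"))  # 80 00 00 00 00 00 00 00

# The default identical() does not distinguish them; a bitwise comparison does
print(identical(0, neg_zero))                     # TRUE
print(identical(0, neg_zero, num.eq = FALSE))     # FALSE

# Dividing by the value reveals the sign: Inf for +0, -Inf for -0
print(1 / neg_zero)                               # -Inf
```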
Yes, it's due to the sign bit. Here's an example that confirms it:
dt = data.table(x=c(0,0,0,-0,-0,-0), y=1:6)
dt[, .N, by=x]
# x N
# 1: 0 3
# 2: 0 3
Will fix. Thanks.
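To illustrate why grouping on raw bit patterns splits 0 and -0 into two groups, as in the output above, the sketch below groups the same six values two ways in base R. The helper `bits_key()` is hypothetical: it only mimics keying by bit pattern and is not how data.table is implemented internally.

```r
x <- c(0, 0, 0, -0, -0, -0)

# Hypothetical helper: key each double by its 8 bytes, mimicking a bitwise key
bits_key <- function(v) {
  vapply(v, function(d) paste(writeBin(d, raw(), endian = "big"), collapse = ""),
         character(1))
}

# Grouping by bit pattern gives two groups of 3 (one for +0, one for -0) ...
by_bits <- table(bits_key(x))
print(by_bits)

# ... while grouping by numeric value gives a single group of 6
by_value <- tapply(seq_along(x), x, length)
print(by_value)
```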
Now fixed:
library(data.table)
dt = data.table(x=c(0,0,0,-0,-0,-0), y=1:6)
dt[, .N, by=x]
# x N
# 1: 0 6
dt1 <- data.table(x2 = 0L)
dt2 <- data.table(x2 = -(11-11)/10)
merge(dt2, dt1, by="x2")
# x2
# 1: 0
merge(dt1, dt2, by="x2")
# x2
# 1: 0
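For data.table versions that predate this fix (the report mentions 1.9.2 and 1.9.3), one possible workaround, offered here as a sketch rather than an official recommendation, is to normalise -0 to +0 in double key columns before joining. Under IEEE 754 round-to-nearest arithmetic, -0 + 0 evaluates to +0, so adding zero is enough. The helper `normalize_signed_zero()` is hypothetical and written for plain data.frames; on a data.table, the same idea would be applied by reference with `:=` or `set()`. Because the key column in the report started out as integer, converting it back with `as.integer()` would also sidestep the issue, since integers have no negative zero.

```r
# Hypothetical helper: map -0 to +0 in every double column of a data.frame.
# Relies on IEEE 754 round-to-nearest, where -0 + 0 evaluates to +0.
normalize_signed_zero <- function(df) {
  for (nm in names(df)) {
    if (is.double(df[[nm]])) {
      df[[nm]] <- df[[nm]] + 0
    }
  }
  df
}

df <- data.frame(x2 = -(11 - 11) / 10, id = 1L)
print(1 / df$x2)          # -Inf: the key holds -0

fixed <- normalize_signed_zero(df)
print(1 / fixed$x2)       # Inf: the key now holds +0

# Alternative: restore the integer type (integers have no -0)
print(as.integer(df$x2))  # 0
```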
I encountered a case where merge is not symmetric, using version 1.9.2 as well as 1.9.3. I have two data.tables: Filter (read from a file with fread) and KOut (constructed by several data.table operations, including merge and setcolorder).
tables()
gives
Merging the tables results in a different number of rows, depending on the order of the arguments.
For comparison, the merge as data.frames:
I suspected a bug in
merge(Filter, KOut, by=names(Filter))
so I followed the code of merge down to the essential statement:
Here
tables()
gives
Some joins are:
y[xkey, nomatch=0]
x1 x2 x3 x4 x5 value y1 y2
1: 1 1 1 1 0 1.20693421 57 1
2: 1 1 1 1 0 -0.36395694 57 2
3: 1 1 1 1 0 -1.91636684 57 3
4: 1 1 1 1 0 -0.38118758 57 4
5: 1 1 1 1 0 0.84860626 57 5
---
2555940: 3 1 1 21 2 0.49530287 11697 4400
2555941: 3 1 1 21 2 -2.03795092 11697 4401
2555942: 3 1 1 21 2 1.28866177 11697 4402
2555943: 3 1 1 21 2 -2.02472550 11697 4403
2555944: 3 1 1 21 2 0.01210244 11697 4404
xkey[y, nomatch=0]
x1 x2 x3 x4 x5 value y1 y2
1: 1 0 1 1 0 -0.693537811 57 70578
2: 1 0 1 1 0 0.585084541 57 70579
3: 1 0 1 1 0 0.384647254 57 70580
4: 1 0 1 1 0 -1.011123900 57 70581
5: 1 0 1 1 0 -0.008338746 57 70582
---
2936582: 3 1 1 21 2 0.495302870 11697 4400
2936583: 3 1 1 21 2 -2.037950918 11697 4401
2936584: 3 1 1 21 2 1.288661770 11697 4402
2936585: 3 1 1 21 2 -2.024725499 11697 4403
2936586: 3 1 1 21 2 0.012102439 11697 4404
y[xkey]
x1 x2 x3 x4 x5 value y1 y2
1: 1 0 1 1 0 NA NA NA
2: 1 0 1 1 10 NA NA NA
3: 1 0 1 1 20 NA NA NA
4: 1 0 1 1 30 NA NA NA
5: 1 0 1 1 40 NA NA NA
---
2573000: 3 1 3 21 56 NA NA NA
2573001: 3 1 3 21 57 NA NA NA
2573002: 3 1 3 21 58 NA NA NA
2573003: 3 1 3 21 59 NA NA NA
2573004: 3 1 3 21 60 NA NA NA
Note the first line of y[xkey]: it says the key combination (1,0,1,1,0) in xkey has no match in y. But the first line of xkey[y, nomatch=0] shows that such a row does in fact exist in y!
Any ideas?
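When a join silently drops rows like this, one way to test the sign-bit explanation is to count negative zeros in each double key column. The helper `count_negative_zero()` below is a hypothetical base R sketch, and the `keys` data frame is made-up illustrative data, not the reporter's Filter or KOut tables.

```r
# Hypothetical helper: number of -0 entries in each double column.
# A value is -0 when it equals 0 but 1/value is negative (-Inf).
count_negative_zero <- function(df) {
  vapply(df, function(col) {
    if (!is.double(col)) return(0L)
    as.integer(sum(col == 0 & 1 / col < 0, na.rm = TRUE))
  }, integer(1))
}

# Illustrative data only: x2 was computed arithmetically and picked up a -0
keys <- data.frame(x1 = c(1L, 1L), x2 = c(0, -(11 - 11) / 10), x5 = c(0, 10))
print(count_negative_zero(keys))  # x1 = 0, x2 = 1, x5 = 0
```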