Add test for sketchJoin bug #1218 #1450
wolfika wants to merge 1 commit into twitter:develop from wolfika:develop
|
great! Now bonus points: I think the code here: Should be changed to:

    val lhs = flatMapWithReplicas(left.pipe) {
      case 0 => Nil
      case n => List(rand.nextInt(n) + 1)
    }

So, if we know for sure the key does not appear in the other side, we don't need to replicate it. We can double check the logic here: https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/typed/Sketched.scala#L105 |
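To make the proposed change concrete, here is a rough standalone sketch of the replica choice in plain Scala, with no Scalding dependency. The names `ReplicaSketch`, `leftReplicas` and `rightReplicas` are hypothetical, and the right-side rule (copy the record to every replica `1..n`) is an assumption about how the join pairs the two sides, not something quoted from `Sketched.scala`.

```scala
import scala.util.Random

object ReplicaSketch {
  // Hypothetical helper: pick the replica for a left-side record whose key
  // has an estimated count of n. With n == 0 nothing is emitted, so the
  // record is dropped instead of calling rand.nextInt(0), which would throw.
  def leftReplicas(n: Int, rand: Random): List[Int] = n match {
    case 0 => Nil
    case k => List(rand.nextInt(k) + 1) // one replica chosen from 1..k
  }

  // Assumption: the right side is copied to every replica 1..n so that
  // whichever replica the left record lands on, a match is present.
  def rightReplicas(n: Int): List[Int] = (1 to n).toList
}
```

With this shape, a left record always lands on a replica id that the right side also emitted for the same `n`, and a count of 0 yields no output on either side.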
|
actually, I'm confused about this bug now. The random only happens on the left side. The count for all the items on the left side should be > 0 by definition (the CMS can overestimate, but not underestimate the count). This requires a bit more thought. Without an explanation of the bug, we shouldn't just merge something, even if it seems to fix things. |
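For reference, the "can overestimate, but not underestimate" property can be seen in a toy count-min sketch. This is only an illustration in plain Scala, not the CMS implementation Scalding uses; `TinyCms`, its parameters and the hashing scheme are all made up for the example.

```scala
import scala.util.hashing.MurmurHash3

// Toy count-min sketch: `depth` rows of `width` counters each.
final class TinyCms(depth: Int, width: Int) {
  private val table = Array.ofDim[Long](depth, width)

  // Column for `key` in a given row; the row index doubles as the hash seed.
  private def col(row: Int, key: String): Int = {
    val h = MurmurHash3.stringHash(key, row)
    ((h % width) + width) % width // keep the index non-negative
  }

  // Every add increments one counter per row.
  def add(key: String): Unit =
    (0 until depth).foreach(r => table(r)(col(r, key)) += 1L)

  // The estimate is the minimum over the rows. Collisions can only push a
  // counter up, so the estimate is never below the true count.
  def estimate(key: String): Long =
    (0 until depth).map(r => table(r)(col(r, key))).min
}
```

Any key that was added at least once therefore has an estimate of at least 1, which is why a count of 0 for a key that is present on the sketched side would be surprising.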
|
Dammit. |
|
I'm going to close this, but I used your code in #1451. Thanks for the contribution! |
fixes #1449