Skip to content

Add test for sketchJoin bug #1218#1450

Closed
wolfika wants to merge 1 commit intotwitter:developfrom
wolfika:develop
Closed

Add test for sketchJoin bug #1218#1450
wolfika wants to merge 1 commit intotwitter:developfrom
wolfika:develop

Conversation

@wolfika
Copy link

@wolfika wolfika commented Oct 31, 2015

fixes #1449

@johnynek
Copy link
Collaborator

johnynek commented Nov 1, 2015

great!

Now bonus points: I think the code here:

https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/typed/Sketched.scala#L119

Should be changed to:

val lhs = flatMapWithReplicas(left.pipe){
  case 0 => Nil
  case n => List(rand.nextInt(n) + 1)
}

So, if we know for sure the key does not appear in the other side, we don't need to replicate it. We can double check the logic here: https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/typed/Sketched.scala#L105

@johnynek
Copy link
Collaborator

johnynek commented Nov 1, 2015

actually, I'm confused about this bug now. The random only happens on the left side. The count for all the items on the left side should be > 0 by definition (the CMS can overestimate, but not underestimate the count). This requires a bit more thought. Without an explanation of the bug, we shouldn't just merge something, even if it seems to fix things.

@johnynek
Copy link
Collaborator

johnynek commented Nov 1, 2015

Dammit. == in Array is biting us here.

twitter/algebird#497

@johnynek
Copy link
Collaborator

johnynek commented Nov 4, 2015

I'm going to close this, but I used your code in #1451. Thanks for the contribution!

@johnynek johnynek closed this Nov 4, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add failing test for sketchJoin bug #1218

2 participants