Right and Left Outer Shuffle Region Join don't match #1813

Closed
fnothaft opened this issue Dec 1, 2017 · 0 comments · Fixed by #1814

fnothaft commented Dec 1, 2017

E.g.,

scala> snps.leftOuterShuffleRegionJoin(filteredSnps).transform(_.filter(r => !r._2.isEmpty)).rdd.count
res8: Long = 774212                                                             

scala> filteredSnps.rightOuterShuffleRegionJoin(snps).transform(_.filter(r => !r._1.isEmpty)).rdd.count
res9: Long = 197826                                                             

scala> filteredSnps.shuffleRegionJoin(snps).rdd.count
res10: Long = 774212                                                            

scala> snps.shuffleRegionJoin(filteredSnps).rdd.count
res11: Long = 774212

Historically, rightOuterShuffleRegionJoin just called leftOuterShuffleRegionJoin (at the sort/merge join level, see here) followed by swap on the tuples. ad5ae6d introduced a new implementation of the right outer shuffle region join that has correctness issues.
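
To illustrate the delegation described above, here is a minimal sketch on plain Scala collections. It does not use ADAM or Spark: `Region`, `RegionJoins`, and `Demo` are hypothetical names made up for this example, and the naive nested-loop join stands in for the real sort/merge implementation. The point is only that a right outer join defined as "swap inputs, left outer join, swap tuples" has to agree with the left outer join once the unmatched rows are filtered out.

```scala
// Toy model of a genomic interval (half-open). Hypothetical; not ADAM's ReferenceRegion.
final case class Region(contig: String, start: Long, end: Long) {
  def overlaps(other: Region): Boolean =
    contig == other.contig && start < other.end && other.start < end
}

object RegionJoins {
  // Naive O(n*m) left outer join: every left record is kept, paired with each
  // overlapping right record, or with None if nothing on the right overlaps it.
  def leftOuter[T, U](left: Seq[(Region, T)],
                      right: Seq[(Region, U)]): Seq[(T, Option[U])] =
    left.flatMap { case (lr, l) =>
      val hits = right.collect { case (rr, r) if lr.overlaps(rr) => r }
      if (hits.isEmpty) Seq((l, Option.empty[U]))
      else hits.map(r => (l, Option(r)))
    }

  // Right outer join by delegation: swap the inputs, run the left outer join,
  // then swap each output tuple back into (left, right) order.
  def rightOuter[T, U](left: Seq[(Region, T)],
                       right: Seq[(Region, U)]): Seq[(Option[T], U)] =
    leftOuter(right, left).map(_.swap)

  // Inner join: the left outer join with unmatched rows dropped.
  def inner[T, U](left: Seq[(Region, T)],
                  right: Seq[(Region, U)]): Seq[(T, U)] =
    leftOuter(left, right).collect { case (l, Some(r)) => (l, r) }
}

object Demo {
  import RegionJoins._

  // Made-up data: three single-base sites, one of which is filtered out.
  val snps: Seq[(Region, String)] = Seq(
    Region("1", 10L, 11L) -> "a",
    Region("1", 20L, 21L) -> "b",
    Region("2", 5L, 6L) -> "c")
  val filteredSnps: Seq[(Region, String)] = snps.filter(_._2 != "b")

  // Mirrors the three kinds of count in the REPL session above.
  val viaLeft: Int = leftOuter(snps, filteredSnps).count(_._2.nonEmpty)
  val viaRight: Int = rightOuter(filteredSnps, snps).count(_._1.nonEmpty)
  val viaInner: Int = inner(filteredSnps, snps).size
}
```

With the delegating definition, `Demo.viaLeft`, `Demo.viaRight`, and `Demo.viaInner` are equal by construction, which is the invariant that res8 and res9 above violate.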

@fnothaft fnothaft added the bug label Dec 1, 2017
@fnothaft fnothaft added this to the 0.23.0 milestone Dec 1, 2017
@fnothaft fnothaft self-assigned this Dec 1, 2017
fnothaft added a commit to fnothaft/adam that referenced this issue Dec 1, 2017
…lementation.

Left and right outer joins are symmetric: that is to say, a right outer join
can be rewritten as a left outer join by swapping the two input tables, and by
modifying the layout of the output. To resolve the mismatch between the left and
right outer joins, this PR deletes the right outer join implementation and
delegates back to the left outer join + tuple order swap. Resolves bigdatagenomics#1813.
heuermh pushed a commit that referenced this issue Dec 1, 2017
…lementation.

Left and right outer joins are symmetric: that is to say, a right outer join
can be rewritten as a left outer join by swapping the two input tables, and by
modifying the layout of the output. To resolve the mismatch between the left and
right outer joins, this PR deletes the right outer join implementation and
delegates back to the left outer join + tuple order swap. Resolves #1813.