-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory Limited Joins (Externalized / Spill) #1599
Comments
I would love to implement this algorithm in DataFusion: https://arxiv.org/abs/2010.00152 |
Maybe you can share it at the next meeting 😄 |
+1 well look forward 😊 |
Haha, that's great! I have talked with @houqp about this paper before. |
Related #141 |
Moved to issue descriptions above. 👆 |
Looks like now that we are able to fail query in case of breaching memory limit, it's the right time to start working on spills. Taking into account what has been written above, I guess, next step could be to implement spilling for MergeJoin -- if our final intention to have runtime HJ -> MJ conversion it would be nice to have some guarantees that MJ won't fail for the same reason. I believe MJ spilling logic could be pretty straightforward without any pitfalls -- the naive approach would be to spill buffered-side data in .ipc batch by batch, more complex, and, probably, more effective way to think about would be spilling concatenation of all batches that fit in memory. After that we could follow-up with what is mentioned in issue description -- HJ -> MJ conversion (I believe #2628 worth to be mentioned here, to unlock ability for more hash joins to be converted), and spilling mechanisms for other join implementations. If this plan is fine, I'd like to take a stab at MJ spilling. |
I agree
Unless there is a very compelling reason to have a separate implementation, I think we should leverage (reuse) the existing
|
After some reading, it looks like that for And, further, |
Thank you -- reusing the ExternalSorter will allow whatever spilling logic we develop to benefit from additional improvements in sort. For example, @jaylmiller has #5292 which makes substantial changes to ExternalSorter and hopefully it will be faster, but has some performance regressions that we haven't worked out yet, It will be great if that work can directly benefit the spilling operators as well
|
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Support Joining "arbitrarily" large inputs (e.g. when one or both of the inputs don't fit in the available RAM)
This ticket concerns the memory used the
JoinExec
operator -- it doesn't cover other potential targets (e.g. externalized sort or grouping). That will be covered by other tasks tracked by #587Describe the solution you'd like
There are many potential ways to Limit the memory used while joining. The classic way is "sort-merge-join" where the input data on both sides is sorted according to the equality predicates (using externalized sort, such as described in #1568 ) and then the two join inputs are streamed through and the output computed, depending on the type of Join required (INNER, LEFT, RIGHT, SEMI, etc)
I personally think the following would be the ideal behavior for DataFusion Joins:
The rationale for a runtime switch is that then the optimizer (which always has limited information) can't make the "wrong" choice related to join order
In case anyone wants some "light reading" this stuff is nicely described by Goetz Graffe in "Query evaluation techniques for large databases": https://scholar.google.com/citations?view_op=view_citation&hl=en&user=pdDeRScAAAAJ&citation_for_view=pdDeRScAAAAJ:u5HHmVD_uO8C
Online link: http://infolab.stanford.edu/~hyunjung/cs346/graefe.pdf
Describe alternatives you've considered
Have the Optimizer (aka the
HashBuildProbeOrder
) pick both the order and the algorithm to use based on statistics or heuristicsContext
This is follow on work from the great PR from @yjshen in #1526 and part of the story of limiting memory used by DataFusion #587
Task Tracking
The text was updated successfully, but these errors were encountered: