[SPARK-11293] Fix shuffle memory leaks in Spillable collections and UnsafeShuffleWriter (branch-1.5) #9427
- Fix leak in ExternalSorter.stop().
- Use CompletionIterator in BlockStoreShuffleReader.
- Fix leak in UnsafeShuffleWriter (this one wouldn't affect users, since the leak could only occur precisely before the task finished, but it broke the new tests).
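The second item relies on a completion-callback iterator: a wrapper that runs a cleanup function once the wrapped iterator is exhausted. The following is a minimal Python sketch of that idea, not Spark's Scala implementation. `ToySorter`, `read_sorted`, and the 8-bytes-per-record accounting are hypothetical names and numbers invented for illustration.

```python
class CompletionIterator:
    """Wraps an iterator and runs a callback once, when the wrapped iterator is exhausted."""

    def __init__(self, inner, on_complete):
        self._inner = iter(inner)
        self._on_complete = on_complete
        self._completed = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._inner)
        except StopIteration:
            # Run the cleanup exactly once, then let the exhaustion signal propagate.
            if not self._completed:
                self._completed = True
                self._on_complete()
            raise


class ToySorter:
    """Hypothetical stand-in for a sorter that holds memory until stop() is called."""

    def __init__(self, records):
        self.records = sorted(records)
        self.held_bytes = 8 * len(self.records)  # assumption: 8 bytes per record

    def iterator(self):
        return iter(self.records)

    def stop(self):
        # Release everything the sorter was holding.
        self.held_bytes = 0


def read_sorted(records):
    """Return the sorter and an iterator that stops the sorter when fully consumed."""
    sorter = ToySorter(records)
    return sorter, CompletionIterator(sorter.iterator(), sorter.stop)
```

In this sketch the callback only fires when the consumer drains the iterator. An iterator that is abandoned early never triggers it, so a wrapper like this does not by itself cover that case.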
|
Test build #44858 timed out for PR 9427 at commit |
|
Test build #44859 timed out for PR 9427 at commit |
|
Hmm, looks like legitimate test failures. I'll investigate tomorrow. |
In the case of a failed task, it's fine to have some memory leak (it will be freed eventually). Will this exception override the real one?
Ping
|
LGTM, except one minor comment. |
|
@davies, I've updated this to address your comment; PTAL. |
|
Test build #46559 timed out for PR 9427 at commit |
|
Jenkins, retest this please. |
|
Test build #46648 timed out for PR 9427 at commit |
|
Test build #2115 timed out for PR 9427 at commit |
|
Test build #2150 timed out for PR 9427 at commit |
|
@JoshRosen Is there something wrong with this PR? |
|
Looks like it timed out while compiling? Let me try again. Jenkins, retest this please. |
|
test this please |
|
test this please |
|
Test build #47344 timed out for PR 9427 at commit |
|
retest this please |
|
Test build #47686 timed out for PR 9427 at commit |
|
@JoshRosen There seems to be something wrong with this PR. If we can't fix it easily, would you mind closing it? |
|
Yeah, I'm going to close this for now; I don't think that this is a high-priority issue to fix for 1.5.x since it's been around forever and nobody reported problems due to it. |
This patch fixes multiple memory leaks in Spillable collections, as well as a leak in UnsafeShuffleWriter. There were a small handful of places where tasks would acquire memory from the ShuffleMemoryManager but would not release it by the time the task had ended. The UnsafeShuffleWriter case was harmless, since the leak could only occur at the very end of a task, but the other two cases are somewhat serious:
- ExternalSorter.stop() did not release the sorter's memory. In addition, BlockStoreShuffleReader never called stop() once the sorter's iterator was fully consumed. Put together, these bugs meant that a shuffle which performed a reduce-side sort could starve downstream pipelined transformations of shuffle memory.
- ExternalAppendOnlyMap exposes no equivalent of stop() and its iterators do not automatically free its in-memory data upon completion. This could cause aggregation operations to starve other operations of shuffle memory.
This patch adds a regression test and fixes all three leaks.
This patch was originally opened as #9260; this version is the 1.5.x backport.
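To make the starvation described above concrete, here is a toy Python model of a fixed-size memory pool shared by several consumers. It is a sketch under stated assumptions, not Spark's ShuffleMemoryManager: the class `ToyMemoryPool`, its method names, and the consumer labels are all hypothetical, and the real manager's fairness policy is not modeled.

```python
class ToyMemoryPool:
    """Hypothetical fixed-size pool that tracks bytes held per consumer."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.held = {}  # consumer name -> bytes currently held

    def try_to_acquire(self, consumer, num_bytes):
        """Grant as many of the requested bytes as are free; return the number granted."""
        free = self.max_bytes - sum(self.held.values())
        granted = max(0, min(num_bytes, free))
        self.held[consumer] = self.held.get(consumer, 0) + granted
        return granted

    def release(self, consumer, num_bytes):
        """Return bytes to the pool; refuse to release more than the consumer holds."""
        current = self.held.get(consumer, 0)
        if num_bytes > current:
            raise ValueError("releasing more than held")
        self.held[consumer] = current - num_bytes

    def release_all(self, consumer):
        """Free whatever the consumer still holds and return that byte count.

        A non-zero result at the end of a task indicates a leak.
        """
        return self.held.pop(consumer, 0)
```

With a 100-byte pool, a sorter that acquires 80 bytes and never releases them leaves a downstream consumer with at most 20 bytes, however much it asks for. Once the sorter releases its 80 bytes, the downstream request can be granted in full. A regression test in this style would assert that `release_all` returns 0 for every consumer when the task ends.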