[SPARK-11293] Fix shuffle memory leaks in Spillable collections and UnsafeShuffleWriter #9260
- Fix leak in ExternalSorter.stop().
- Use CompletionIterator in BlockStoreShuffleReader.
- Fix leak in UnsafeShuffleWriter (this one wouldn't affect users, since the leak could only occur precisely before the task finished, but it broke the new tests).
I debated whether to push this burden to callers rather than using a CompletionIterator here until I noticed that ExternalIterator() ends up calling currentMap.destructiveSortedIterator(), so (implicitly) this method already had the limitation that it could only be called once.
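To illustrate the idea behind wrapping the result in a completion-style iterator, here is a minimal sketch. `CleanupIterator` and `CleanupIteratorDemo` are hypothetical names made up for this example; this is not Spark's `CompletionIterator` class, only an approximation of the pattern under the assumption that cleanup should run exactly once when the underlying iterator is exhausted.

```scala
// Hypothetical stand-in for a completion-style iterator; not Spark's actual class.
final class CleanupIterator[A](underlying: Iterator[A], onComplete: () => Unit)
    extends Iterator[A] {
  // Guards against running the cleanup callback more than once.
  private var completed = false

  override def hasNext: Boolean = {
    val more = underlying.hasNext
    if (!more && !completed) {
      completed = true
      onComplete() // e.g. release memory held by a sorter
    }
    more
  }

  override def next(): A = underlying.next()
}

object CleanupIteratorDemo {
  // Returns the consumed elements and how many times cleanup ran.
  def run(): (List[Int], Int) = {
    var releases = 0
    val it = new CleanupIterator[Int](Iterator(1, 2, 3), () => releases += 1)
    val seen = it.toList
    it.hasNext // a further call must not re-run the cleanup
    (seen, releases)
  }
}
```

Note that cleanup only fires if the consumer drains the iterator, so a caller that abandons it early would still need another release path such as a task-completion hook.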
Test build #44281 has finished for PR 9260 at commit
Test build #44288 has finished for PR 9260 at commit
Jenkins, retest this please.
Test build #44291 has finished for PR 9260 at commit
Without the fix, we will see this exception during tests, right?
Yes.
Looks good to me. It would probably be best to ask someone who is more familiar with this part of the code for the final sign-off.
I've gone ahead and merged #9127, which contains something similar to these changes, but updated to reflect the memory manager unification. I still think that we should consider merging this patch as it stands now into 1.5.x and 1.4.x.
Ping @yhuai, any objection to merging this to 1.5? A similar fix has already been incorporated into master as part of my memory manager consolidation patch. We could also choose not to merge / backport this patch, since AFAIK nobody has complained about this issue.
Is it risky? It doesn't look like it. If it is a safe patch, how about we add the check in
This patch fixes multiple memory leaks in Spillable collections, as well as a leak in UnsafeShuffleWriter. There were a small handful of places where tasks would acquire memory from the ShuffleMemoryManager but would not release it by the time the task had ended. The UnsafeShuffleWriter case was harmless, since the leak could only occur at the very end of a task, but the other two cases are somewhat serious:
- ExternalSorter.stop() did not release the sorter's memory. In addition, BlockStoreShuffleReader never called stop() once the sorter's iterator was fully consumed. Put together, these bugs meant that a shuffle which performed a reduce-side sort could starve downstream pipelined transformations of shuffle memory.
- ExternalAppendOnlyMap exposes no equivalent of stop() and its iterators do not automatically free its in-memory data upon completion. This could cause aggregation operations to starve other operations of shuffle memory.

This patch adds a regression test and fixes all three leaks. I'd like to backport this patch to Spark 1.5.x and possibly to other maintenance releases.
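The starvation effect described above can be sketched with a toy model. Everything here (`ToyMemoryPool`, `ToySorter`, `LeakDemo`, and the byte counts) is hypothetical and chosen only for illustration; it does not mirror the real ShuffleMemoryManager or ExternalSorter APIs. It assumes a fixed per-task pool from which a sorter and a downstream operator both try to acquire memory.

```scala
// Toy model only: these classes are hypothetical and do not mirror Spark's APIs.
final class ToyMemoryPool(val capacity: Long) {
  private var used = 0L

  // Grants the request only if it fits in the remaining capacity.
  def tryAcquire(bytes: Long): Boolean =
    if (used + bytes <= capacity) { used += bytes; true } else false

  def release(bytes: Long): Unit = { used = math.max(0L, used - bytes) }

  def inUse: Long = used
}

final class ToySorter(pool: ToyMemoryPool) {
  // Tracks how much of the pool this sorter currently holds.
  private var held = 0L

  def insert(bytes: Long): Boolean = {
    val ok = pool.tryAcquire(bytes)
    if (ok) held += bytes
    ok
  }

  // In this model, stopping the sorter returns everything it holds to the pool.
  def stop(): Unit = { pool.release(held); held = 0L }
}

object LeakDemo {
  // Returns whether a downstream operator can get memory after the sorter is done.
  def run(callStop: Boolean): Boolean = {
    val pool = new ToyMemoryPool(100L)
    val sorter = new ToySorter(pool)
    sorter.insert(80L)
    if (callStop) sorter.stop() // skipping this models the leak
    pool.tryAcquire(50L)        // downstream operator in the same task
  }
}
```

In this model, `LeakDemo.run(callStop = false)` leaves the downstream request unsatisfied, while `LeakDemo.run(callStop = true)` lets it succeed.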