[SPARK-26713][CORE] Interrupt pipe IO threads in PipedRDD when task is finished #23638
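The PR title above names the core idea: when a task finishes, the pipe IO threads spawned by PipedRDD (for example, the stdin writer feeding the subprocess) should be interrupted so they cannot keep blocking on an upstream iterator. Below is a minimal, hypothetical sketch of that idea; it is not the actual patch, the helper name startStdinWriter is illustrative, and it assumes Spark's TaskContext.addTaskCompletionListener API.

// Hypothetical sketch, not the actual PR diff: interrupt the stdin writer thread of a
// piped subprocess when the task completes, so it cannot hang on a blocked next() call.
import java.io.PrintWriter
import org.apache.spark.TaskContext

object PipeIoInterruptSketch {
  def startStdinWriter(context: TaskContext, proc: Process, input: Iterator[String]): Thread = {
    val stdinWriter = new Thread("stdin writer for piped subprocess") {
      override def run(): Unit = {
        val out = new PrintWriter(proc.getOutputStream)
        try {
          // input.next() may block (e.g. on a shuffle fetch); an interrupt lets us bail out.
          while (!Thread.currentThread().isInterrupted && input.hasNext) {
            out.println(input.next())
          }
        } catch {
          case _: InterruptedException => // task finished; stop feeding the subprocess
        } finally {
          out.close()
        }
      }
    }
    stdinWriter.setDaemon(true)
    stdinWriter.start()

    // When the task completes (success or failure), interrupt the writer thread so it
    // does not outlive the task while blocked on the upstream iterator.
    context.addTaskCompletionListener[Unit] { _ => stdinWriter.interrupt() }
    stdinWriter
  }
}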
Changes from all commits
@@ -141,7 +141,14 @@ final class ShuffleBlockFetcherIterator(
   /**
    * Whether the iterator is still active. If isZombie is true, the callback interface will no
-   * longer place fetched blocks into [[results]].
+   * longer place fetched blocks into [[results]] and the iterator is marked as fully consumed.
+   *
+   * When the iterator is inactive, [[hasNext]] and [[next]] calls will honor that as there are
+   * cases the iterator is still being consumed. For example, ShuffledRDD + PipedRDD if the
+   * subprocess command is failed. The task will be marked as failed, then the iterator will be
+   * cleaned up at task completion, the [[next]] call (called in the stdin writer thread of
+   * PipedRDD if not exited yet) may hang at [[results.take]]. The defensive check in [[hasNext]]
+   * and [[next]] reduces the possibility of such race conditions.
    */
   @GuardedBy("this")
   private[this] var isZombie = false

@@ -372,7 +379,7 @@ final class ShuffleBlockFetcherIterator(
     logDebug("Got local blocks in " + Utils.getUsedTimeMs(startTime))
   }

-  override def hasNext: Boolean = numBlocksProcessed < numBlocksToFetch
+  override def hasNext: Boolean = !isZombie && (numBlocksProcessed < numBlocksToFetch)

   /**
    * Fetches the next (BlockId, InputStream). If a task fails, the ManagedBuffers

@@ -384,7 +391,7 @@ final class ShuffleBlockFetcherIterator(
    */
   override def next(): (BlockId, InputStream) = {
     if (!hasNext) {
-      throw new NoSuchElementException
+      throw new NoSuchElementException()
     }

     numBlocksProcessed += 1

@@ -395,7 +402,7 @@ final class ShuffleBlockFetcherIterator(
     // then fetch it one more time if it's corrupt, throw FailureFetchResult if the second fetch
     // is also corrupt, so the previous stage could be retried.
     // For local shuffle block, throw FailureFetchResult for the first IOException.
-    while (result == null) {
+    while (!isZombie && result == null) {
Contributor
Is it possible that […]

Member
Yeah, that can happen. Right now I think it's "worse" in that the iterator might be cleaned up and yet next() will keep querying the iterator that's being drained by cleanup(). To really tighten it up, I think more or all of […]. We could follow this up with small things like making […]. @advancedxy, what do you think? I think the argument is merely that this fixes the potential issue in 99% of cases, not 100%.

Contributor (Author)
@cloud-fan Yeah, it can happen. But I agree with @srowen. The […]
Maybe. But I would leave it as it is, if it were up to me. Like you said, this doesn't fully prevent the semantics from changing, but it is a little tighter.

       val startFetchWait = System.currentTimeMillis()
       result = results.take()
       val stopFetchWait = System.currentTimeMillis()

@@ -489,6 +496,9 @@ final class ShuffleBlockFetcherIterator(
       fetchUpToMaxBytes()
     }

+    if (result == null) { // the iterator is already closed/cleaned up.
+      throw new NoSuchElementException()
+    }
     currentResult = result.asInstanceOf[SuccessFetchResult]
     (currentResult.blockId, new BufferReleasingInputStream(input, this))
   }
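To make the race described in the new doc comment concrete, here is a small, self-contained sketch (illustrative only, not Spark code) of the failure mode: a thread standing in for the PipedRDD stdin writer blocks on an empty results queue and, unless it is interrupted or the take is guarded, hangs even though nothing will ever be enqueued again.

import java.util.concurrent.LinkedBlockingQueue

object BlockedTakeSketch {
  def main(args: Array[String]): Unit = {
    val results = new LinkedBlockingQueue[String]()

    // Stand-in for the stdin writer thread of PipedRDD: it blocks on take() even though
    // the producer side has already failed and will never offer another element.
    val consumer = new Thread("stdin-writer-stand-in") {
      override def run(): Unit = {
        try {
          val r = results.take() // blocks indefinitely on an empty queue
          println(s"got $r")
        } catch {
          case _: InterruptedException => println("interrupted; exiting cleanly")
        }
      }
    }
    consumer.start()

    Thread.sleep(500)    // let the consumer block on take()
    consumer.interrupt() // what task-completion cleanup needs to do to unblock it
    consumer.join()
  }
}

The isZombie checks added above reduce the window in which a thread can enter that blocking take() after cleanup has started.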
When a task finishes, do we really need to guarantee that all the iterators stop producing data? I agree it's better, but I'm afraid it's too much effort to guarantee. It's not only the shuffle reader; we would also need to fix the sort iterator, the aggregate iterator and so on.
And why can't PipedRDD stop consuming input? I think it's better to fix the solo consumer side, instead of fixing different kinds of producer sides.

The PipedRDD stops consuming input in this PR. For the ShuffledRDD + PipedRDD case alone, the fix in PipedRDD is sufficient. But I noticed that the iterator still producing data is also a cause, so I made the corresponding changes.
I think we can try our best to guarantee that. If it's too much effort, we could stop trying or try different approaches.

"try best" is not a "guarantee".
If we don't need to do this, I suggest we should not do it at all. The new changes in
ShuffleBlockFetcherIteratormake it harder for people to understand the code(at least to me), and also breaks the semantic of Iterator. And I don't see much benefit of doing it, as thePipedRDDhas been fixed. Can we revert the changes inShuffleBlockFetcherIterator?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you insist, I can revert those changes. But let's wait and see whether others have other opinions.
cc @srowen, @HyukjinKwon and @viirya.
Rarely, and it shouldn't matter that much if the task is already finished.

I agree that this is not the expected behavior of Iterator. If there are elements in the Iterator, it should return true when hasNext is called. It sounds more reasonable to stop consuming the Iterator on the consumer side.

Sure, how about a follow-up that tries a different approach? The current change isn't harmful per se, and it is a small improvement.
You're suggesting reading and storing the next available element, if not already read, in hasNext? And then next must call hasNext to ensure this is filled if not already, which it does already? Yeah, that seems reasonable. That pattern is used in other iterators sometimes.

Sure, if we can do it soon. Hopefully we don't leave this partial fix in the code base for years...
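For reference, a minimal sketch of the lookahead pattern suggested above (illustrative only; the class name is hypothetical and this is not the ShuffleBlockFetcherIterator code): hasNext reads ahead and caches the next element, and next delegates to hasNext before handing it out.

// Illustrative sketch of the hasNext-buffers-the-next-element pattern.
class LookaheadIterator[T](underlying: Iterator[T]) extends Iterator[T] {
  // Holds an element read ahead by hasNext until next() returns it.
  private var buffered: Option[T] = None

  override def hasNext: Boolean = {
    if (buffered.isEmpty && underlying.hasNext) {
      // Read ahead here, so next() itself never has to block on the underlying source.
      buffered = Some(underlying.next())
    }
    buffered.isDefined
  }

  override def next(): T = {
    // next() relies on hasNext to fill the buffer, as described in the comment above.
    if (!hasNext) throw new NoSuchElementException()
    val elem = buffered.get
    buffered = None
    elem
  }
}
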
Sorry for the late reply. We are on the same page now, and I think @cloud-fan's proposal seems reasonable. I may create a new JIRA and come up with a different fix. However, I am leaving for the holidays (lunar new year) soon. I cannot guarantee it will be finished in a couple of days, but I will try my best to resolve it before the end of February. Others are welcome to take it over if there is too much delay.
P.S. I just looked through potential similar issues to PipedRDD. I believe RRunner may have the same issue, as it doesn't clean up threads. On the other hand, PythonRunner gracefully stops its threads.

@advancedxy thanks for working on it! I'm leaving for lunar new year soon too, end of Feb sounds good.

@advancedxy I am facing the same issue discussed by @cloud-fan, where two threads consume the results queue at the same time and cause the Spark application to hang. Is the fix for this issue being worked on right now? Is there a JIRA to track the fix?