[ADAM-1308] Fix stack overflow in join with custom iterator impl. #1315
Test PASSed.
@@ -263,6 +263,30 @@ private[rdd] case class ManualRegionPartitioner(partitions: Int) extends Partiti
  }
}

private class AppendableIterator[T] extends Iterator[T] {
  var iterators: ListBuffer[Iterator[T]] = ListBuffer.empty
Do you need to worry about thread safety? Would immutable copy-on-write help here?
Nah, this is only accessed inside of a single thread.
  def append(iter: Iterator[T]) {
    if (iter.hasNext) {
      iterators += iter
Safe copy? `iter` could be modified by the caller.
In general, Scala has a pretty strict "advisory" (not quite a contract?) on using Iterators:
It is of particular importance to note that, unless stated otherwise, one should never use an iterator after calling a method on it.
This is a private class used only in this file, so I think it is OK to assume that `iter` is not modified after being passed.
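To make the concern concrete, here is a minimal sketch of what goes wrong when a caller keeps using an iterator after handing it off. The object name `IteratorAliasingDemo` and its `run` method are hypothetical and are not part of the PR; the sketch only relies on standard `scala.collection.Iterator` behavior.

```scala
// Hypothetical demo, not code from the PR.
object IteratorAliasingDemo {
  // Returns the element the caller took and what the consumer still sees.
  def run(): (Int, List[Int]) = {
    val shared = Iterator(1, 2, 3)
    // Handing off an iterator passes a reference; no copy is made.
    val handedOff: Iterator[Int] = shared
    // The caller ignores the advisory and advances the iterator again.
    val takenByCaller = shared.next()
    // The consumer never sees the element the caller took.
    (takenByCaller, handedOff.toList)
  }
}
```

For `AppendableIterator` this matters because `append` checks `iter.hasNext` only once, at insertion time. If a caller drained `iter` afterwards, the stored iterator would be empty even though the list treats it as non-empty.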
Thanks for the clarification!
Np! I should make a pass and document this assumption better.
    }

  def hasNext: Boolean = {
    iterators.nonEmpty
Shouldn't this query all of the iterators in `iterators`?
To avoid having to loop over all the iterators, one of the optimizations I made is that we remove empty iterators from the list, and we don't add empty iterators to the list.
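A minimal sketch of that invariant, for readers following along. `append` and `hasNext` mirror the diff in this review; the `next` method is an assumption about how exhausted iterators get dropped, since that part of the diff is not shown here. The class name `AppendableIteratorSketch` is hypothetical.

```scala
import scala.collection.mutable.ListBuffer

// Sketch only: `next` is an assumed implementation, not the PR's code.
class AppendableIteratorSketch[T] extends Iterator[T] {
  private val iterators: ListBuffer[Iterator[T]] = ListBuffer.empty

  // Invariant: every iterator in the buffer has at least one element left.
  def append(iter: Iterator[T]): Unit = {
    // Empty iterators are never stored.
    if (iter.hasNext) {
      iterators += iter
    }
  }

  // Given the invariant, a non-empty buffer means a next element exists,
  // so there is no need to loop over the stored iterators.
  def hasNext: Boolean = iterators.nonEmpty

  def next(): T = {
    if (iterators.isEmpty) {
      throw new NoSuchElementException("next on empty iterator")
    }
    val head = iterators.head
    val value = head.next()
    // Drop the head as soon as it is exhausted, restoring the invariant.
    if (!head.hasNext) {
      iterators.remove(0)
    }
    value
  }
}
```

Because the iterators sit in a flat buffer, advancing never recurses through a chain of wrapped iterators, and `hasNext` stays a constant-time check.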
Looks good to me
I was going to try running this on the problem before it's merged unless anyone deems that unnecessary.
I would really like that so we can confirm that this is a fix before we merge.
I computed on the same code as before, and did not run into stack overflow errors. I think we can go forward with the merge.
I reran my pipeline and it looks good. Thanks @fnothaft!
Thank you for confirming, @devin-petersohn @akmorrow13
Thank you, @fnothaft
Resolves #1308. CC @akmorrow13 @devin-petersohn