-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-4423] Improve foreach() documentation to avoid confusion between local- and cluster-mode behavior #4696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…tand confusing behavior of foreach and map functions when attempting to modify variables outside of the scope of an RDD action or transformation
|
Test build #27728 has started for PR 4696 at commit
|
|
Test build #27728 has finished for PR 4696 at commit
|
|
Test PASSed. |
docs/programming-guide.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably better to word it as "It is important to understand the scope and life cycle of variables and methods ...", instead of saying it is unintuitive.
Also overall I think you'd want to point out closures are always executed on executors and should not be used to mutate state, and state that the only exception is when running in local testing mode. If some global aggregation is needed, use an aggregator.
|
Hi @rxin, thanks for the feedback. I've updated the doc, please let me know what you think. |
|
Test build #27779 has started for PR 4696 at commit
|
|
Test build #27779 has finished for PR 4696 at commit
|
|
Test PASSed. |
docs/programming-guide.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: naiive (should only have one i)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comma here should come after "below"
|
Hi Josh - I pulled in your suggestions. Please let me know if anything else needs work. Thanks! |
|
Test build #27794 has started for PR 4696 at commit
|
|
Test build #27794 has finished for PR 4696 at commit
|
|
Test PASSed. |
|
Hi all - does this need anything else prior to being merged? |
docs/programming-guide.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth sharpening what "local" means -- master = local[n] right? It's not specific to the shell; you can run the shell against YARN.
Really the difference is between happening to execute entirely within one JVM, and not. It is possible to run a standalone cluster "locally" that would not exhibit this behavior.
docs/programming-guide.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, I'd describe this as undefined instead. It will not necessarily update local variables in the driver and should be avoided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should still be updated, and should probably make a few edits like spark -> Spark, or accumulator -> Accumulator
|
Test build #28400 has started for PR 4696 at commit
|
|
Test build #28400 has finished for PR 4696 at commit
|
|
Test PASSed. |
docs/programming-guide.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one won't compile and I think the Python equivalent won't either, but don't fix it up yet; if that's all there is I can easily do it on merge.
|
Yep, I think it does need one more pass of edits but is 99% there. |
… semicolons for java
|
Test build #28437 has started for PR 4696 at commit
|
|
Test build #28437 has finished for PR 4696 at commit
|
|
Test PASSed. |
|
Looking good but there is still a set of small changes I think need to be made in the last diff in this PR to be consistent with the rest of the text. See my notes there. Otherwise LGTM and will leave it open for a day for final comments. |
|
Hi Sean - I've added those fixes. I missed that last set of comments you mentioned. Thanks. |
|
Test build #28443 has started for PR 4696 at commit
|
|
Test FAILed. |
|
Test build #28443 has finished for PR 4696 at commit
|
|
Test PASSed. |
Hi all - I've added a writeup on how closures work within Spark to help clarify the general case for this problem and similar problems. I hope this addresses the issue and would love any feedback.