@ilganeli

Hi all - I've added a writeup on how closures work within Spark to help clarify the general case for this problem and similar problems. I hope this addresses the issue and would love any feedback.

Ilya Ganelin added 3 commits February 12, 2015 08:20
…tand confusing behavior of foreach and map functions when attempting to modify variables outside of the scope of an RDD action or transformation
@SparkQA

SparkQA commented Feb 19, 2015

Test build #27728 has started for PR 4696 at commit d374d3a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 19, 2015

Test build #27728 has finished for PR 4696 at commit d374d3a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27728/
Test PASSed.

Contributor

probably better to word it as "It is important to understand the scope and life cycle of variables and methods ...", instead of saying it is unintuitive.

Also overall I think you'd want to point out closures are always executed on executors and should not be used to mutate state, and state that the only exception is when running in local testing mode. If some global aggregation is needed, use an aggregator.
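
A minimal sketch of the point above, in plain Python with no Spark dependency. It only simulates the behavior: `Counter` and `run_task` are hypothetical names, not Spark APIs, and `copy.deepcopy` stands in for the serialization that ships a closure's captured variables to an executor. Each task mutates its own copy, so the driver's object never changes; the driver only sees a global result by merging the values the tasks return, which is roughly the role an accumulator plays.

```python
import copy


class Counter:
    """Mutable state captured by a task; stands in for a driver-side variable."""

    def __init__(self):
        self.value = 0


def run_task(captured, partition):
    # Assumption: deepcopy models shipping the captured state to an executor.
    # The task only ever sees (and mutates) its own copy.
    task_copy = copy.deepcopy(captured)
    for x in partition:
        task_copy.value += x
    # The partial result is returned explicitly instead of relying on mutation.
    return task_copy.value


partitions = [[1, 2], [3, 4], [5]]
driver_counter = Counter()

partial_sums = [run_task(driver_counter, p) for p in partitions]
total = sum(partial_sums)  # merge step performed on the driver

print(driver_counter.value)  # 0: the driver's object was never mutated
print(total)                 # 15: the merged partial results
```

The sketch deliberately returns partial results rather than mutating shared state, since that is the pattern the comment recommends for global aggregation.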

@ilganeli
Author

Hi @rxin, thanks for the feedback. I've updated the doc, please let me know what you think.

@SparkQA

SparkQA commented Feb 20, 2015

Test build #27779 has started for PR 4696 at commit 5dbbda5.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 20, 2015

Test build #27779 has finished for PR 4696 at commit 5dbbda5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27779/
Test PASSed.

Contributor

Typo: naiive (should only have one i)

Contributor

I think the comma here should come after "below"

@ilganeli
Author

Hi Josh - I pulled in your suggestions. Please let me know if anything else needs work. Thanks!

@SparkQA

SparkQA commented Feb 20, 2015

Test build #27794 has started for PR 4696 at commit 448bd79.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 20, 2015

Test build #27794 has finished for PR 4696 at commit 448bd79.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27794/
Test PASSed.

@ilganeli
Author

Hi all - does this need anything else prior to being merged?

Member

I think it's worth sharpening what "local" means -- master = local[n] right? It's not specific to the shell; you can run the shell against YARN.

Really the difference is between happening to execute entirely within one JVM, and not. It is possible to run a standalone cluster "locally" that would not exhibit this behavior.

Member

Same, I'd describe this as undefined instead. It will not necessarily update local variables in the driver and should be avoided.

Member

I think this should still be updated, and you should probably make a few edits like spark -> Spark, or accumulator -> Accumulator

@SparkQA

SparkQA commented Mar 9, 2015

Test build #28400 has started for PR 4696 at commit 4772f99.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 9, 2015

Test build #28400 has finished for PR 4696 at commit 4772f99.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28400/
Test PASSed.

Member

This one won't compile and I think the Python equivalent won't either, but don't fix it up yet; if that's all there is I can easily do it on merge.

@srowen
Member

srowen commented Mar 9, 2015

Yep, I think it does need one more pass of edits but is 99% there.

@SparkQA

SparkQA commented Mar 10, 2015

Test build #28437 has started for PR 4696 at commit 2fd2a07.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 10, 2015

Test build #28437 has finished for PR 4696 at commit 2fd2a07.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28437/
Test PASSed.

@srowen
Member

srowen commented Mar 10, 2015

Looking good, but there is still a set of small changes that I think need to be made in the last diff in this PR to be consistent with the rest of the text. See my notes there. Otherwise LGTM and will leave it open for a day for final comments.

@ilganeli
Author

Hi Sean - I've added those fixes. I missed that last set of comments you mentioned. Thanks.

@SparkQA

SparkQA commented Mar 10, 2015

Test build #28443 has started for PR 4696 at commit c5dc498.

  • This patch merges cleanly.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28442/
Test FAILed.

@SparkQA

SparkQA commented Mar 10, 2015

Test build #28443 has finished for PR 4696 at commit c5dc498.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28443/
Test PASSed.

@asfgit asfgit closed this in 548643a Mar 11, 2015