Move SparkContext accumulator methods to Accumulators.scala #3071

JoshRosen · 2014-11-03T07:43:25Z

This commit moves the SparkContext accumulator and accumulable methods and the AccumulableParam implicits into mixin traits that are defined inside of the Accumulators.scala file. I think that this improves readability by grouping together all accumulator-related code in the same file; it also helps to reduce the bloat of SparkContext.scala.
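For readers unfamiliar with the pattern, here is a minimal sketch of the kind of mixin-trait split described above. Every name in it (`ToyAccumulator`, `ToyAccumulatorMethods`, `ToyAccumulatorImplicits`, `ToyContext`, `ToyParam`) is a hypothetical stand-in rather than anything from the actual patch, and the real traits in Accumulators.scala may be organized differently.

```scala
// Toy stand-ins only; none of these are Spark's real classes.
trait ToyParam[T] {
  def add(a: T, b: T): T
}

final class ToyAccumulator[T](initial: T, param: ToyParam[T]) {
  private var current: T = initial
  def +=(term: T): Unit = { current = param.add(current, term) }
  def value: T = current
}

// In this sketch the factory methods live in a trait outside the context class.
trait ToyAccumulatorMethods {
  def accumulator[T](initial: T)(implicit param: ToyParam[T]): ToyAccumulator[T] =
    new ToyAccumulator(initial, param)
}

// The implicits are grouped in a second trait so the companion object can mix them in.
trait ToyAccumulatorImplicits {
  implicit object IntParam extends ToyParam[Int] {
    def add(a: Int, b: Int): Int = a + b
  }
}

// The context keeps the same public methods but no longer defines them itself.
class ToyContext extends ToyAccumulatorMethods
object ToyContext extends ToyAccumulatorImplicits

object MixinDemo {
  def main(args: Array[String]): Unit = {
    import ToyContext._ // brings IntParam into implicit scope
    val ctx = new ToyContext
    val acc = ctx.accumulator(0)
    acc += 5
    acc += 7
    println(acc.value) // prints 12
  }
}
```

Callers still write `ctx.accumulator(0)` exactly as before the split, which is the sense in which nothing changes from the user's side in this toy version.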

From a user-facing perspective, nothing has changed.

SparkQA · 2014-11-03T07:49:53Z

Test build #22808 has started for PR 3071 at commit e08a588.

This patch merges cleanly.

SparkQA · 2014-11-03T08:54:10Z

Test build #22808 has finished for PR 3071 at commit e08a588.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class SparkContext(config: SparkConf) extends Logging

AmplabJenkins · 2014-11-03T08:54:14Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22808/

This makes binary compatibility easier to reason about and might avoid some pitfalls that I’ve run into while attempting to refactor other parts of SparkContext to use mixin traits (see apache#3071, for example). Requiring users to access status API methods through `sc.statusAPI.*` also avoids SparkContext bloat and buys us extra freedom for adding parallel higher / lower-level APIs.

JoshRosen · 2014-11-11T21:52:06Z

I'm going to close this for now; it looks like this breaks closure cleaning due to the addition of another level of $outer references. I'll be able to post a better explanation / picture once I clean up my serialization debugging visualizer.
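As background on what a `$outer` reference is, the sketch below shows how an inner Scala class keeps a hidden pointer to its enclosing instance, and how that pointer can drag a non-serializable object into Java serialization. It is a generic illustration with hypothetical names (`ToyDriverContext`, `Scale`, `OuterDemo`), not a reproduction of the failure seen in this PR, and it does not involve Spark's closure cleaner at all.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a driver-side object that is not Serializable.
class ToyDriverContext {
  val factor: Int = 2

  // An inner class: each instance holds a compiler-generated reference to the
  // enclosing ToyDriverContext (the $outer field), because it reads `factor`.
  class Scale extends Serializable {
    def apply(x: Int): Int = x * factor
  }
}

object OuterDemo {
  // Returns true if Java serialization accepts the object, false if it
  // reaches a reference that is not Serializable.
  def serializes(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val ctx = new ToyDriverContext
    val scale = new ctx.Scale
    println(scale(21))         // 42
    println(serializes(scale)) // expected false: $outer points at ToyDriverContext
  }
}
```

Each extra level of nesting adds another such hidden reference to follow, which is one reason moving code into nested or mixed-in scopes can change what ends up reachable from a closure.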

This PR refactors / extends the status API introduced in #2696.

- Change StatusAPI from a mixin trait to a class. Before, the new status API methods were directly accessible through SparkContext, whereas now they're accessed through a `sc.statusAPI` field. As long as we were going to add these methods directly to SparkContext, the mixin trait seemed like a good idea, but this might be simpler to reason about and may avoid pitfalls that I've run into while attempting to refactor other parts of SparkContext to use mixins (see #3071, for example).
- Change the name from SparkStatusAPI to SparkStatusTracker.
- Make `getJobIdsForGroup(null)` return ids for jobs that aren't associated with any job group.
- Add `getActiveStageIds()` and `getActiveJobIds()` methods that return the ids of whatever's currently active in this SparkContext. This should simplify davies's progress bar code.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #3197 from JoshRosen/progress-api-improvements and squashes the following commits:

30b0afa [Josh Rosen] Rename SparkStatusAPI to SparkStatusTracker.
d1b08d8 [Josh Rosen] Add missing newlines
2cc7353 [Josh Rosen] Add missing file.
d5eab1f [Josh Rosen] Add getActive[Stage|Job]Ids() methods.
a227984 [Josh Rosen] getJobIdsForGroup(null) should return jobs for default group
c47e294 [Josh Rosen] Remove StatusAPI mixin trait.

(cherry picked from commit 40eb8b6)
Signed-off-by: Reynold Xin <rxin@databricks.com>
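The move from a mixin trait to a class held in a field, as described in the commit message above, can be sketched roughly as follows. All types here (`ToyJob`, `ToyStatusTracker`, `ToyTrackedContext`) are hypothetical toys; the method names echo the ones listed in the commit message, but the bodies are assumptions made for illustration, not Spark's implementation.

```scala
// Hypothetical toy types; not Spark's SparkContext or SparkStatusTracker.
final case class ToyJob(id: Int, group: Option[String], active: Boolean)

// The status methods live in their own class instead of being mixed into the context.
final class ToyStatusTracker(jobs: () => Seq[ToyJob]) {
  // A null group selects jobs that have no group, echoing the
  // getJobIdsForGroup(null) behaviour described above.
  def getJobIdsForGroup(group: String): Array[Int] =
    jobs().filter(_.group == Option(group)).map(_.id).toArray

  def getActiveJobIds(): Array[Int] =
    jobs().filter(_.active).map(_.id).toArray
}

final class ToyTrackedContext {
  private var jobs = Vector.empty[ToyJob]
  def submit(job: ToyJob): Unit = { jobs :+= job }

  // Callers reach the status methods through this field rather than on the context itself.
  val statusTracker = new ToyStatusTracker(() => jobs)
}

object TrackerDemo {
  def main(args: Array[String]): Unit = {
    val ctx = new ToyTrackedContext
    ctx.submit(ToyJob(1, Some("etl"), active = true))
    ctx.submit(ToyJob(2, None, active = true))
    ctx.submit(ToyJob(3, None, active = false))
    println(ctx.statusTracker.getJobIdsForGroup(null).mkString(",")) // 2,3
    println(ctx.statusTracker.getActiveJobIds().mkString(","))       // 1,2
  }
}
```

Because the tracker is an ordinary class, the context's own public surface grows by one field instead of one method per status query.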


Move SparkContext accumulator methods into separate traits.

e08a588

JoshRosen mentioned this pull request Nov 11, 2014

[SPARK-2321] Several progress API improvements / refactorings #3197

Closed

JoshRosen closed this Nov 11, 2014
