Skip to content

Conversation

@JoshRosen
Copy link
Contributor

This PR refactors / extends the status API introduced in #2696.

  • Change StatusAPI from a mixin trait to a class. Before, the new status API methods were directly accessible through SparkContext, whereas now they're accessed through a sc.statusAPI field. As long as we were going to add these methods directly to SparkContext, the mixin trait seemed like a good idea, but this might be simpler to reason about and may avoid pitfalls that I've run into while attempting to refactor other parts of SparkContext to use mixins (see Move SparkContext accumulator methods to Accumulators.scala #3071, for example).
  • Change the name from SparkStatusAPI to SparkStatusTracker.
  • Make getJobIdsForGroup(null) return ids for jobs that aren't associated with any job group.
  • Add getActiveStageIds() and getActiveJobIds() methods that return the ids of whatever's currently active in this SparkContext. This should simplify @davies's progress bar code.

This makes binary compatibility easier to reason about and might avoid
some pitfalls that I’ve run into while attempting to refactor other
parts of SparkContext to use mixin traits (see apache#3071, for example).

Requiring users to access status API methods through `sc.statusAPI.*`
also avoids SparkContext bloat and buys us extra freedom for adding
parallel higher / lower-level APIs.
@SparkQA
Copy link

SparkQA commented Nov 11, 2014

Test build #23179 has started for PR 3197 at commit 2cc7353.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 11, 2014

Test build #23179 has finished for PR 3197 at commit 2cc7353.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SparkContext(config: SparkConf) extends Logging

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23179/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Nov 11, 2014

Test build #23183 has started for PR 3197 at commit d1b08d8.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 11, 2014

Test build #23183 has finished for PR 3197 at commit d1b08d8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23183/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just call it status w/o API?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 just status would be better i think

@davies
Copy link
Contributor

davies commented Nov 12, 2014

@JoshRosen this re-org looks better than before, how about move all of these API into a namespace called sc.internal and mark all of them as DeveloperAPI?

@JoshRosen
Copy link
Contributor Author

@davies This is a low-level API that's not really designed for typical users to use, but it's also an API that we want to stabilize. @pwendell do you think this should be @developerAPI even if we plan to maintain fairly strong compatibility guarantees for it?

@pwendell
Copy link
Contributor

I think we want to keep this as a public API - that was the whole reason for adding this in the first place, since it is technically redundant with things users could already do with the SparkListener interface and projects like Hive want us to support a stable API.

In terms of the naming, it did seem slightly weird to me to call this statusAPI when for other things we don't use the term "API" anywhere (e.g. SparkListener). I'll defer to @rxin @kayousterhout or anyone else who has feelings about this for more feedback.

@pwendell
Copy link
Contributor

Okay I talked offline with @kayousterhout and the best name we could come up with was the following:

class SparkStatusTracker
...
val statusTracker = new SparkStatusTracker(this)

IMO this is nicer than the current name since API is sort of implicit in the fact that this is an exposed class (i.e. in some sense everything is an API). The name "Tracker" implies that this is an object that actively is tracking changes. So this is my favorite option. I also thing SparkStatus and val status is alright. Both of these I prefer to the current naming.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why bother having this? we can just do new ....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal here was to hide this class's constructor from users so that we're free to change it later. I think that making constructors part of public APIs is a bad idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just make the constructor package private? It is really awkward to me that you have to create a factory for this because users are not supposed to create the status listener by themselves.

If you really want to be air tight, a more common way is to expose the interface, and then you have a concrete implementation of the interface. Then you don't have the constructor problem. But that seems overkill to me too for this.

If you really want a factory, I'd use something other than apply.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to make a Scala constructor Java-package-private as opposed to Scala-package-private, since that can become public from Java's point of view? I think I originally used this factory pattern for JavaSparkStatusAPI and just kept the same approach here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think that CompanionObject.apply() might be a fairly common idiom; I think it's used in several of the Scala standard libraries. I don't really care what we call it, one way or the other, so I can change it if you think that apply is confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry the main problem I have is that I don't get why we need to protect the constructor at all. It is not something we expect the users to call. Why don't you just remove all of these stuff, and add a line in javadoc for the constructor saying we don't expect users to call this constructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's fine; I'll just make both constructors private[spark] and add a note; as long as we've warned users not to call the constructor and hidden it from the Scaladoc, then I don't think anyone should complain if we need to change it later.

@rxin
Copy link
Contributor

rxin commented Nov 14, 2014

+1 on statusTracker

@JoshRosen JoshRosen changed the title Several progress API improvements / refactorings [SPARK-2321] Several progress API improvements / refactorings Nov 14, 2014
Remove factory methods and replace with private[spark] constructors.
@SparkQA
Copy link

SparkQA commented Nov 15, 2014

Test build #23413 has started for PR 3197 at commit 30b0afa.

  • This patch merges cleanly.

@rxin
Copy link
Contributor

rxin commented Nov 15, 2014

Hmmm in case I haven't expressed this earlier, I really like this new API.

@rxin
Copy link
Contributor

rxin commented Nov 15, 2014

LGTM.

@SparkQA
Copy link

SparkQA commented Nov 15, 2014

Test build #23413 has finished for PR 3197 at commit 30b0afa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23413/
Test PASSed.

@rxin
Copy link
Contributor

rxin commented Nov 15, 2014

Merging in master & branch-1.2. Thanks!

asfgit pushed a commit that referenced this pull request Nov 15, 2014
This PR refactors / extends the status API introduced in #2696.

- Change StatusAPI from a mixin trait to a class.  Before, the new status API methods were directly accessible through SparkContext, whereas now they're accessed through a `sc.statusAPI` field.  As long as we were going to add these methods directly to SparkContext, the mixin trait seemed like a good idea, but this might be simpler to reason about and may avoid pitfalls that I've run into while attempting to refactor other parts of SparkContext to use mixins (see #3071, for example).
- Change the name from SparkStatusAPI to SparkStatusTracker.
- Make `getJobIdsForGroup(null)` return ids for jobs that aren't associated with any job group.
- Add `getActiveStageIds()` and `getActiveJobIds()` methods that return the ids of whatever's currently active in this SparkContext.  This should simplify davies's progress bar code.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #3197 from JoshRosen/progress-api-improvements and squashes the following commits:

30b0afa [Josh Rosen] Rename SparkStatusAPI to SparkStatusTracker.
d1b08d8 [Josh Rosen] Add missing newlines
2cc7353 [Josh Rosen] Add missing file.
d5eab1f [Josh Rosen] Add getActive[Stage|Job]Ids() methods.
a227984 [Josh Rosen] getJobIdsForGroup(null) should return jobs for default group
c47e294 [Josh Rosen] Remove StatusAPI mixin trait.

(cherry picked from commit 40eb8b6)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@asfgit asfgit closed this in 40eb8b6 Nov 15, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants