-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-2321] Several progress API improvements / refactorings #3197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This makes binary compatibility easier to reason about and might avoid some pitfalls that I’ve run into while attempting to refactor other parts of SparkContext to use mixin traits (see apache#3071, for example). Requiring users to access status API methods through `sc.statusAPI.*` also avoids SparkContext bloat and buys us extra freedom for adding parallel higher / lower-level APIs.
|
Test build #23179 has started for PR 3197 at commit
|
|
Test build #23179 has finished for PR 3197 at commit
|
|
Test FAILed. |
|
Test build #23183 has started for PR 3197 at commit
|
|
Test build #23183 has finished for PR 3197 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just call it status w/o API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 just status would be better i think
|
@JoshRosen this re-org looks better than before, how about move all of these API into a namespace called |
|
I think we want to keep this as a public API - that was the whole reason for adding this in the first place, since it is technically redundant with things users could already do with the SparkListener interface and projects like Hive want us to support a stable API. In terms of the naming, it did seem slightly weird to me to call this |
|
Okay I talked offline with @kayousterhout and the best name we could come up with was the following: IMO this is nicer than the current name since |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why bother having this? we can just do new ....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal here was to hide this class's constructor from users so that we're free to change it later. I think that making constructors part of public APIs is a bad idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just make the constructor package private? It is really awkward to me that you have to create a factory for this because users are not supposed to create the status listener by themselves.
If you really want to be air tight, a more common way is to expose the interface, and then you have a concrete implementation of the interface. Then you don't have the constructor problem. But that seems overkill to me too for this.
If you really want a factory, I'd use something other than apply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to make a Scala constructor Java-package-private as opposed to Scala-package-private, since that can become public from Java's point of view? I think I originally used this factory pattern for JavaSparkStatusAPI and just kept the same approach here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think that CompanionObject.apply() might be a fairly common idiom; I think it's used in several of the Scala standard libraries. I don't really care what we call it, one way or the other, so I can change it if you think that apply is confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry the main problem I have is that I don't get why we need to protect the constructor at all. It is not something we expect the users to call. Why don't you just remove all of these stuff, and add a line in javadoc for the constructor saying we don't expect users to call this constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that's fine; I'll just make both constructors private[spark] and add a note; as long as we've warned users not to call the constructor and hidden it from the Scaladoc, then I don't think anyone should complain if we need to change it later.
|
+1 on statusTracker |
Remove factory methods and replace with private[spark] constructors.
|
Test build #23413 has started for PR 3197 at commit
|
|
Hmmm in case I haven't expressed this earlier, I really like this new API. |
|
LGTM. |
|
Test build #23413 has finished for PR 3197 at commit
|
|
Test PASSed. |
|
Merging in master & branch-1.2. Thanks! |
This PR refactors / extends the status API introduced in #2696. - Change StatusAPI from a mixin trait to a class. Before, the new status API methods were directly accessible through SparkContext, whereas now they're accessed through a `sc.statusAPI` field. As long as we were going to add these methods directly to SparkContext, the mixin trait seemed like a good idea, but this might be simpler to reason about and may avoid pitfalls that I've run into while attempting to refactor other parts of SparkContext to use mixins (see #3071, for example). - Change the name from SparkStatusAPI to SparkStatusTracker. - Make `getJobIdsForGroup(null)` return ids for jobs that aren't associated with any job group. - Add `getActiveStageIds()` and `getActiveJobIds()` methods that return the ids of whatever's currently active in this SparkContext. This should simplify davies's progress bar code. Author: Josh Rosen <joshrosen@databricks.com> Closes #3197 from JoshRosen/progress-api-improvements and squashes the following commits: 30b0afa [Josh Rosen] Rename SparkStatusAPI to SparkStatusTracker. d1b08d8 [Josh Rosen] Add missing newlines 2cc7353 [Josh Rosen] Add missing file. d5eab1f [Josh Rosen] Add getActive[Stage|Job]Ids() methods. a227984 [Josh Rosen] getJobIdsForGroup(null) should return jobs for default group c47e294 [Josh Rosen] Remove StatusAPI mixin trait. (cherry picked from commit 40eb8b6) Signed-off-by: Reynold Xin <rxin@databricks.com>
This PR refactors / extends the status API introduced in #2696.
sc.statusAPIfield. As long as we were going to add these methods directly to SparkContext, the mixin trait seemed like a good idea, but this might be simpler to reason about and may avoid pitfalls that I've run into while attempting to refactor other parts of SparkContext to use mixins (see Move SparkContext accumulator methods to Accumulators.scala #3071, for example).getJobIdsForGroup(null)return ids for jobs that aren't associated with any job group.getActiveStageIds()andgetActiveJobIds()methods that return the ids of whatever's currently active in this SparkContext. This should simplify @davies's progress bar code.