Skip to content

Conversation

@karuppayya
Copy link
Contributor

@karuppayya karuppayya commented Nov 21, 2016

What is this PR for?

A paragraph execution may result in spark job(s).
Adding ability to access the spark job UI(corresponding to the job generated by the paragraph run), directly from the paragraph.

What type of PR is it?

Improvement

Todos

  • Write tests

What is the Jira issue?

ZEPPELIN-1692

How should this be tested?

Run paragraphs with spark code(scala, pyspark, sql, R).
The paragraph will display a button on the top right corner, which on click will open up the corresponding job UI

Screenshots (if appropriate)

spark_jobs

Questions:

  • Does the licenses files need update? NA
  • Is there breaking changes for older versions? NA
  • Does this needs documentation? NA

@cloverhearts
Copy link
Member

awsome!

Date dateUpdated;
private Map<String, Object> config; // paragraph configs like isOpen, colWidth, etc
public final GUI settings; // form and parameter settings
private Map<String, Set<String>> runtimeInfos;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont want to persist this info in json.
When i add the transient modifier, the serailization to JSON(for broadcasting to client browser) doesnot happen.
I can add an Exclusion strategy for all the storage handlers. But wanted to check if there is an easier way to achieve this . @Leemoonsoo @cloverhearts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello,
I think there is no problem using transient.
If you think you have a problem with using transient, can you comment on reason?

Actually, my english is not good.
I am sorry if I have been rude to you.
Thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloverhearts Your English is not bad 👍

If i add transient modifier, when Gson.toJson(Paragraph) is done to send message about note/paragraph, the runtimeInfos wont be serialized and will not be received by browser(runtimeInfos wont get persisted in note.json),

Whereas if i add the modifier(current implementation), it will get persisted in JSON. But we dont want it, since it a run time information, and we might not want to persist in the note.json

Is there a way where i can serailize this info(for sending to client browser) as well as not persist in json?
PS: runtimeInfos object has the paragraph url information.

@karuppayya
Copy link
Contributor Author

Ready for review @Leemoonsoo

@1ambda
Copy link
Member

1ambda commented Nov 23, 2016

Let me review this PR and then give you some feedbacks. Thanks @karup1990!

  • I think Spark job button might be located on the left of the FINISHED button to keep consistency with other interpreter results which are not using spark. Maybe we can put it on the top left corner of a paragraph and other interpreters can use the location to give their interpreter specific buttons, options.
  • What about using uppercase chars like SPARK JOB because we are already using uppercases in FINISHED
  • If when I clicked hide output button on a paragraph, Spark job button doesn't disappear. What do you think? Should it be removed?

@karuppayya
Copy link
Contributor Author

Thanks @1ambda for looking into this PR and for the suggestions.
For the 3rd point,
I am not very sure of the use cases for hiding the output in a paragraph.
Do you see a case where the job link has to be hidden when output is hidden?

@1ambda
Copy link
Member

1ambda commented Nov 23, 2016

@karup1990 sorry for misspelling It was clear output

Method take;
String jobGroup = "zeppelin-" + interpreterContext.getParagraphId();
String jobGroup = "zeppelin-" + interpreterContext.getNoteId() + "-"
+ interpreterContext.getParagraphId();
Copy link
Contributor

@zjffdu zjffdu Nov 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see several places of constructing jobGroupId from noteId/paragraphId and extract noteId/paragraphId from jobGroupId. It's better to put them together so that others can understand it easily and other places can reuse it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added Utils. buildJobGroupId and using it across.

String id = interpreterGroup.getId();
int indexOfColon = id.indexOf(":");
return id.substring(0, indexOfColon);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assume the interpreterGroup id format, seems fragile.

Copy link
Contributor Author

@karuppayya karuppayya Nov 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am seeing that interpreter setting is the first prefix, then based on the interpreter mode we are appending ids. Please correct me if I am wrong .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to put getInterpreterSettingId into Utils as well because it is about how we parse interpreterGroup

SparkInterpreter sparkInterpreter = getSparkInterpreter();
sparkInterpreter.populateSparkWebUrl(interpreterContext);

String jobGroup = sparkInterpreter.getJobGroup(interpreterContext);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to rename it sparkInterpreter.buildJobGroupId as each time it would create a new jobGroupId IIUC.

@zjffdu
Copy link
Contributor

zjffdu commented Nov 24, 2016

I left some comments inline and could you add test for this feature ?

<a href="{{paragraph.runtimeInfos.jobUrl[0]}}" target="_blank"><span class="fa fa-tasks"></span> Spark job </a>
</span>
<span class="dropdown" ng-show="paragraph.runtimeInfos.jobUrl.length > 1">
<span class="fa fa-tasks" style="cursor:pointer;color:#3071A9" tooltip-placement="top" tooltip="Run this paragraph (Shift+Enter)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think tooltip-placement="top" tooltip="Run this paragraph (Shift+Enter)" doesn't need in here :)

@AhyoungRyu
Copy link
Contributor

Hi @karup1990, really nice and useful feature indeed.
While quickly looking through this, I have some minor feedbacks about UI :)

  1. Spark Job and Spark jobs have different font like below.
    screen shot 2016-11-24 at 3 07 53 pm
    screen shot 2016-11-24 at 3 08 04 pm

  2. And I think these block that you added in this PR should be below this line.

  <span ng-show="paragraph.runtimeInfos.jobUrl.length == 1">
    <a href="{{paragraph.runtimeInfos.jobUrl[0]}}" target="_blank"><span class="fa fa-tasks"></span> Spark Job </a>
  </span>
  <span class="dropdown" ng-show="paragraph.runtimeInfos.jobUrl.length > 1">
  <span class="fa fa-tasks" style="cursor:pointer;color:#3071A9" tooltip-placement="top" tooltip="Run this paragraph (Shift+Enter)"
        data-toggle="dropdown"
        type="button">  Spark Jobs
  </span>
    <ul class="dropdown-menu" role="menu" style="width:200px;z-index:1002">
       <li ng-class="{'option-disabled': !noteOperationsAllowed()}" ng-repeat="url in paragraph.runtimeInfos.jobUrl">
         <a href="{{url}}" target="_blank"><span class="fa fa-external-link-square"></span> Jobs #{{$index}}</a>
       </li>
    </ul>
  </span>
  • current
    before

  • after
    after

ng-click="runParagraph(getEditorValue())"
ng-show="paragraph.status!='RUNNING' && paragraph.status!='PENDING' && paragraph.config.enabled"></span>
<span ng-show="paragraph.runtimeInfos.jobUrl.length == 1">
<a href="{{paragraph.runtimeInfos.jobUrl[0]}}" target="_blank"><span class="fa fa-tasks"></span> Spark job </a>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's nit pick, let's use capital letter for Spark job as below "Spark Jobs" does :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about display runtime info button before paragraph status and run button?

image

image

So all other other control buttons are stay in the same position whether runtimeInfo button is displayed or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And how about we generalize this feature little bit more, such as

runtimeInfos = {
   label: "Spark Job"
   jobUrl : [],
}

So other interpreters can also leverage this feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@1ambda Clearing the links when we clear the output 👍
@AhyoungRyu Fixed font issue
@Leemoonsoo Added ParagraphRuntimeInfos class(https://github.com/apache/zeppelin/pull/1663/files#diff-a2ac97dfb3f07b2a3fd1533af701295e) to have a generic support to add more properties. Let me know how it looks.
@AhyoungRyu @Leemoonsoo @1ambda Fixed the alignment and capitalized the text(to access the jobs)
spark

@karuppayya
Copy link
Contributor Author

Thanks for the feedback. i will have them handled some time over the weekend.

@karuppayya
Copy link
Contributor Author

Failure seems not related
04:09:12,112 ERROR org.apache.zeppelin.AbstractZeppelinIT:136 - Exception in ZeppelinIT while testAngularDisplay org.openqa.selenium.TimeoutException: Timed out after 60 seconds waiting for org.apache.zeppelin.AbstractZeppelinIT$1@4c7305e4 Build info: version: '2.48.2', revision: '41bccdd10cf2c0560f637404c2d96164b67d9d67', time: '2015-10-09 13:08:06' System info: host: 'testing-docker-bcffa7c6-8593-4893-a72f-07a719f105fa', ip: '172.17.0.6', os.name: 'Linux', os.arch: 'amd64', os.version: '4.8.7-040807-generic', java.version: '1.7.0_76' Driver info: driver.version: unknown

@karuppayya
Copy link
Contributor Author

Updated the PR with test @zjffdu

@karuppayya
Copy link
Contributor Author

Ready for review

int secondIndex = jobgroupId.indexOf("-", indexOf + 1);
return jobgroupId.substring(secondIndex + 1, jobgroupId.length());
}
};
Copy link
Contributor

@zjffdu zjffdu Dec 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to put getNoteId and getParagraphId to Utils ? So that the constructing and parsing logic are in the same class.


public void addValue(String value) {
if (values == null) {
values = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we create values in the constructor ? Otherwise at least this method is not thread-safe (we might create 2 values in 2 threads)

Date dateUpdated;
private Map<String, Object> config; // paragraph configs like isOpen, colWidth, etc
public final GUI settings; // form and parameter settings
private Map<String, ParagraphRuntimeInfos> runtimeInfos;
Copy link
Contributor

@zjffdu zjffdu Dec 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename ParagraphRuntimeInfos to ParagraphRuntimeInfo ? Because it is a little confusing that runtimeInfos is also plural

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if (p.getDateFinished() != null && lastUpdatedDate.before(p.getDateFinished())) {
lastUpdatedDate = p.getDateFinished();
}
p.clearRuntimeInfo();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to clear RuntimeInfo when we load note if the runtimeInfo is not stored ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will nedd to remove this line.
The runtime infos is persisted currently. But I dont want to persist it.
If i add transient modifier to runtimeInfos, when Gson.toJson(Paragraph) is done to send message about note/paragraph, the runtimeInfos wont be serialized and will not be received by browser(runtimeInfos wont get persisted in note.json),

Whereas if i add the modifier(current implementation), it will get persisted in JSON. But we dont want it, since it a run time information, and we might not want to persist in the note.json

Is there a way where i can serialize this info(for sending to client browser) as well as not persist in json?
PS: runtimeInfos object has the paragraph url information.


String propertyName;
String label;
String group;
Copy link
Contributor

@zjffdu zjffdu Dec 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here the label and group are duplicated. I don't see you use label in frontend. Another approach I think of is that we always use interpreter group name as the prefix of propertyName as convention, (e.g. spark.jobUrl), so that we don't need label or group. @Leemoonsoo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The label is used here

{{paragraph.runtimeInfos.jobUrl.label}}

I had added group , so that two interpreters having same property name can be rendered in a different manner in UI.

Map<String, String> infos = new java.util.HashMap<>();
if (sparkUrl != null) {
infos.put("url", sparkUrl);
logger.info("Sending metainfos to Zeppelin server: {}", infos.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put the logging in the if block ? And what about else block ? Will that happen ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
}

public void clearParagraphRuntimeInfo(InterpreterSetting setting) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid this method may some issues because the binding between InterpreterSetting and Note/Paragraph may change. e.g. At the beginning I use spark in p1, but later I use flink in p1, but the method will clear all the runtime info of p1 when I restart spark interpreter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.. I have handled this now.

karuppayya and others added 17 commits February 2, 2017 13:42
Signed-off-by: karuppayya <karuppayya1990@gmail.com>
Signed-off-by: Karup <karuppayya@outlook.com>
Signed-off-by: Karup <karuppayya@outlook.com>
Signed-off-by: Karup <karuppayya@outlook.com>
Signed-off-by: Karup <karuppayya@outlook.com>
Signed-off-by: karuppayya <karuppayya1990@gmail.com>
Signed-off-by: karuppayya <karuppayya1990@gmail.com>
Signed-off-by: karuppayya <karuppayyar@qubole.com>
Signed-off-by: Karup <karuppayya@outlook.com>
Signed-off-by: Karup <karuppayya@outlook.com>
@karuppayya
Copy link
Contributor Author

Thanks @Leemoonsoo for the suggestion and the pointer.
Looks like all tested have passed. Ready to merge.

@Leemoonsoo
Copy link
Member

@karuppayya Great work!

LGTM and merge to master if no further discussions.

@asfgit asfgit closed this in e9caebc Feb 5, 2017
@Tagar
Copy link
Contributor

Tagar commented Mar 1, 2017

Just upgraded to yesterday's master snapshot.

When I click on any of these links, link is leading to

http://hostname.domain.com:8088/proxy/application_1488384993892_0001/jobs/job/

and that error shows HTTP ERROR 400

Page reads

Problem accessing /jobs/job/.

Reason: requirement failed: Missing id parameter
Powered by Jetty://

Thanks.

@karuppayya
Copy link
Contributor Author

@Tagar to access a specific job, the id has to be part of the url .
Something like http://hostname.domain.com:8088/proxy/application_1488384993892_0001/jobs/job?id=
I am not sure why the id is not part of the URL. Did you find anything suspicious in logs? Also were you able to access the jobs from Spark Web UI?
Is it consistent?

@mindcrusher11
Copy link

mindcrusher11 commented Mar 3, 2017 via email

@Tagar
Copy link
Contributor

Tagar commented Mar 3, 2017

@karuppayya thank you for the follow up.

A little bit more information - the link on paragprah leads to

http://10.20.32.57:28009/jobs/job?id=123

A Spark Driver UI.
Spark Driver web server then redirects to a link like I sent in my previous post,
and it misses job id in the redirected URL, i.e.

http://host.domain.com:8088/proxy/application_1488384993892_0044/jobs/job/

To your question on consistency - yes it happens in 100% cases, I never saw this new feature works for us.
Is this Spark version dependent or something? We're running CDH Spark 2.

@Tagar
Copy link
Contributor

Tagar commented Mar 6, 2017

According to apache/spark#5947 URL format is different in YARN and non-YARN modes? Was PR-1663 for ZEPPELIN-1692 tested on both of these modes? Not sure what else might break those links. We do use Spark on YARN. Created https://issues.apache.org/jira/browse/ZEPPELIN-2221 to move discussion there. Thank you.

beriaanirudh pushed a commit to beriaanirudh/zeppelin that referenced this pull request Mar 8, 2017
A paragraph execution may result in spark job(s).
Adding ability to access the spark job UI(corresponding to the job generated by the paragraph run), directly from the paragraph.

Improvement

* [x]  Write tests

ZEPPELIN-1692

Run paragraphs with spark code(scala, pyspark, sql, R).
The paragraph will display a button on the top right corner, which on click will open up the corresponding job UI

![spark_jobs](https://cloud.githubusercontent.com/assets/5082742/20488337/a07fe35a-b02c-11e6-9400-db9f1c10df90.gif)

* Does the licenses files need update? NA
* Is there breaking changes for older versions? NA
* Does this needs documentation? NA

Author: Karup <karuppayya@outlook.com>
Author: karuppayya <karuppayya1990@gmail.com>
Author: karuppayya <karuppayyar@qubole.com>

Closes apache#1663 from karuppayya/ZEPPELIN-1692 and squashes the following commits:

4253d0b [Karup] Fix bad rebase
d7eb3b6 [Karup] Fix paragraph.js
8e2cd85 [Karup] tryout: fix selenium tests based on moons suggstion
732b0a4 [karuppayya] Fix test
890107d [Karup] Fix test - tryout
ed4685c [Karup] Fix tooltip
d27221d [Karup] Adding license header
87214a7 [Karup] Fix incorrect rebase
19513a6 [Karup] Send para runtimeinfos via websocker, but dont persist in json
09fc0e2 [Karup] Fix compilation
fc44d9b [Karup] Address review comments
b837c6c [karuppayya] Fix incorrect variable used
42d92ac [karuppayya] Fix test
d4e54e8 [karuppayya] Address review feedbacks
1a45284 [Karup] Fix test
717eedf [Karup] Add tests , refactor
25379aa [Karup] Clear job urls when we clear output
7383c0a [Karup] Address review comments
e2cd4db [karuppayya] Fix NPE in tests
3d9a573 [karuppayya] Fix NPE and some refactoring
9b3a3e2 [karuppayya] Fix checkstyle
f16422f [karuppayya] Ability to view spark job urls in each paragraph

(cherry picked from commit e9caebc)
@Tagar
Copy link
Contributor

Tagar commented May 16, 2017

Does this feature work for anyone who is using Spark on YARN?

It seems to be broken by https://issues.apache.org/jira/browse/SPARK-20772

I've updated https://issues.apache.org/jira/browse/ZEPPELIN-2221

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants