@zjffdu
Contributor

@zjffdu zjffdu commented Sep 30, 2016

What is this PR for?

Based on #338, I refactored most of the Pig interpreter, because I don't think the approach in #338 is the best one. In #338, the script bin/pig is used to launch Pig scripts, which makes the job difficult to control (hard to kill, and hard to get progress and stats info). In this PR, I use the Pig API to launch Pig scripts. Besides that, I implemented another interpreter type, %pig.query, to leverage Zeppelin's display system. For details, see pig.md.

What type of PR is it?

[Feature]

Todos

  • Syntax Highlight
  • New interpreter type %pig.udf, so that users can write Pig UDFs in Zeppelin directly without building a UDF jar manually.

What is the Jira issue?

  • https://issues.apache.org/jira/browse/ZEPPELIN-335

How should this be tested?

Unit tests were added, and manual testing was also done.

Screenshots (if appropriate)

image

Questions:

  • Do the license files need an update? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? No

@zjffdu zjffdu force-pushed the ZEPPELIN-335 branch 2 times, most recently from f74280e to 908574f, September 30, 2016 09:11
Launcher launcher = (Launcher) launcherField.get(engine);
// It doesn't work for Tez Engine due to PIG-5035
launcher.killJob(jobId, new Configuration());
} catch (NoSuchFieldException e) {
Member

Add message string or merge like this?

catch (NoSuchFieldException | BackendException | IllegalAccessException e)

Contributor Author

Fixed
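
The merged handler suggested above can be sketched with plain JDK reflection. This is only an illustration, not the PR's code: the nested `Engine` class and its `launcher` field are hypothetical stand-ins for Pig's execution engine, and Pig's `BackendException` is left out so that the example needs nothing beyond the JDK.

```java
import java.lang.reflect.Field;

final class MultiCatchSketch {

  // Hypothetical stand-in for an engine that keeps its launcher in a private field.
  static final class Engine {
    private final String launcher = "demo-launcher";
  }

  // Reads a private field by name; returns null (after reporting) if reflection fails.
  static Object readPrivateField(Object target, String fieldName) {
    try {
      Field field = target.getClass().getDeclaredField(fieldName);
      field.setAccessible(true);
      return field.get(target);
    } catch (NoSuchFieldException | IllegalAccessException e) {
      // One handler covers both reflection failures and adds a message string.
      System.err.println("Cannot read field '" + fieldName + "': " + e);
      return null;
    }
  }

  public static void main(String[] args) {
    Engine engine = new Engine();
    System.out.println(readPrivateField(engine, "launcher"));
    System.out.println(readPrivateField(engine, "missing"));
  }
}
```

Merging the handlers keeps a single place for the message and avoids repeating the same recovery code per exception type.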

@zjffdu
Contributor Author

zjffdu commented Oct 7, 2016

@zjffdu zjffdu closed this Oct 7, 2016
@zjffdu zjffdu reopened this Oct 7, 2016
@felixcheung
Member

Is it possible to have a non-Tez version of Pig, and would it work with this interpreter?

@zjffdu
Contributor Author

zjffdu commented Oct 7, 2016

Yes, it supports the local, MapReduce, and Tez engines.

if (!fe.getMessage().contains("Backend error :")) {
// If the error message contains "Backend error :", that means the exception is from
// backend.
return new InterpreterResult(Code.ERROR, ExceptionUtils.getStackTrace(e));
Member

It seems like a pretty bad user experience to expose the call stack in the paragraph run result. Should we just log it and return e.getMessage() instead?

Contributor Author

e.getMessage() doesn't contain useful info for diagnosis. For example, if you specify a nonexistent path, e.getMessage() only returns the error message "Unable to open iterator for alias c", which is not useful for users. In Pig Grunt (Pig's interactive tool), users also see the full stack trace, so I think it is acceptable to display the full stack trace here.

Member

@felixcheung felixcheung Oct 9, 2016

I don't feel strongly about it, but there have been numerous requests to hide exception stacks in interpreter results in other interpreters, and we generally do not put exception stacks in interpreter results.
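
One possible middle ground between the two positions in this thread is to log the full stack trace and show the user a one-line summary of the cause chain, so the root cause stays visible. The sketch below is not what the PR does; the class and method names are hypothetical, and the exception messages in `main` are made up for illustration.

```java
import java.io.FileNotFoundException;
import java.util.logging.Level;
import java.util.logging.Logger;

final class ErrorReportSketch {
  private static final Logger LOGGER = Logger.getLogger(ErrorReportSketch.class.getName());

  // Joins the messages of the exception and its causes into one line,
  // so the root cause is visible without the full stack trace.
  static String summarize(Throwable error) {
    StringBuilder summary = new StringBuilder();
    for (Throwable t = error; t != null; t = t.getCause()) {
      if (summary.length() > 0) {
        summary.append(" <- caused by: ");
      }
      summary.append(t.getClass().getSimpleName()).append(": ").append(t.getMessage());
    }
    return summary.toString();
  }

  // Logs the full stack trace and returns only the short summary for display.
  static String toDisplayText(Throwable error) {
    LOGGER.log(Level.SEVERE, "Paragraph failed", error);
    return summarize(error);
  }

  public static void main(String[] args) {
    Exception root = new FileNotFoundException("Input path does not exist: /no/such/path");
    Exception top = new RuntimeException("Unable to open iterator for alias c", root);
    System.out.println(toDisplayText(top));
  }
}
```

Whether such a summary is informative enough depends on how the underlying library wraps its exceptions, which is the concern raised by the author above.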

if (stats != null) {
String errorMsg = PigUtils.extactJobStats(stats);
if (errorMsg != null) {
LOGGER.debug("Error Message:" + errorMsg);
Member

LOGGER.error("message", e) instead?

Contributor Author

Fixed

jobIds.add(js.getJobId());
}
return jobIds;
} catch (Exception e) {
Member

LOGGER.error?

}
},
"editor": {
"language": "pig"
Member

I couldn't find it. Does https://highlightjs.org/ support "pig"?

Contributor Author

No, it doesn't support it; this is on the TODO list.

Member

I'm not sure if you plan to add highlighting to the highlightjs project, since that's what we are using.

@felixcheung
Member

felixcheung commented Oct 8, 2016

Could you add documentation?

Also, I think the LICENSE file should be updated for Pig and its dependencies.

@zjffdu
Contributor Author

zjffdu commented Oct 8, 2016

@felixcheung please check pig.md

abajwa-hw and others added 5 commits October 8, 2016 16:13
1. Documentation: added pig.md with interpreter documentation and added pig entry to index.md
2. Added test junit test based on passwd file parsing example here https://pig.apache.org/docs/r0.10.0/start.html#run
3. Removed author tag from comment (this was copied from shell interpreter https://github.com/apache/incubator-zeppelin/blob/master/shell/src/main/java/org/apache/zeppelin/shell/ShellInterpreter.java#L42)
4. Implemented cancel functionality
5. Display output stream in case of error
@zjffdu zjffdu force-pushed the ZEPPELIN-335 branch 2 times, most recently from 9185339 to 8fc5d6f, October 8, 2016 08:34
[ZEPPELIN-335][DOCS] Minor update for pig.md
@zjffdu
Contributor Author

zjffdu commented Oct 9, 2016

Thanks @AhyoungRyu for the help, merged.

</tr>
<tr>
<td>zeppelin.pig.maxResult</td>
<td>20</td>
Member

@felixcheung felixcheung Oct 9, 2016

Is this a bit low at 20, since there isn't a way to retrieve the full result?

Contributor Author

Changed it to 1000, the same as Spark SQL.

Member

@felixcheung felixcheung Oct 10, 2016

could you update the doc then on the default value?
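
The effect of a row cap such as zeppelin.pig.maxResult can be sketched as follows. This is an assumption-laden illustration rather than the interpreter's code: it supposes that result rows arrive through an `Iterator<String>`, and the truncation notice is invented here to show why a low cap matters when the full result cannot be retrieved.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class MaxResultSketch {

  // Renders at most maxResult rows, one per line, and notes when rows were dropped.
  static String render(Iterator<String> rows, int maxResult) {
    StringBuilder out = new StringBuilder();
    int emitted = 0;
    while (rows.hasNext() && emitted < maxResult) {
      out.append(rows.next()).append('\n');
      emitted++;
    }
    if (rows.hasNext()) {
      // Hypothetical notice; it tells the user that the display is incomplete.
      out.append("(output truncated after ").append(maxResult).append(" rows)\n");
    }
    return out.toString();
  }

  public static void main(String[] args) {
    List<String> rows = Arrays.asList("a\t1", "b\t2", "c\t3");
    System.out.print(render(rows.iterator(), 2));
  }
}
```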

fi

# autodetect TEZ_CONF_DIR
TEZ_CONF_DIR=${TEZ_CONF_DIR:=/etc/tez/conf}
Member

should we check that /etc/tez/conf exists before adding to the classpath?

Contributor Author

Fixed

}
try {
pigServer = new PigServer(execType);
} catch (IOException e) {
Member

could you LOGGER.error even when the exception is rethrown? it's easier to see everything in the log file

Contributor Author

Fixed

}
}
if (!outputBuilder.toString().isEmpty() || !bytesOutput.toString().isEmpty()) {
outputBuilder.append("------------- Pig Output --------------\n");
Member

I'm not sure we have a "header" in other interpreter output - is there a reason this could be useful?

Contributor Author

Removed.

PigScriptListener scriptListener = new PigScriptListener();
ScriptState.get().registerListener(scriptListener);
listenerMap.put(contextInterpreter.getParagraphId(), scriptListener);
pigServer.registerScript(tmpFile.getAbsolutePath());
Member

does registerScript block until execution is complete? otherwise it seems we are getting result output and deleting temp script file early

Contributor Author

Yes, it is a blocking API.
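
Because the call blocks until the script finishes, the temp script file can be deleted right after it returns. A minimal sketch of that pattern follows; the `Consumer<Path>` parameter is a hypothetical stand-in for the blocking call, and the class and file-name prefix are invented for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

final class TempScriptSketch {

  // Writes the script to a temp file, runs it, and deletes the file afterwards.
  // Returns the temp path so callers can verify the cleanup.
  static Path runScript(String scriptText, Consumer<Path> blockingRunner) {
    try {
      Path tmpFile = Files.createTempFile("pig-sketch", ".pig");
      try {
        Files.write(tmpFile, scriptText.getBytes(StandardCharsets.UTF_8));
        blockingRunner.accept(tmpFile); // returns only after the script has finished
      } finally {
        Files.deleteIfExists(tmpFile); // safe here because the runner call is blocking
      }
      return tmpFile;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path used = runScript("a = load 'data';",
        script -> System.out.println("exists during run=" + Files.exists(script)));
    System.out.println("exists after run=" + Files.exists(used));
  }
}
```

With a non-blocking call, the `finally` block would race with the running script, which is the concern the reviewer raised.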


@Override
public InterpreterResult interpret(String st, InterpreterContext context) {
String alias = "paragraph_" + context.getParagraphId().replace("-", "_");
Member

could you add a comment on why this is needed?

Contributor Author

Fixed
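
The thread does not record the comment that was added. A plausible reason for the replacement, stated here as an assumption, is that a Pig alias has to be an identifier (a letter followed by letters, digits, or underscores), so a paragraph ID containing hyphens cannot be used directly. The sketch below illustrates this with a made-up paragraph ID and a hypothetical validity check.

```java
final class AliasSketch {

  // Builds an alias from a paragraph ID: hyphens become underscores,
  // and the prefix guarantees that the alias starts with a letter.
  static String aliasFor(String paragraphId) {
    return "paragraph_" + paragraphId.replace("-", "_");
  }

  // Checks the identifier shape assumed here: a letter, then letters, digits, or underscores.
  static boolean isValidAlias(String alias) {
    return alias.matches("[A-Za-z][A-Za-z0-9_]*");
  }

  public static void main(String[] args) {
    String id = "20161009-101500_123456789"; // hypothetical paragraph ID
    System.out.println(aliasFor(id) + " valid=" + isValidAlias(aliasFor(id)));
    System.out.println(id + " valid=" + isValidAlias(id));
  }
}
```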

// Extract error in the following order
// 1. catch FrontendException, FrontendException happens in the query compilation phase.
// 2. PigStats, This is execution error
// 3. Other errors.
Member

LOGGER.error when e is not FrontendException

PigStats.JobGraph jobPlan = (PigStats.JobGraph) jobPlanField.get(stats);

if (stats.getReturnCode() == PigRunner.ReturnCode.SUCCESS
|| stats.getReturnCode() == PigRunner.ReturnCode.PARTIAL_FAILURE) {
Member

we want stats.getReturnCode() == PigRunner.ReturnCode.PARTIAL_FAILURE here?

Contributor Author

@zjffdu zjffdu Oct 9, 2016

I think so. This code is copied from Pig because it is private in Pig; PIG-5037 is for exposing such an API.

Member

just that stats.getReturnCode() == PigRunner.ReturnCode.PARTIAL_FAILURE is in both the "SUCCESS" and "FAILURE" cases. So if I understand correctly, if return code is PARTIAL_FAILURE, it will say "Job Stats" and "Failed Jobs" at the same time

Contributor Author

That's correct. If it is PARTIAL_FAILURE, it displays the job stats of both the succeeded jobs and the failed jobs. For example, a Pig script may need to run 2 MapReduce jobs, of which one succeeds and the other fails.
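
The overlap discussed in this thread can be sketched with two independent conditions. The enum below is a hypothetical mirror of the three outcomes named above, not Pig's own `PigRunner.ReturnCode`, and the section names are taken from the reviewer's comment.

```java
import java.util.ArrayList;
import java.util.List;

final class PartialFailureSketch {

  // Hypothetical mirror of the three outcomes discussed above.
  enum ReturnCode { SUCCESS, PARTIAL_FAILURE, FAILURE }

  // Lists the report sections produced for a given return code.
  static List<String> sections(ReturnCode code) {
    List<String> result = new ArrayList<>();
    if (code == ReturnCode.SUCCESS || code == ReturnCode.PARTIAL_FAILURE) {
      result.add("Job Stats");
    }
    if (code == ReturnCode.FAILURE || code == ReturnCode.PARTIAL_FAILURE) {
      result.add("Failed Jobs");
    }
    return result;
  }

  public static void main(String[] args) {
    for (ReturnCode code : ReturnCode.values()) {
      System.out.println(code + " -> " + sections(code));
    }
  }
}
```

Using two separate `if` statements rather than `if`/`else` is what lets a partial failure report both the jobs that succeeded and the jobs that failed.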

@felixcheung
Member

how about LICENSE file?

@zjffdu
Contributor Author

zjffdu commented Oct 10, 2016

@felixcheung Thanks for the review, license is added and comments are addressed.

@felixcheung
Member

Thanks, comment replied, in particular, this

@felixcheung
Member

felixcheung commented Oct 10, 2016

Do you intend to include the TODOs in this PR, or as a follow-up?
Otherwise I think this is good to go - you can remove the WIP in the title and I'll merge if there is no more comment.

SELENIUM tests seem to be failing consistently, could you check it out?

@zjffdu zjffdu changed the title from "[WIP] ZEPPELIN-335. Pig Interpreter" to "ZEPPELIN-335. Pig Interpreter" Oct 11, 2016
elasticsearch org.apache.zeppelin:zeppelin-elasticsearch:0.6.1 Elasticsearch interpreter
file org.apache.zeppelin:zeppelin-file:0.6.1 HDFS file interpreter
flink org.apache.zeppelin:zeppelin-flink_2.11:0.6.1 Flink interpreter built with Scala 2.11
pig org.apache.zeppelin:zeppelin-pig:0.6.1 Pig interpreter
Contributor

@zjffdu Could you put this in alphabetical order? :)

Contributor Author

Fixed

@zjffdu
Contributor Author

zjffdu commented Oct 11, 2016

Thanks @felixcheung. Do you know how to run the Selenium tests? I followed the instructions in README.md, but it fails.

Zeppelin comes with a set of end-to-end acceptance tests driving headless selenium browser

```sh
# assumes zeppelin-server running on localhost:8080 (use -Durl=.. to override)
mvn verify

# or take care of starting/stopping zeppelin-server from packaged zeppelin-distribution/target
mvn verify -P using-packaged-distr
```

@felixcheung
Member

@zjffdu
Contributor Author

zjffdu commented Oct 11, 2016

I did that, but it fails with the following error:

*** RUN ABORTED ***
  java.lang.RuntimeException: Unable to load a Suite class that was discovered in the runpath: org.apache.zeppelin.AbstractFunctionalSuite
  at org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:84)
  at org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:38)
  at org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:37)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.Iterator$class.foreach(Iterator.scala:727)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  ...
  Cause: java.lang.RuntimeException: Could not initialize any driver
  at org.apache.zeppelin.AbstractFunctionalSuite.getDriver(AbstractFunctionalSuite.scala:64)
  at org.apache.zeppelin.AbstractFunctionalSuite.<init>(AbstractFunctionalSuite.scala:39)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at java.lang.Class.newInstance(Class.java:442)
  at org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:69)
  at org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:38)
  at org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:37)

@Leemoonsoo @bzz Do you know how to run the Selenium tests locally? Thanks.

@zjffdu zjffdu closed this Oct 12, 2016
@zjffdu zjffdu reopened this Oct 12, 2016
@zjffdu zjffdu closed this Oct 12, 2016
@zjffdu zjffdu reopened this Oct 12, 2016
@zjffdu
Contributor Author

zjffdu commented Oct 12, 2016

@felixcheung The tests passed. I suspect the Selenium test is flaky, which is why it failed last time.

@abajwa-hw

@zjffdu thanks for working on this interpreter! Looks great

@felixcheung
Member

felixcheung commented Oct 12, 2016 via email

@zjffdu
Contributor Author

zjffdu commented Oct 13, 2016

@felixcheung @AhyoungRyu 's comment is addressed.

@felixcheung
Member

merging if no more comment

@asfgit asfgit closed this in 465c51a Oct 15, 2016
darionyaphet pushed a commit to darionyaphet/zeppelin that referenced this pull request Oct 27, 2016
### What is this PR for?
Based on apache#338, I refactored most of the Pig interpreter, because I don't think the approach in apache#338 is the best one. In apache#338, the script `bin/pig` is used to launch Pig scripts, which makes the job difficult to control (hard to kill, and hard to get progress and stats info). In this PR, I use the Pig API to launch Pig scripts. Besides that, I implemented another interpreter type, `%pig.query`, to leverage Zeppelin's display system. For details, see `pig.md`.

### What type of PR is it?
[Feature]

### Todos
* Syntax Highlight
* New interpreter type `%pig.udf`, so that users can write Pig UDFs in Zeppelin directly without building a UDF jar manually.

### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-335

### How should this be tested?
Unit test is added and also manual test is done

### Screenshots (if appropriate)

![image](https://cloud.githubusercontent.com/assets/164491/18986649/54217b4c-8730-11e6-9e33-25f98a98a9b6.png)

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Jeff Zhang <zjffdu@apache.org>
Author: Ali Bajwa <abajwa@hortonworks.com>
Author: AhyoungRyu <ahyoungryu@apache.org>
Author: Jeff Zhang <zjffdu@gmail.com>

Closes apache#1476 from zjffdu/ZEPPELIN-335 and squashes the following commits:

73a07f0 [Jeff Zhang] minor update
a1b742b [Jeff Zhang] minor update on doc
e858301 [Jeff Zhang] address comments
c85a090 [Jeff Zhang] add license
58b4b2f [Jeff Zhang] minor update of docs
1ae7db2 [Jeff Zhang] Merge pull request apache#2 from AhyoungRyu/ZEPPELIN-335/docs
fe014a7 [AhyoungRyu] Fix docs title in front matter
df7a6db [AhyoungRyu] Add pig.md to dropdown menu
5e2e222 [AhyoungRyu] Minor update for pig.md
39f161a [Jeff Zhang] address comments
05a3b9b [Jeff Zhang] add pig.md
a09a7f7 [Jeff Zhang] refactor pig Interpreter
c28beb5 [Ali Bajwa] Updated based on comments: 1. Documentation: added pig.md with interpreter documentation and added pig entry to index.md 2. Added test junit test based on passwd file parsing example here https://pig.apache.org/docs/r0.10.0/start.html#run 3. Removed author tag from comment (this was copied from shell interpreter https://github.com/apache/incubator-zeppelin/blob/master/shell/src/main/java/org/apache/zeppelin/shell/ShellInterpreter.java#L42) 4. Implemented cancel functionality 5. Display output stream in case of error
2586336 [Ali Bajwa] exposed timeout and pig executable via interpreter and added comments
7abad20 [Ali Bajwa] initial commit of pig interpreter
pedrozatta pushed a commit to pedrozatta/zeppelin that referenced this pull request Oct 27, 2016
asfgit pushed a commit that referenced this pull request Nov 2, 2016
Closes #338 (fixed by #1476)
Closes #1522 (fixed by #1559)
Closes #1527 (fixed by #1511)