-
Notifications
You must be signed in to change notification settings - Fork 2.8k
ZEPPELIN-335. Pig Interpreter #1476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f74280e to
908574f
Compare
908574f to
5851f2b
Compare
| Launcher launcher = (Launcher) launcherField.get(engine); | ||
| // It doesn't work for Tez Engine due to PIG-5035 | ||
| launcher.killJob(jobId, new Configuration()); | ||
| } catch (NoSuchFieldException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add message string or merge like this?
catch (NoSuchFieldException | BackendException | IllegalAccessException e)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
ping @Leemoonsoo @bzz @felixcheung @abajwa-hw @vgmartinez @maparco @smartinsightsfromdata please help review. |
|
Is it possible have a non-Tez version of Pig and would it work with this interpreter? |
|
Yes, it supports local, mapreduce and tez engine. |
| if (!fe.getMessage().contains("Backend error :")) { | ||
| // If the error message contains "Backend error :", that means the exception is from | ||
| // backend. | ||
| return new InterpreterResult(Code.ERROR, ExceptionUtils.getStackTrace(e)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like pretty bad user experience to expose call stack in paragraph run result? should we just log in and return e.getMessage() instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.getMessage() doesn't contain useful info for diagnosis. e.g. if you specify a nonexist path, e.getMessage() only get error message Unable to open iterator for alias c which is not useful for users. And in pig grunt (pig interactive tool), user can also see the full stacktrace, so I think it is acceptable to display full stack trace here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly about it but there has been numerous request to hide exception stack in interpreter results in other interpreters and we generally do not put exception stack in interpreter results.
| if (stats != null) { | ||
| String errorMsg = PigUtils.extactJobStats(stats); | ||
| if (errorMsg != null) { | ||
| LOGGER.debug("Error Message:" + errorMsg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOGGER.error("message, e) instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| jobIds.add(js.getJobId()); | ||
| } | ||
| return jobIds; | ||
| } catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOGGER.error?
| } | ||
| }, | ||
| "editor": { | ||
| "language": "pig" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find it, does https://highlightjs.org/ supports "pig"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it doesn't support, this is in TODO list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if you plan to add highlighting to the highlightjs project, since that's what we are using.
|
Could you add documentation? also I think LICENSE file should be updated re: pig and dependencies |
|
@felixcheung please check |
1. Documentation: added pig.md with interpreter documentation and added pig entry to index.md 2. Added test junit test based on passwd file parsing example here https://pig.apache.org/docs/r0.10.0/start.html#run 3. Removed author tag from comment (this was copied from shell interpreter https://github.com/apache/incubator-zeppelin/blob/master/shell/src/main/java/org/apache/zeppelin/shell/ShellInterpreter.java#L42) 4. Implemented cancel functionality 5. Display output stream in case of error
9185339 to
8fc5d6f
Compare
[ZEPPELIN-335][DOCS] Minor update for pig.md
|
Thanks @AhyoungRyu for the help, merged. |
docs/interpreter/pig.md
Outdated
| </tr> | ||
| <tr> | ||
| <td>zeppelin.pig.maxResult</td> | ||
| <td>20</td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a bit low at 20, since there isn't a way to retrieve the full result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change it to 1000 as the same of spark sql
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you update the doc then on the default value?
bin/interpreter.sh
Outdated
| fi | ||
|
|
||
| # autodetect TEZ_CONF_DIR | ||
| TEZ_CONF_DIR = ${TEZ_CONF_DIR:=/etc/tez/conf} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we check that /etc/tez/conf exists before adding to the classpath?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| } | ||
| try { | ||
| pigServer = new PigServer(execType); | ||
| } catch (IOException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you LOGGER.error even when the exception is rethrown? it's easier to see everything in the log file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| } | ||
| } | ||
| if (!outputBuilder.toString().isEmpty() || !bytesOutput.toString().isEmpty()) { | ||
| outputBuilder.append("------------- Pig Output --------------\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we have a "header" in other interpreter output - is there a reason this could be useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
| PigScriptListener scriptListener = new PigScriptListener(); | ||
| ScriptState.get().registerListener(scriptListener); | ||
| listenerMap.put(contextInterpreter.getParagraphId(), scriptListener); | ||
| pigServer.registerScript(tmpFile.getAbsolutePath()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does registerScript block until execution is complete? otherwise it seems we are getting result output and deleting temp script file early
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is a block api.
|
|
||
| @Override | ||
| public InterpreterResult interpret(String st, InterpreterContext context) { | ||
| String alias = "paragraph_" + context.getParagraphId().replace("-", "_"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a comment on why this is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| // Extract error in the following order | ||
| // 1. catch FrontendException, FrontendException happens in the query compilation phase. | ||
| // 2. PigStats, This is execution error | ||
| // 3. Other errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOGGER.error when e is not FrontendException
| PigStats.JobGraph jobPlan = (PigStats.JobGraph) jobPlanField.get(stats); | ||
|
|
||
| if (stats.getReturnCode() == PigRunner.ReturnCode.SUCCESS | ||
| || stats.getReturnCode() == PigRunner.ReturnCode.PARTIAL_FAILURE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we want stats.getReturnCode() == PigRunner.ReturnCode.PARTIAL_FAILURE here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, these code are copied from pig as it is private in pig, PIG-5037 is for exposing such api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just that stats.getReturnCode() == PigRunner.ReturnCode.PARTIAL_FAILURE is in both the "SUCCESS" and "FAILURE" cases. So if I understand correctly, if return code is PARTIAL_FAILURE, it will say "Job Stats" and "Failed Jobs" at the same time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, if it is PARTIAL_FAILURE, it would display the job stats of both succeeded jobs and failed jobs. e.g. one pig script needs to run 2 mapreduce jobs, and one job is successful and another is failed.
|
how about LICENSE file? |
|
@felixcheung Thanks for the review, license is added and comments are addressed. |
|
Thanks, comment replied, in particular, this |
|
Do you intent to include the TODO in this PR, or as a follow up? SELENIUM tests seem to be failing consistently, could you check it out? |
conf/interpreter-list
Outdated
| elasticsearch org.apache.zeppelin:zeppelin-elasticsearch:0.6.1 Elasticsearch interpreter | ||
| file org.apache.zeppelin:zeppelin-file:0.6.1 HDFS file interpreter | ||
| flink org.apache.zeppelin:zeppelin-flink_2.11:0.6.1 Flink interpreter built with Scala 2.11 | ||
| pig org.apache.zeppelin:zeppelin-pig:0.6.1 Pig interpreter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zjffdu Could you put this in alphabetic order? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
Thanks @felixcheung , Do you know how to run SELENIUM test ? I follow the instruction on READM.MD, but fails |
|
Try the commandline from Travis? |
|
I did it, but fails with the following error: @Leemoonsoo @bzz Do you know how to run SELENIUM test locally ? Thanks |
|
@felixcheung Test is passed, I suspect the selenium test is flaky so failed last time. |
|
@zjffdu thanks for working on this interpreter! Looks great |
|
I think @AhyoungRyu has a comment?
|
|
@felixcheung @AhyoungRyu 's comment is addressed. |
|
merging if no more comment |
### What is this PR for? Based on apache#338 , I refactor most of pig interpreter. As I don't think the approach in apache#338 is the best approach. In apache#338, we use script `bin/pig` to launch pig script, it is different to control that job (hard to kill and get progress and stats info). In this PR, I use pig api to launch pig script. Besides that I implement another interpreter type `%pig.query` to leverage the display system of zeppelin. For the details you can check `pig.md` ### What type of PR is it? [Feature] ### Todos * Syntax Highlight * new interpreter type `%pig.udf`, so that user can write pig udf in zeppelin directly and don't need to build udf jar manually. ### What is the Jira issue? * https://issues.apache.org/jira/browse/ZEPPELIN-335 ### How should this be tested? Unit test is added and also manual test is done ### Screenshots (if appropriate)  ### Questions: * Does the licenses files need update? No * Is there breaking changes for older versions? No * Does this needs documentation? No Author: Jeff Zhang <zjffdu@apache.org> Author: Ali Bajwa <abajwa@hortonworks.com> Author: AhyoungRyu <ahyoungryu@apache.org> Author: Jeff Zhang <zjffdu@gmail.com> Closes apache#1476 from zjffdu/ZEPPELIN-335 and squashes the following commits: 73a07f0 [Jeff Zhang] minor update a1b742b [Jeff Zhang] minor update on doc e858301 [Jeff Zhang] address comments c85a090 [Jeff Zhang] add license 58b4b2f [Jeff Zhang] minor update of docs 1ae7db2 [Jeff Zhang] Merge pull request apache#2 from AhyoungRyu/ZEPPELIN-335/docs fe014a7 [AhyoungRyu] Fix docs title in front matter df7a6db [AhyoungRyu] Add pig.md to dropdown menu 5e2e222 [AhyoungRyu] Minor update for pig.md 39f161a [Jeff Zhang] address comments 05a3b9b [Jeff Zhang] add pig.md a09a7f7 [Jeff Zhang] refactor pig Interpreter c28beb5 [Ali Bajwa] Updated based on comments: 1. Documentation: added pig.md with interpreter documentation and added pig entry to index.md 2. Added test junit test based on passwd file parsing example here https://pig.apache.org/docs/r0.10.0/start.html#run 3. Removed author tag from comment (this was copied from shell interpreter https://github.com/apache/incubator-zeppelin/blob/master/shell/src/main/java/org/apache/zeppelin/shell/ShellInterpreter.java#L42) 4. Implemented cancel functionality 5. Display output stream in case of error 2586336 [Ali Bajwa] exposed timeout and pig executable via interpreter and added comments 7abad20 [Ali Bajwa] initial commit of pig interpreter
### What is this PR for? Based on apache#338 , I refactor most of pig interpreter. As I don't think the approach in apache#338 is the best approach. In apache#338, we use script `bin/pig` to launch pig script, it is different to control that job (hard to kill and get progress and stats info). In this PR, I use pig api to launch pig script. Besides that I implement another interpreter type `%pig.query` to leverage the display system of zeppelin. For the details you can check `pig.md` ### What type of PR is it? [Feature] ### Todos * Syntax Highlight * new interpreter type `%pig.udf`, so that user can write pig udf in zeppelin directly and don't need to build udf jar manually. ### What is the Jira issue? * https://issues.apache.org/jira/browse/ZEPPELIN-335 ### How should this be tested? Unit test is added and also manual test is done ### Screenshots (if appropriate)  ### Questions: * Does the licenses files need update? No * Is there breaking changes for older versions? No * Does this needs documentation? No Author: Jeff Zhang <zjffdu@apache.org> Author: Ali Bajwa <abajwa@hortonworks.com> Author: AhyoungRyu <ahyoungryu@apache.org> Author: Jeff Zhang <zjffdu@gmail.com> Closes apache#1476 from zjffdu/ZEPPELIN-335 and squashes the following commits: 73a07f0 [Jeff Zhang] minor update a1b742b [Jeff Zhang] minor update on doc e858301 [Jeff Zhang] address comments c85a090 [Jeff Zhang] add license 58b4b2f [Jeff Zhang] minor update of docs 1ae7db2 [Jeff Zhang] Merge pull request apache#2 from AhyoungRyu/ZEPPELIN-335/docs fe014a7 [AhyoungRyu] Fix docs title in front matter df7a6db [AhyoungRyu] Add pig.md to dropdown menu 5e2e222 [AhyoungRyu] Minor update for pig.md 39f161a [Jeff Zhang] address comments 05a3b9b [Jeff Zhang] add pig.md a09a7f7 [Jeff Zhang] refactor pig Interpreter c28beb5 [Ali Bajwa] Updated based on comments: 1. Documentation: added pig.md with interpreter documentation and added pig entry to index.md 2. Added test junit test based on passwd file parsing example here https://pig.apache.org/docs/r0.10.0/start.html#run 3. Removed author tag from comment (this was copied from shell interpreter https://github.com/apache/incubator-zeppelin/blob/master/shell/src/main/java/org/apache/zeppelin/shell/ShellInterpreter.java#L42) 4. Implemented cancel functionality 5. Display output stream in case of error 2586336 [Ali Bajwa] exposed timeout and pig executable via interpreter and added comments 7abad20 [Ali Bajwa] initial commit of pig interpreter
What is this PR for?
Based on #338 , I refactor most of pig interpreter. As I don't think the approach in #338 is the best approach. In #338, we use script
bin/pigto launch pig script, it is different to control that job (hard to kill and get progress and stats info). In this PR, I use pig api to launch pig script. Besides that I implement another interpreter type%pig.queryto leverage the display system of zeppelin. For the details you can checkpig.mdWhat type of PR is it?
[Feature]
Todos
%pig.udf, so that user can write pig udf in zeppelin directly and don't need to build udf jar manually.What is the Jira issue?
How should this be tested?
Unit test is added and also manual test is done
Screenshots (if appropriate)
Questions: