Skip to content

Conversation

@beriaanirudh
Copy link

@beriaanirudh beriaanirudh commented Aug 4, 2016

What is this PR for?

There are 2 issues and their proposed fixes:

  1. On a paragraph run, for every line of output, there is a broadcast of the new line from zeppelin. In case of thousands of lines of output, the browser/s would hang because of the volume of these append-output events.
  2. In the above case, besides the browser-hang, another bug observed is that result data is will repeated twice (coming from append-output calls + finish-event calls).

The proposed solution for #1 is:

  • Buffer the append-output event into a queue instead of sending the event immediately.
  • In a separate thread, read from the queue periodically and send the append-output event.

Solution for #2 is:

  • Donot append output to result if the paragraph is not runnig.

What type of PR is it?

Improvement + Bug Fix

Todos

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-1292

How should this be tested?

The test could be to run a simple paragraph with large result. Eg:

%sh
for i in {1..10000}
do
echo $i
done

PS: One will need to clear browser cache between running with and without this code patch since there are javascript changes as well.

Screenshots (if appropriate)

Questions:

  • Does the licenses files need update?
    No
  • Is there breaking changes for older versions?
    No
  • Does this needs documentation?
    It could need for the design. Otherwise I have added code comments explaining behaviour.

* could be called in PENDING state as well
*/
if ($scope.paragraph.id === data.paragraphId &&
($scope.paragraph.status === 'RUNNING' || $scope.paragraph.status === 'PENDING')) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the Comment above, why would we accept PENDING state?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I observed was that in between the pending and the running state, some (I observed 1) append-output events were called. If we miss those, then while the paragraph is running, we would see that some intial-result line/s is/are missing.
Also, when the para execution is finished, the complete note is sent again (with results), so we would see the correct result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the comment was a bit misleading, as it seems it was saying that it could be errorneously call es as well in PENDING state.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, when I re-read it sounds that way.. Re-phrased it.

+ " and send paragraph append data");
thread = new Thread(new AppendOutputRunner());
thread.start();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not thread-safe. You should use synchronized or something else

@jongyoul
Copy link
Member

jongyoul commented Aug 5, 2016

I think you'd better use ScheduledExecutorService by running separate thread because AppendOutputRunner always execute while(true) in your current implementation.

@jongyoul
Copy link
Member

jongyoul commented Aug 5, 2016

There're some Thread.sleep(1000) in your tests. In general, it would be nice and never be a blocker. It, however, occurs build error if CI become extremely slow. It also become a flaky test. Do you have any idea to make it more stable? And I have a question. Which part prevents updating output twice?

@beriaanirudh
Copy link
Author

It is easier to handle the twice output on the javascript side, so I have added the check in file paragraph.controller.js. The browser would not let results to be appended if the paragraph is completed. Also, since we are buffering events now, there would be few (1-2) events called after paragraph execution completion.
Working on your other comments..

1. Synchronize on AppendOutputRunner creation
2. Use ScheduledExecutorService instead of while loop
3. Remove Thread.sleep() from tests
@beriaanirudh
Copy link
Author

@jongyoul: I have incorported all feedback:

  1. Using ScheduledExecutorService for ensuring AppendOutputRunner is up, instead of doing so inside the while loop.
  2. Synchronizing on the thread check and creation.
  3. Removed all Thread.sleep() from unit tests except for one test "testClubbedData()" (which is basically a simulation of actual run). I am out of ideas for that one... Please advice.. We could remove that test as well..

prepareInvocationCounts(listener);
AppendOutputRunner.setListener(listener);
CheckAppendOutputRunner.startRunnerForUnitTests();
while(numInvocations != numTimes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest you add Circuit Breaker with few seconds to avoid running infinitely.

Copy link
Author

@beriaanirudh beriaanirudh Aug 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.. Added a check for to fail the tests after 2 seconds.. Would have to add CircuitBreaker in dependencies, so just using System times.

@beriaanirudh
Copy link
Author

My tests needed some fixes to work with other tests. I have fixed them. But the build still has some errors I think not related to my change. https://travis-ci.org/apache/zeppelin/builds/150876902 .

@beriaanirudh
Copy link
Author

@jongyoul @corneadoug I have incorporated all feedback, and added comments on queries. This PR is ready for review.

@jongyoul
Copy link
Member

I've a few comment on your comments. Please check it.

@corneadoug
Copy link
Contributor

Tested a few cases to check that the front-end still handle most of the cases, and its good

private static final Long SAFE_PROCESSING_STRING_SIZE = new Long(100000);

private static final BlockingQueue<AppendOutputBuffer> QUEUE =
new LinkedBlockingQueue<AppendOutputBuffer>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dynamic data structure does not feel like a constant, so accouding to the project styleguide#s5.2.4-constant-names) better be rather named queue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done..

@beriaanirudh
Copy link
Author

@corneadoug thanks for verifying.
@jongyoul Got it fixed now. I had mis-understood previously.

@beriaanirudh
Copy link
Author

@corneadoug @jongyoul @bzz this is ready for review.

public void run() {

Map<String, Map<String, StringBuilder> > noteMap =
new HashMap<String, Map<String, StringBuilder> >();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a nitpick, but 'diamond operator' should come handy

Map<String, Map<String, StringBuilder> > noteMap = new HashMap<>();

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done..

@beriaanirudh
Copy link
Author

@corneadoug @jongyoul @bzz this is ready for review.

@jongyoul
Copy link
Member

@beriaanirudh You need to change more where you don't use 'diamond operator'. Can you fix them?

@beriaanirudh
Copy link
Author

@jongyoul done. Used diamond operator at all places except for 1 where compiler wouldn't allow (AppendOutputRunner.java:78)

@bzz
Copy link
Member

bzz commented Aug 24, 2016

@beriaanirudh thank you for prompt fixes!

@vipul1409
Copy link

+1 Looking forward to this

@jongyoul
Copy link
Member

LGTM

@beriaanirudh
Copy link
Author

@corneadoug @bzz this is ready for review...

@Leemoonsoo
Copy link
Member

Text output streaming used to work in following way (branch-0.6)
pr1283_before

But somehow, this PR (and master branch) works like
pr1283_after

Shell we address this problem first, before apply this patch?

@corneadoug
Copy link
Contributor

@Leemoonsoo

The behaviour is still the same with:

(1 to 40).foreach{ i=>
    Thread.sleep(1000)
    println("Hello " + i)
}

@Leemoonsoo
Copy link
Member

Leemoonsoo commented Aug 26, 2016

@corneadoug yes, that streams output every one seconds if i try the master branch.

However, I did some more experiments and found

Output streaming : master branch (X), this pullrequest (X)

%sh
for i in {1..3}
do
    date
    sleep 2
done

Output streaming : master branch (O), this pullrequest (X)

%spark
(1 to 3).foreach{i=>
    Thread.sleep(2000)
    println(new java.util.Date)
}

@corneadoug @beriaanirudh Could you verify these cases, too?

@corneadoug
Copy link
Contributor

I tested both cases:
Case1: master (X), this branch (X)
Case2: master (O), this branch(X)

However the first time I tried this branch and commented, the Case 2 was fine in this Branch

@Leemoonsoo
Copy link
Member

According to the test results,

%sh interpreter output streaming might be other issue.
but it looks like %spark interpreter output streaming in this branch is not working as expected.

@beriaanirudh Could you take care of it?

@beriaanirudh
Copy link
Author

Hey,
I had done the testing on streaming cases for shell and spark earlier while raising this PR, and they did work then. I just did the testing on top of d11221fb8af5568416ef5041fc2da8b6fa08598b (currently latest master) and I found these results:
Case1: master (X), this branch (X)
Case2: master(O), this branch (O)

I think SparkInterpreter could have been broken which got fixed today in 5f1208bd (ZEPPELIN-1284). Perhaps thats why Case2 was failing 2-3 days back. Could you please confirm my test-results?
cc @Leemoonsoo @corneadoug

@corneadoug
Copy link
Contributor

Don't know why sometimes we get some failure in that branch,
Tried again and got:

Case1: master (X), this branch (X)
Case2: master (O), this branch(O)

@beriaanirudh
Copy link
Author

Thanks for the verification @corneadoug . Even I am not able to reproduce it. My best guess is that it had to do something with the fix I mentioned above.

@beriaanirudh
Copy link
Author

@corneadoug @Leemoonsoo please let me know if any action-item on my side.

@Leemoonsoo
Copy link
Member

Tested again and got Case2 works fine on this branch.
LGTM. CI failure is irrelevant to this change.
Merge if there're no more discussions.

Thanks @beriaanirudh for the contribution!

@asfgit asfgit closed this in 11becde Sep 3, 2016
beriaanirudh pushed a commit to beriaanirudh/zeppelin that referenced this pull request Mar 8, 2017
There are 2 issues and their proposed fixes:
1. On a paragraph run, for every line of output, there is a broadcast of the new line from zeppelin. In case of thousands of lines of output, the browser/s would hang because of the volume of these append-output events.
2. In the above case, besides the browser-hang, another bug observed is that result data is will repeated twice (coming from append-output calls + finish-event calls).

The proposed solution for apache#1 is:
- Buffer the append-output event into a queue instead of sending the event immediately.
- In a separate thread, read from the queue periodically and send the append-output event.

Solution for apache#2 is:
- Donot append output to result if the paragraph is not runnig.

Improvement + Bug Fix

https://issues.apache.org/jira/browse/ZEPPELIN-1292

The test could be to run a simple paragraph with large result. Eg:
```
%sh
for i in {1..10000}
do
echo $i
done
```
PS: One will need to clear browser cache between running with and without this code patch since there are javascript changes as well.

* Does the licenses files need update?
No
* Is there breaking changes for older versions?
No
* Does this needs documentation?
It could need for the design. Otherwise I have added code comments explaining behaviour.

Author: Beria <beria@qubole.com>

Closes apache#1283 from beriaanirudh/ZEPPELIN-1292 and squashes the following commits:

17f0524 [Beria] Use diamond operator
7852368 [Beria] nit
4b68c86 [Beria] fix checkstyle
d168614 [Beria] Remove un-necessary class CheckAppendOutputRunner
2eae38e [Beria] Make AppendOutputRunner non-static
72c316d [Beria] Scheduler service to replace while loop in AppendOutputRunner
599281f [Beria] fix unit tests that run after
dd24816 [Beria] Add license in test file
3984ef8 [Beria] fix tests when ran with other tests
1c893c0 [Beria] Add licensing
1bdd669 [Beria] fix javadoc comment
27790e4 [Beria] Avoid infinite loop in tests
5057bb3 [Beria] Incorporate feedback 1. Synchronize on AppendOutputRunner creation 2. Use ScheduledExecutorService instead of while loop 3. Remove Thread.sleep() from tests
82e9c4a [Beria] Fix comment
7020f0c [Beria] Buffer append output results + fix extra incorrect results

(cherry picked from commit 11becde)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants