
Conversation

@roshannaik
Contributor

updating setting in defaults.yaml

@abhishekagarwal87
Contributor

The actual event logging happens only if the user starts it from the topology page; only then will performance degrade, right?

@HeartSaVioR
Contributor

@roshannaik
You may want to elaborate on the performance degradation, especially how you came up with this patch.
Please feel free to post on the dev mailing list, add a comment to the JIRA issue, or comment on this pull request.

@arunmahadevan
Contributor

@roshannaik setting topology.eventlogger.executors: 0 completely disables event logging by default (i.e. no threads are started to handle event logging). If someone then tries to turn on event logging in the topology, they won't see any events logged, which is very confusing behavior in my opinion. It makes them think that this functionality is totally broken or that something is wrong with their topology.

The event logging happens only when they enable it, and we don't expect them to turn it on all the time. The only overhead otherwise is checking whether a few flags are turned on.

Can you post more details on the topology you ran and the results you observed when you set topology.eventlogger.executors to 0 vs setting it to nil?
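For context, topology.eventlogger.executors is normally controlled per topology roughly like the sketch below. This is not code from this PR, just a minimal illustration; it assumes the Config.TOPOLOGY_EVENTLOGGER_EXECUTORS constant in org.apache.storm.Config maps to topology.eventlogger.executors.

import org.apache.storm.Config;

public class EventLoggerConfigExample {
    public static void main(String[] args) {
        Config conf = new Config();
        // 0 starts no event logger tasks, so event logging is off for this topology.
        // Leaving it unset (nil) starts one event logger task per worker, which keeps
        // the UI "Debug" button usable. The value 1 below is just an explicit example.
        conf.put(Config.TOPOLOGY_EVENTLOGGER_EXECUTORS, 1);
        // ... build the topology and pass 'conf' to StormSubmitter.submitTopology(...)
    }
}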

@abhishekagarwal87
Contributor

I agree with @arunmahadevan. You can also show an alert message in the UI when the user turns on event logging for a topology, e.g. "Event logging may degrade performance" or something similar.

@roshannaik
Contributor Author

@abhishekagarwal87
This perf hit of 7-9% exists even if the user does not enable logging from the UI. With logging enabled, it should be even higher.

@arunmahadevan
The perf improvement I described here is observed when going from the default value nil to 0 for topology.eventlogger.executors.

Topology detail: 1 spout, 1 bolt, 1 acker. Spout generates random numbers. Bolt does this math on the tuple: (value+1)*2. In effect this is just a speed of light topology.
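For reference, a minimal sketch along the lines of the topology just described. This is a reconstruction, not the actual benchmark code; the class and component names (RandomNumberSpout, MathBolt, "speed-of-light") are made up.

import java.util.Map;
import java.util.Random;

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SpeedOfLightTopology {

    public static class RandomNumberSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private Random rand;
        private long msgId = 0;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
            this.rand = new Random();
        }

        @Override
        public void nextTuple() {
            // Emit as fast as possible (no sleep), anchored with a message id so acking happens;
            // emit() dominates the time spent in nextTuple().
            collector.emit(new Values(rand.nextLong()), msgId++);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("value"));
        }
    }

    public static class MathBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            long value = tuple.getLong(0);
            // Trivial math on the tuple, so any fixed per-emit overhead stands out.
            collector.emit(new Values((value + 1) * 2));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("result"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new RandomNumberSpout(), 1);
        builder.setBolt("bolt", new MathBolt(), 1).shuffleGrouping("spout");

        Config conf = new Config();
        conf.setNumWorkers(1);
        conf.setNumAckers(1);
        // The comparison in this thread is topology.eventlogger.executors nil (default) vs 0:
        // conf.put(Config.TOPOLOGY_EVENTLOGGER_EXECUTORS, 0);
        StormSubmitter.submitTopology("speed-of-light", conf, builder.createTopology());
    }
}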

Cause of the perf hit: the hit I noted is actually due to that very same checking of the flag.
Checking these flags in Clojure turns into very expensive lookups in Clojure internals; it internally invokes Java reflection! If you are thinking 'what the hell'... then yes, that was my reaction too.

Specifically, the problematic lookup on this code path is 'storm-component->debug-atom'.

I agree with @arunmahadevan's concern that this will confuse users when they don't see logs after enabling it in the UI.

The alternative fix for this is to change the manner in which this flag is made available to the code; basically, make it more efficient.

There are some other lookups in the critical path that are also causing perf hits, which I plan to address in a separate JIRA.

@abhishekagarwal87
Contributor

Thanks @roshannaik for the detailed explanation. I wasn't expecting this. We should fix the perf hit in the check.

@arunmahadevan
Contributor

@roshannaik Thanks for the details. I will try to run a similar topology to see the difference and see if there is something we can do. Do you have any suggestions on how we can improve the performance of the check in Clojure?

@arunmahadevan
Contributor

@roshannaik I profiled "org.apache.storm.starter.ExclamationTopology" after setting the log level to ERROR, and send_to_eventlogger doesn't show up in the top 25 methods in terms of CPU time. I tried different configurations (one worker, three workers, increased and reduced spout and bolt parallelism) but the results were more or less the same.

[screenshot: profile-3-3 (profiler output)]

@roshannaik
Contributor Author

@arunmahadevan I think your VisualVM output is not presenting the data in the manner you are expecting. Here is the JProfiler output:

[screenshot: before (JProfiler output)]

@arunmahadevan
Contributor

@roshannaik I ran the Exclamation topology with the TestWordSpout emitting a tuple every 10 ms with a spout parallelism of 10 and measured the throughput with and without the event logger, and I observed almost the same results.

I am also attaching the JProfiler output after tracing the topology for about 1.36 million tuples, where send_to_eventlogger is consuming only around 0.2% of the time.

[screenshot: spout (JProfiler output)]

@roshannaik
Contributor Author

@arunmahadevan
Your call tree seems a bit different from mine.
I don't see why you would set a 10 ms wait; your throughput computations will naturally be thrown off by that.
Also set spout parallelism = 1, bolt = 1, and worker = 1 when profiling.

@roshannaik
Contributor Author

I retried the throughput measurements with ackers=0; the impact is even greater: it's 25% faster when event.logger=0.

@roshannaik
Contributor Author

In my opinion, since this performance hit can be quite large, we should definitely set eventLoggers=0 by default, and additionally the UI should disable the 'Debug' button if eventLoggers=0.
That will avoid the confusion for users that Arun was mentioning earlier. This feature imposes needless penalties and wastes CPU cycles even when not in use.

I tried switching the Clojure lookups to a Java implementation, but that didn't help. So I don't see a way to speed up these lookups.

@HeartSaVioR
Contributor

@roshannaik
I feel this is similar to #1242. The difference is that #1242 is related to the nextTuple call count, and this is related to the emitted tuple count.
Similar to #1242, I don't think how fast we turn debug on/off is so critical that we need to check on every emit. So we may want to apply a similar workaround if that helps.

@arunmahadevan
Contributor

Re-ran the exclamation topology with:

  • no anchoring/acking
  • TestWordSpout emitting a fixed word without any sleep
  • spout, bolt parallelism = 1
  • worker = 1
  • Loglevel = error, conf.setDebug(false)

Throughput I observed:
566K tuples/sec (with eventlogger: nil)
569K tuples/sec (with eventlogger: 0), roughly a 0.5% improvement.

I then ran JProfiler and saw 1.3% of the time being spent in send_to_eventlogger.

The profiler output itself might be offset due to the instrumentation overhead. JProfiler reports the following for send_to_eventlogger:
"Some methods with excessive instrumentation overhead have been detected. They are called very frequently, their execution times are very short, and the time required for measuring those calls is disproportional."

I retried the throughput measurements with ackers=0; the impact is even greater: it's 25% faster when event.logger=0.

Are you taking the measurements while the profiler is running or with the debug flag turned on? I don't see this happening otherwise. Are you using the latest JProfiler (9.1.1)? Are you using any extra plugins to instrument the hashmap lookups (since I see the hashmap keys in your screenshot)? If so, that itself might be skewing your results.

To avoid the PersistentMap lookups, I also tried passing the storm-id and debug-atom values as args to send_to_eventlogger, and the percentage reduced from 1.3% to 1%. You could try this change and see how it impacts your topology.

I agree with @HeartSaVioR and don't think we need to set eventlogger to 0.

@HeartSaVioR
Contributor

@arunmahadevan @roshannaik
I guess I didn't express my thought well; I've modified my last comment.
Anyway, I'm afraid micro-optimization driven by micro-benchmarks can easily result in premature optimization (but awesome work on STORM-1526 and STORM-1539!), and the more important thing is that Storm still has design spots where performance can be improved, as you know.

@roshannaik
Contributor Author

@arunmahadevan :-) ... I am not taking the throughput measurements while the profiler is attached.
It will take some time for me to iterate over and analyze your attempts and JProfiler usage to see what is going on. At a quick glance I see multiple differences in your topology setup. But the profiler screenshots I have posted are hopefully evidence that I didn't cook this up :-). You can try the topology I described, and I shall also post a GitHub link to the topology I am using soon.

@HeartSaVioR I am a bit puzzled to see an 8% or 25% difference in perf (for a given topology) being referred to as micro-optimization. This is a case of potentially significant overhead being imposed on the common code path by an infrequently used code path. Quite the contrary, I feel one needs a very good justification to leave this turned on.

It is not feasible to do a full-fledged Yahoo-style benchmark to identify and fix all such issues. Micro-benchmarking is essential. Here we are looking at a simple case of the emit() call dominating most of the time within nextTuple(); the spout computation itself takes a negligible percentage of the time.

I have deliberately separated #1242 from this, as this PR is about simply disabling a DEBUG config setting, as opposed to modifying code to avoid repetitive lookups. Seeking and testing an alternative implementation for event logging (unless it's trivial), I felt, might be tricky at this late stage of 1.x.

@harshach
Contributor

@arunmahadevan @HeartSaVioR Looking at the perf analysis from @roshannaik, it looks like there is enough evidence to consider this a serious performance issue. Given that event logging is a new feature and we have evidence it's causing a perf issue, I am +1 on disabling it by default. I understand that once it is disabled, it can't be enabled in a running topology, and that is OK. For most use cases this is more likely to be used in a dev cluster than a production cluster. Also, this is a blocker for the 1.0 release; let's get this merged in and see if there is a better way to enable it by default, and we can do that in the 1.1 release.

@ptgoetz
Member

ptgoetz commented Mar 23, 2016

I did a quick benchmark on a real cluster (albeit on a VMware cluster) and found that there was a throughput hit, but it was small -- about 0.4%.

I'm okay with leaving the defaults as is, and documenting how to disable the feature.

If there's a better solution, I'm okay with waiting for a post 1.0 release.

@HeartSaVioR
Contributor

@roshannaik
I'm sorry, I think I picked the wrong term; 'micro optimization' confused what I meant. 'Local optimization' seems clearer.

Btw, I guess @ptgoetz got my point.
Yes, I agree we need micro-benchmarks to eliminate variables, but I think the change has to be re-evaluated with a normal benchmark to reason about how it behaves in a relatively normal situation, especially if it touches functionality.
If it doesn't touch functionality I would say "Awesome work!" even for under 5% performance gain from a local optimization.
(The reason STORM-1526 and STORM-1539 didn't need to be re-evaluated with a normal benchmark is that they didn't affect any functionality.)

And I guess this overhead (0.006720889 ms = 6720.889 ns per tuple spent in send_to_eventlogger, as @arunmahadevan posted) is relatively small compared to what Storm has to do to process a tuple - enqueue and dequeue, finding the task id to send to, serde, transfer - where we may find spots to improve.

Anyway, I agree that this is inside the critical path, so we may want to find an alternative way that does not touch functionality.
If you really want to disable it by default, it would be better to post to the dev mailing list first to build consensus.

@ptgoetz
Member

ptgoetz commented Mar 24, 2016

We're debating six versus half a dozen. Do we disable it by default and explicitly tell users they have to turn it on for the UI functionality to work? Or do we enable it by default and tell users to disable it per topology to realize a small performance gain?

I could go either way, but the latter seems like a better user experience for users new to the feature.

Also, the minor performance hit is eclipsed by the performance improvements in 1.0. And it can be easily turned off. It just needs to be documented clearly, IMO.

@roshannaik
Contributor Author

Perhaps we can wait a bit before concluding on this. Some of you feel confident that there is a minor hit or no hit based on the topology/setup you have used, whereas it is quite significant on the one I used.
My current topology is implemented to run within the storm-benchmark tool. I can rewrite it as a regular topology so you can run it easily to validate, or, if you are comfortable running it, I can share it as-is more quickly.

@HeartSaVioR
Contributor

OK, I'm assuming this is a valid performance hit, whether it is small or huge.

For choosing six or half a dozen, I think whether it is on or off by default should be decided by whether its use cases are valid for production.
Since STORM-954 was created by @harshach, I guess he considered the use cases for this. If we predict most use cases are in dev, sure, I agree we can just disable it by default and document how to enable it.

@arunmahadevan
Contributor

@roshannaik did you try passing the map values as arguments and taking the measurements? Based on your earlier results it appeared that the PersistentMap lookup was causing the hit (I still think it could very well be due to the profiler overhead).

Here are the changes I made - arunmahadevan@7eae5ec . I would like to see how it affects your profiling.

I don't think a 0.4 to 0.5% increase in throughput should be a reason to completely disable a feature. And spouts that emit tuples in a tight loop would not be a very common use case anyway. I am for documenting this feature so that users can adjust the config values based on their needs rather than turning it off.

@harshach
Contributor

@arunmahadevan @ptgoetz we are not worried about a 0.4 to 0.5% effect on throughput. In most cases no one is going to notice that. Let's wait for @roshannaik's topology; you can run it, and if it's still 0.4% then we can ignore this.
@ptgoetz can you add details on how you ran your benchmark?

@arunmahadevan
Contributor

@harshach

@arunmahadevan @HeartSaVioR Looking at the perf analysis from @roshannaik, it looks like there is enough evidence to consider this a serious performance issue.

Can you also take a look at the results that I observed, where the throughput difference is negligible? I am for disabling it if there's a consensus on the results and that it really affects performance.

@harshach
Contributor

@arunmahadevan I saw the earlier comment. Was that topology run with the 10 ms sleep in the spout?
@roshannaik do you also have some numbers like Arun posted, events per sec?

@arunmahadevan
Contributor

@harshach I had removed the sleep in the latest run to match what @roshannaik was evaluating.

@roshannaik
Contributor Author

@arunmahadevan, like I already said earlier, I did not take the throughput measurements with the profiler attached; it makes no sense to do so. Profiling was done separately, and it correlated with the throughput measurements.

@roshannaik
Contributor Author

@harshach my numbers for this rewritten topology were:
EL=1, Acker=0 : 641,202 acks/sec, 1,282,413 emits/sec
EL=0 , Acker=0 : 719,399 acks/sec, 1,438,798 emits/sec

Approx. 12% improvement with EL=0 compared to EL=1.
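Worked out from the acks/sec figures above: (719,399 - 641,202) / 641,202 ≈ 0.122, i.e. roughly a 12% gain when the event logger is disabled.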

@roshannaik
Contributor Author

@HeartSaVioR the topology internally overrides some of those values, like the worker count. You will need to comment that out if you'd like to vary them from the command line.

@HeartSaVioR
Contributor

@roshannaik
Command line options override existing configurations.
https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/utils/Utils.java#L452

895  [main] INFO  o.a.s.StormSubmitter - Submitting topology BasicTopology in distributed mode with conf {"topology.acker.executors":0,"storm.zookeeper.topology.auth.scheme":"digest","topology.workers":2,"topology.debug":false,"storm.zookeeper.topology.auth.payload":"-6295626850175862189:-8269401123970688630","topology.disruptor.batch.size":10000,"topology.eventlogger.executors":0,"topology.max.task.parallelism":1}
1118 [main] INFO  o.a.s.StormSubmitter - Finished submitting topology: BasicTopology

@HeartSaVioR
Contributor

I guess @arunmahadevan removed not only the instantiation of Long but also Random.randInt(), which makes sense to me.

@roshannaik @arunmahadevan
Btw, please change your BasicTopology to here,

uptime: 30 duration: 30 secs
spout emitted: 11197080 (total: 11197080) emitted/sec (in duration): 373236.0
bolt emitted: 11195560 (total: 11195560) emitted/sec (in duration): 373185.3333333333
uptime: 60 duration: 30 secs
spout emitted: 21304500 (total: 32501580) emitted/sec (in duration): 710150.0
bolt emitted: 21304340 (total: 32499900) emitted/sec (in duration): 710144.6666666666
uptime: 90 duration: 30 secs
spout emitted: 24283780 (total: 56785360) emitted/sec (in duration): 809459.3333333334
bolt emitted: 24283800 (total: 56783700) emitted/sec (in duration): 809460.0
uptime: 120 duration: 30 secs
spout emitted: 24474380 (total: 81259740) emitted/sec (in duration): 815812.6666666666
bolt emitted: 24474320 (total: 81258020) emitted/sec (in duration): 815810.6666666666
uptime: 150 duration: 30 secs
spout emitted: 24125940 (total: 105385680) emitted/sec (in duration): 804198.0
bolt emitted: 24126000 (total: 105384020) emitted/sec (in duration): 804200.0
uptime: 180 duration: 30 secs
spout emitted: 23901220 (total: 129286900) emitted/sec (in duration): 796707.3333333334
bolt emitted: 23901500 (total: 129285520) emitted/sec (in duration): 796716.6666666666
...

I pasted in the functionality for periodically printing metrics from the cluster information. (It came from FastWordCountTopology, and I modified it a bit.)
This change is necessary because we should exclude the numbers from before warmup when calculating: the speed keeps increasing and the metric numbers change continuously, so exactly when you refresh the UI can change your numbers. (Please note that 10m is changed, too.)

You may want to make your benchmark period long enough (by modifying the loop count in main) to see when the speed stabilizes. You may also want to make the period an argument.
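In case it helps to reproduce this, a rough sketch of that periodic metrics printing follows, loosely modeled on the printMetrics pattern in storm-starter's FastWordCountTopology. It prints running totals (diffing successive polls gives the per-duration rates shown above); the thrift getter names, the "spout"/"bolt" component ids, and the "BasicTopology" name are assumptions and may differ from what was actually used.

import java.util.Map;

import org.apache.storm.generated.ClusterSummary;
import org.apache.storm.generated.ExecutorSummary;
import org.apache.storm.generated.Nimbus;
import org.apache.storm.generated.TopologyInfo;
import org.apache.storm.generated.TopologySummary;
import org.apache.storm.utils.NimbusClient;
import org.apache.storm.utils.Utils;

public class EmitRatePrinter {
    public static void printEmitted(Nimbus.Client client, String name) throws Exception {
        ClusterSummary cluster = client.getClusterInfo();
        String id = null;
        for (TopologySummary ts : cluster.get_topologies()) {
            if (name.equals(ts.get_name())) {
                id = ts.get_id();
            }
        }
        if (id == null) {
            throw new Exception("Could not find a topology named " + name);
        }
        TopologyInfo info = client.getTopologyInfo(id);
        long spoutEmitted = 0;
        long boltEmitted = 0;
        for (ExecutorSummary exec : info.get_executors()) {
            if (exec.get_stats() == null) {
                continue; // executor has not reported stats yet
            }
            // ":all-time" window, summed over all output streams
            Map<String, Long> emitted = exec.get_stats().get_emitted().get(":all-time");
            long total = 0;
            if (emitted != null) {
                for (Long count : emitted.values()) {
                    total += count;
                }
            }
            if ("spout".equals(exec.get_component_id())) {
                spoutEmitted += total;
            } else if ("bolt".equals(exec.get_component_id())) {
                boltEmitted += total;
            }
        }
        System.out.println("uptime: " + info.get_uptime_secs()
                + " spout emitted (total): " + spoutEmitted
                + " bolt emitted (total): " + boltEmitted);
    }

    public static void main(String[] args) throws Exception {
        Map conf = Utils.readStormConfig();
        Nimbus.Client client = NimbusClient.getConfiguredClient(conf).getClient();
        for (int i = 0; i < 10; i++) {
            Thread.sleep(30 * 1000);               // poll every 30 seconds, as in the output above
            printEmitted(client, "BasicTopology"); // topology name is an assumption
        }
    }
}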

@HeartSaVioR
Contributor

@arunmahadevan
Please post a pull request with your patch.
I forgot about your patch and tried a fix myself by changing the way debug information is updated, pushing a system tuple to each executor. It increases throughput for EL=1, but it also decreases throughput for EL=0 compared to before.
I'll try to improve my approach if your patch doesn't help much, but according to your numbers it seems this long conversation can come to an end.

@roshannaik
Could you please apply @arunmahadevan's patch and run the benchmarks again? Thanks in advance!

@roshannaik
Contributor Author

@HeartSaVioR I think there is only so much effort I can keep putting into providing credible evidence that there is an issue, or hand-holding other people. IMO there is enough already shared here. I prefer not to try to convince people against their own will.

Like I mentioned before, @arunmahadevan's theory is incorrect. On a side note, any spout that generates data from will do something similar.

@ptgoetz
Member

ptgoetz commented Mar 28, 2016

It should be fairly easy to alter the UI to make it clear to users when this is disabled, as well as how to enable it. If we can do that I would support disabling it by default (i.e. I would be +1).

My main objection to this patch is that it will make the UI appear broken by default and confuse users, especially new users.

@roshannaik
Contributor Author

@ptgoetz, I fully support that line of thought.
I can take a stab at making that UI change and including the fix in this PR.

@HeartSaVioR
Contributor

While I'd like to merge @arunmahadevan's patch, since it just improves performance with no change in functionality, I'm also with @ptgoetz's suggestion.
Since the title of the issue is 'Disable event logging by default', I think we can move the performance improvement to another issue and handle it separately.

@roshannaik
Contributor Author

@ptgoetz I have made the changes to the UI as discussed.
It took a lot more effort than I imagined. It turns out that adding tooltips to disabled buttons (via Bootstrap, which the UI uses for tooltips) requires advanced UI skills. Fortunately, I found a simpler workaround after much experimentation.

@ptgoetz
Member

ptgoetz commented Mar 31, 2016

@roshannaik, thank you for your patience, persistence, and dedication throughout the lifecycle of this patch. I hate to vote -1 on anything. I'm really glad we're working toward a solution that everyone supports.

I have yet to review and test, but definitely plan to approve if all works out.

@HeartSaVioR
Contributor

@roshannaik
Thanks for your quick reaction.
I applied your patch and verified that (1) topology.eventlogger.executors is 0 by default, and (2) the tooltip is shown when the 'Debug' button is disabled and describes how to enable the event logger.

[screenshot: topology-eventlogger-tooltip-works]

I'm also taking a look at #1272, but that is about improving performance when topology.eventlogger.executors > 0, not about blocking this.
Thanks again for your patience. +1 overall from me.

@arunmahadevan
Contributor

1. The tooltip takes some time to show up, whereas the tooltips on other elements, like the title elements under Topology summary, show up immediately. I played around a bit and found a way to show the tooltip on the disabled button immediately, using the same style as the other tooltips. You need to wrap the input element in a span like below:

<span style="display:inline-block;" data-toggle="tooltip" title="" data-original-title="To debug set topology.eventlogger.executors to a value > 0 or nil">
  <input disabled="" onclick="confirmAction('test-topology-1-1459402415', 'test-topology', 'debug/enable', true, 10, 'sampling percentage', 'debug')" type="button" value="Debug" class="btn btn-default">
</span>

2. The tooltip is shown even if the button is enabled. You could add the <span> </span> only when the number of executors is 0, which would take care of this.
3. The tooltip is also shown when the Debug button gets disabled after the user clicks on it (when topology.eventlogger.executors > 0). Similar fix as above.
4. The button should be disabled in the spout/bolt pages as well.

@roshannaik
Contributor Author

1. I spent some time trying to see if the tooltip could show up faster, but there is a catch-22 there. The Bootstrap-based tooltip mechanism allows it, but Bootstrap-based tooltips won't show up if the button is disabled. So I had to settle for this mechanism, which has the downside that it doesn't allow customization of color and timing, as best I could tell. The solution you describe here is one of the alternatives I had already tried, and it doesn't work: the tooltip won't show up unless you hover the mouse over the boundary between the enclosing span element and the disabled button.

Update: I notice that your solution differs slightly from what I tried, in the below specification for the enclosing span element:

style="display:inline-block;

On retrying, I see that works out better. Thanks, I will make the fix.

2 & 3. Although nice to not show tooltip when enabled, I thought its OK to show the tooltip in both enabled/disabled states of the button. Given the desire for quick turn around, i felt it was wise to not pursue implementing/testing this nice to have, but non-critical feature. Shall give it a quick shot and make it part of this PR if t works out.

4 . Let me look into that.

…are disabled. Also enable/disable of the Debug button along with tooltip in the component page
@roshannaik
Contributor Author

Update: I have fixes for points 1, 2, 3 & 4 raised by Arun.
Supporting the changes in the component-level page would have required a change to the thrift spec to add the additional info to the response object. However, thanks to help from @harshach, I was able to find a workaround to avoid that.
Now the tooltip is shown only if both these conditions are met: the Debug button is disabled and the event loggers are disabled.

@knusbaum
Contributor

knusbaum commented Apr 1, 2016

+1

@ptgoetz
Member

ptgoetz commented Apr 1, 2016

Thanks for the last-minute effort @roshannaik.

+1

I typically like to wait for 24 hrs. after the last commit on a PR to merge, even though it's not required by our bylaws.

I intend to merge this earlier in the interest of getting the 1.0 release out. If there are any objections after the fact, there will be plenty of time to cancel the VOTE, revert, etc.

@knusbaum
Contributor

knusbaum commented Apr 1, 2016

@ptgoetz +1 for quick merge

@arunmahadevan
Contributor

@roshannaik thanks for fixing it in the right way. I have run some quick tests and the patch works as expected.

+1

@asfgit merged commit 7d7b3c2 into apache:1.x-branch on Apr 1, 2016
@roshannaik
Contributor Author

Thanks to each of you for lending your time and effort with this.

jsonData["rebalanceStatus"] = (status === "ACTIVE" || status === "INACTIVE" ) ? "enabled" : "disabled";
jsonData["killStatus"] = (status !== "KILLED") ? "enabled" : "disabled";
jsonData["startDebugStatus"] = (status === "ACTIVE" && !debug) ? "enabled" : "disabled";
jsonData["startDebugStatus"] = (status === "ACTIVE" && loggersTotal!=null && loggersTotal!=0 && !debug) ? "enabled" : "disabled";
Contributor

Wouldn't it disable the debug option even when topology.eventlogger.executors is set to null?

Contributor

Good catch. I think we should remove loggersTotal != null so that "debug" is enabled if the value is set to null, since one event logger task is created per worker in that case. This interpretation of "null" is not very intuitive, but it's consistent with what "null" means for other variables like "topology.acker.executors".

Contributor Author

Since the intent is to disable event logging by default, we should disable logging if topology.eventlogger.executors is null... otherwise it's confusing.

@roshannaik
Contributor Author

On further thought... I feel it may be OK to just remove the loggersTotal != null check here. Shall I just update this PR?

@arunmahadevan
Contributor

@roshannaik yes, it will make the behavior consistent with "ackers". You may also want to update the tooltip. Since this PR is already merged, you may have to raise another one, and I am not sure if that would make it into the 1.0 release.

@abhishekagarwal87
Contributor

@roshannaik can you also disable the events link in component-summary-template if the loggers are disabled? It is clickable and points to a blank page. If you want, I can put up a fix for this.

@roshannaik
Contributor Author

@abhishekagarwal87 @arunmahadevan
Since it won't make it into 1.0, I shall make this change part of a separate JIRA & PR. That will avoid confusion.
