-
Notifications
You must be signed in to change notification settings - Fork 4.1k
STORM-433: Give users visibility to the depth of queues at each bolt #236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This touches the critical path. We can't add this until quantifying the performance cost of this.
|
-1 As this touches the critical path, this needs some performance testing to measure the before and after impact of this before we can even consider merging this in. |
|
I see your point. This probably doesn't belong on the critical path anyway, so I'll move it off. |
|
Yea if you can find a way to do that that's thread safe that's ideal. One way to do that is to place a special event on the disruptor queue, and when the read thread sees that special event it can write the stat out somewhere (this would be similar to how the INTERRUPT event works in DisruptorQueue.java) |
|
Alright, I've moved the updates into the heartbeat thread. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you split this line up? everything looks fine, but it is so long it is really hard to tell what is happening with this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'd like to reformat these templates in a few places. There are quite a few spots like this in the HTML. I can fix this and file a separate jira for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK file a separate JIRA and then I am +1, but @nathanmarz was also reviewing this, so I am going to wait for a +1 from him before committing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request and jira are up #243
|
A few things:
|
|
@nathanmarz, @revans2, @d2r
|
Yes, I think this would be a tremendous help to users while debugging "slowness" issues. |
Conflicts: storm-core/src/clj/backtype/storm/daemon/executor.clj storm-core/src/clj/backtype/storm/ui/core.clj storm-core/src/ui/public/templates/component-page-template.html
Regenerating Thrift code. Ready for review.
|
This is ready for another look. |
|
Hi @knusbaum, it has been a while, and it looks like this needs an up-merge again. It might also be good to remove changes from the Thrift-generated code that result in only time stamp changes. |
|
@knusbaum : I see you closed this. Are you planning to open a new PR for publishing these stats? It would be a big help to us, we often wonder at various behaviors we witness from our users' topologies, and the only mechanism we have right now for getting visibility into the queue depths seems to be getting every topology owner to use a custom metrics consumer to publish the metrics which we would then need to provide fancy aggregation on top of. Having it in the Storm UI and also in the API stats would be very very helpful. |
|
I am preparing a patch for publishing in-backlog (receive-queue) and out-backlog (send-queue). I am able to see the average value of these metrics on UI over each time window period but only in executors section (No aggregation over component level etc) Also users may be interested in instant value of these metrics and I don't know how will I fit that into UI. any suggestions are welcome. |
|
@abhishekagarwal87 : awesome news! Any way you can post a screenshot of the UI you are currently proposing? At least please do so with the PR when you send it. That could help others to brainstorm how to put such values into the UI. Maybe you're instead asking for suggestions on how to handle obtaining the instantaneous values? |
|
I will do that Eric. I am using https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/metric/internal/LatencyStatAndMetric.java to store the windowed values. It is easy to add the instantaneous values in the result map so that is not a problem. I will put a screenshot and the PR soon. May be that will clear the confusion. |
|
@abhishekagarwal87 I am excited to see your screenshot as well. Also, I am curious as why you chose to use LatencyStatAndMetric.java over CountStatAndMetric.java? |
|
Because we need average length of queue average over each time window. If you look at the LatencyStatAndMetric, it does exactly that. I record the queue lengths in each operation and return the average values with executor heartbeat. |
|
Guys, follow up discussion is happening on #1406. Can you let me know your feedback there? |
This pull request adds a column to the executors table on the components page showing the average length of the tuple queue when the executor consumes a chunk.