MSQ: Add CPU and thread usage counters. #16914
The main change adds "cpu" and "wall" counters. The "cpu" counter measures CPU time (using JvmUtils.getCurrentThreadCpuTime) taken up by processors in processing threads. The "wall" counter measures the amount of wall time taken up by processors in those same processing threads. Both counters are broken down by type of processor.

This patch also includes changes to support adding new counters. Due to an oversight in the original design, older deserializers are not forwards-compatible; they throw errors when encountering an unknown counter type. To manage this, the following changes are made:

1. The defaultImpl NilQueryCounterSnapshot is added to QueryCounterSnapshot's deserialization configuration. This means that any unrecognized counter types will be read as "nil" by deserializers. Going forward, once all servers are on the latest code, this is enough to make adding new counters easy.
2. A new context parameter "includeAllCounters" is added, which defaults to "false". When this parameter is set to "false", only legacy counters are included. When set to "true", all counters are included. This is currently undocumented. In a future version, we should set the default to "true", and at that time, include a release note that people updating from versions prior to Druid 31 should set this to "false" until their upgrade is complete.
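For readers unfamiliar with the distinction between the two counters, here is a minimal sketch of per-processor-type CPU and wall time accounting. It is not the code in this patch: the class `CpuWallCounterSketch` and its methods are hypothetical, and it calls the JDK's `ThreadMXBean` directly on the assumption that `JvmUtils.getCurrentThreadCpuTime` provides an equivalent per-thread CPU reading.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration only; these are not Druid's counter classes.
final class CpuWallCounterSketch
{
  private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

  // Accumulated nanoseconds, keyed by processor type.
  private final Map<String, LongAdder> cpuNanos = new ConcurrentHashMap<>();
  private final Map<String, LongAdder> wallNanos = new ConcurrentHashMap<>();

  private static boolean cpuTimeAvailable()
  {
    return THREADS.isCurrentThreadCpuTimeSupported() && THREADS.isThreadCpuTimeEnabled();
  }

  // Runs one unit of work on the calling thread and charges its CPU time and
  // wall time to the given processor type.
  void runAndRecord(String processorType, Runnable work)
  {
    final boolean measureCpu = cpuTimeAvailable();
    final long cpuStart = measureCpu ? THREADS.getCurrentThreadCpuTime() : 0L;
    final long wallStart = System.nanoTime();
    try {
      work.run();
    }
    finally {
      add(wallNanos, processorType, System.nanoTime() - wallStart);
      if (measureCpu) {
        add(cpuNanos, processorType, THREADS.getCurrentThreadCpuTime() - cpuStart);
      }
    }
  }

  private static void add(Map<String, LongAdder> totals, String key, long delta)
  {
    totals.computeIfAbsent(key, k -> new LongAdder()).add(delta);
  }

  long getCpuNanos(String processorType)
  {
    final LongAdder total = cpuNanos.get(processorType);
    return total == null ? 0L : total.sum();
  }

  long getWallNanos(String processorType)
  {
    final LongAdder total = wallNanos.get(processorType);
    return total == null ? 0L : total.sum();
  }
}
```

Wall time includes any time the thread spends blocked or descheduled, while CPU time does not, so a large gap between the two for one processor type suggests waiting rather than computing.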
@kgyrtkirk was evaluating the performance implications of measuring the |
@gianm WDYT about leaving the parameter undocumented and having the broker set it? This takes advantage of the fact that brokers are upgraded last: if the parameter is set by the broker, Druid can assume that all the workers are on a newer version. This removes the need for user intervention.
Thanks for confirming. I think if it is amortized, it shouldn't be an issue. The flame graph in the attached ticket feels misleading too, since turning off that code path doesn't lead to as much of a perf increase as anticipated.
Unfortunately that wouldn't work well with async queries. If one Broker is updated and another isn't, and the user tries to retrieve information from the task report via async query APIs on the non-updated Broker, that non-updated Broker will error out when trying to read the unrecognized counters.
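To make the mixed-version failure concrete, here is a dependency-free sketch of the two mitigations from the description. It is an illustration under assumptions, not the patch itself: the actual change relies on Jackson's `defaultImpl` on `QueryCounterSnapshot`, whereas this sketch models a reader as a set of known type names. The class `CounterCompatSketch`, its methods, and the legacy type names "channel" and "warnings" are hypothetical placeholders; "cpu", "wall", "nil", and `includeAllCounters` come from the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; the real patch uses Jackson polymorphic deserialization.
final class CounterCompatSketch
{
  // Placeholder names standing in for the counter types that older readers already know.
  static final Set<String> LEGACY_TYPES = Set.of("channel", "warnings");

  // Writer side: unless includeAllCounters is true, report only legacy counters,
  // so readers that have not been upgraded never see an unknown type.
  static Map<String, Long> countersToReport(Map<String, Long> all, boolean includeAllCounters)
  {
    final Map<String, Long> reported = new LinkedHashMap<>();
    for (Map.Entry<String, Long> entry : all.entrySet()) {
      if (includeAllCounters || LEGACY_TYPES.contains(entry.getKey())) {
        reported.put(entry.getKey(), entry.getValue());
      }
    }
    return reported;
  }

  // Reader side: a reader without a fallback throws on an unknown type;
  // a reader with a fallback maps unknown types to "nil".
  static String readType(String type, Set<String> knownTypes, boolean nilOnUnknown)
  {
    if (knownTypes.contains(type)) {
      return type;
    }
    if (nilOnUnknown) {
      return "nil";
    }
    throw new IllegalArgumentException("Unknown counter type: " + type);
  }
}
```

In this model, a non-updated Broker corresponds to `readType("cpu", LEGACY_TYPES, false)`, which throws. That is why the writer-side filter has to stay at its default until every reader has the fallback.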
It could be some bias in the profiling. They do have that sometimes.
Does that mean we're good to merge this one?
I wasn't done with the review when I left the previous comments. Just finished up the review and it seems good to me. Apologies for the delay.
All good, thanks!
Logical merge conflict between apache#16911 and apache#16914.
* MSQ: Add CPU and thread usage counters.
* Style, coverage.
* Fix.