-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track resource consumption for query and fetch phases #1575
Conversation
Can one of the admins verify this patch? |
✅ Gradle Wrapper Validation success ba2cbb300f45f50ece128fdfc76f7253b82b02be |
❌ Gradle Check failure ba2cbb300f45f50ece128fdfc76f7253b82b02be |
❌ Gradle Precommit failure ba2cbb300f45f50ece128fdfc76f7253b82b02be |
✅ Gradle Wrapper Validation success 2c03ff6469825633b79ac7b68e2f5120eb6c66f8 |
❌ Gradle Precommit failure 2c03ff6469825633b79ac7b68e2f5120eb6c66f8 |
❌ Gradle Check failure 2c03ff6469825633b79ac7b68e2f5120eb6c66f8 |
✅ Gradle Wrapper Validation success 5091c1f446e6f7593f95c27cdbaaabc8410640b6 |
❌ Gradle Precommit failure 5091c1f446e6f7593f95c27cdbaaabc8410640b6 |
❌ Gradle Check failure 5091c1f446e6f7593f95c27cdbaaabc8410640b6 |
Part 1: This is first phase towards providing visibility into the most memory/compute heavy queries This change measures memory and cpu-time used during query and fetch phases. This is leveraging the single threaded execution model to track resources consumed by the thread executing the query/fetch phase. Signed-off-by: Ankit Malpani <ankit.malpani@gmail.com>
5091c1f
to
e0b2678
Compare
✅ Gradle Wrapper Validation success e0b2678 |
✅ Gradle Precommit success e0b2678 |
server/src/main/java/org/opensearch/common/metrics/ResourceTracker.java
Outdated
Show resolved
Hide resolved
*/ | ||
public void reset() { | ||
this.startingCPUTime = threadMXBean.getCurrentThreadCpuTime(); | ||
this.startingAllocatedBytes = threadMXBean.getThreadAllocatedBytes(Thread.currentThread().getId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be safer to consult if thread memory allocation measurement is supported (otherwise UnsupportedOperationException
) and enabled [1]:
if (threadMXBean.isThreadAllocatedMemorySupported() && threadMXBean.isThreadAllocatedMemoryEnabled {
this.startingAllocatedBytes = threadMXBean.getThreadAllocatedBytes(Thread.currentThread().getId());
}
@malpani I believe there are some overlaps with #1555 (please correct me if I am wrong). It seems like you are focusing on search only, but by and large, the query and fetch phases are just variations of |
@reta - yes they are related, @sruti1312 and I work on the same team and are collaborating to roll resource tracking out. To keep diffs small, we have split this out into multiple atomic changes instead of raising a giant merged PR that will be hard to review -
Coming back to your question - can this be purely done from task side - the answer is no. eg. Bulk tasks work very differently from search, so it will be upto different task executors to define how to track their resource usage. Given indexing usage is very easy to determine as a function of the incoming payload as against search where most of the problems arise and a small payload can cause extensive usage), the focus is currently on query resource observability. Hope this helps clarify where we are headed with this. |
Thanks @malpani, it makes sense. Just to highlight some gaps: it may not be possible to capture memory and CPU consumption accurately since additional executors may be involved along the way. Fe, we are exploring the experimental Apache Lucene support of the concurrent segments search (#1500) and it uses different executor behind the scene to do that. The resources consumed by those pooled threads won't be captured during query / fetch phases. Hope it sounds reasonable, thank you. |
@reta - thanks for the heads up, i skimmed through the PR and when concurrent segments search is enabled, the current form for tracking resource consumption of local thread will not work as multiple threads will be involved. My initial thought to have this work in that world will be a wrapped |
I'm happy to merge if @reta is A-OK with the change, LMK? |
@dblock yes, I am A-OK, thanks for asking! |
@zelinh can you please help with the whitesource failure in this PR? |
*/ | ||
@SuppressForbidden(reason = "ThreadMXBean enables tracking resource consumption by a thread. " | ||
+ "It is platform dependent and i am not aware of an alternate mechanism to extract this info") | ||
public class ResourceTracker { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think instead of having ResourceTracker
in SearchContext
, we can move it as interface to Task class itself. This interface can then expose the methods like:
setup/initialize
where it can initialize the current values from MXBean.recordStats()
where it can record the diff
Then once setTask
is called on Context object, it can initialize the collector/tracker. In updateResourceTracking
it can call the context.getResourceTracker().recordStats()
. This will avoid the need for reset
as each task will be updating its own internal stats. Context object will be set with new task for query and fetch phase separately.
Also how about renaming ResourceTracker
as StatsCollector
or MetricsCollector
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moving
ResourceTracker
toTask
- The reason I kept it in
SearchContext
is,ResourceTracker
with its thread level stats is unique to tracking resource consumption for queries only. Currently, I am not seeing other consumers. We can move it upwards if there are otherTask
that would need similar logic - Also, if the concurrent segment search becomes the default in future, we might need multiple
ResourceTracker
per search task - I do see your point to avoid/optimize
reset
- and need to see the scenarios where the context could be reused. Either way will check if memory/cpu is non zero as a condition on pre-empting reset - what are your thoughts?
Also how about renaming ResourceTracker as StatsCollector or MetricsCollector ?
Stats
has a special meaning asstats
it is a request param that allows grouping stats on a tagCollector
has a special meaning too - Lucene collector- What are your thoughts on
MetricsTracker
- tracking metrics - resource consumption, other future metricsResourceTracker
- tracking resource consumption
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moving
ResourceTracker
toTask
- The reason for proposing
ResourceTracker
to move it toTask
is, these metrics/stats belongs totask
. Putting it within context is creating the need forreset
since context is used by multiple tasks. ConsideringTask
will be tracking metrics then having an interface which provides the API toupdate/get
stats related information in Task will fit well. I see that not all tasks is currently supporting the metrics collection, however that can be achieved by implementing theupdate/get
only by specific search related tasks. The base task can be a no-op for these calls. - For concurrent segment search, there still will be single task and context for search but segment level operation will be done in parallel by multiple threads. So each of these threads will now need to update the stats for same task.
Also how about renaming ResourceTracker as StatsCollector or MetricsCollector ?
I don't think stats/collector
definition in the example you shared makes it reserved word for only those modules. It is a general term and can be used in multiple places, for example: SearchExecutionStatsCollector
. But I am fine with MetricsTracker
as well since that will cover other categories like latencies too and not just resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have pushed a new revision which incorporates this feedback:
- Got rid of the
reset
calls asSearchContext
is not reused anymore - Renamed to MetricsTracker
Referring to #1593. We may ignore the WhiteSource check for OpenSearch repo for now. There is an issue on WhiteSource side and they are working to fix it. |
Signed-off-by: Ankit Malpani <ankit.malpani@gmail.com>
✅ Gradle Wrapper Validation success 03146e8 |
✅ Gradle Precommit success 03146e8 |
static ThreadMXBean threadMXBean = (ThreadMXBean) ManagementFactory.getThreadMXBean(); | ||
long startingAllocatedBytes; | ||
long startingCPUTime; | ||
long memoryAllocated; | ||
long cpuTime; | ||
|
||
/** | ||
* Takes current snapshot of resource usage by thread since the creation of this object | ||
*/ | ||
public void updateMetrics() { | ||
this.memoryAllocated = threadMXBean.getThreadAllocatedBytes(Thread.currentThread().getId()) - startingAllocatedBytes; | ||
this.cpuTime = threadMXBean.getCurrentThreadCpuTime() - startingCPUTime; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most optimal way to achieve it is below. Same is being used in a similar PR #1643
+ static {
+ threadMXBean = ManagementFactory.getThreadMXBean();
+ Method getBytes;
+ try {
+ getBytes = threadMXBean.getClass()
+ .getMethod("getThreadAllocatedBytes", long[].class);
+ getBytes.setAccessible(true);
+ failMessages = Collections.emptyList();
+ } catch (NoSuchMethodException e) {
+ getBytes = null;
+ }
+ getThreadAllocatedBytes = getBytes;
+ }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this for safely handling JVMs that may not have support for tracking or are there other reasons to switch to reflection - as currently this is under a cluster setting that is disabled by default i was originally planning to add and test on different platforms at a later point but can add it right away.
For safety purposes - I was wondering of leveraging isThreadAllocatedMemorySupported and isCurrentThreadCpuTimeSupported instead. Any other reasons why the reflection route could be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would need to confirm if those checks are good enough. See https://bugs.openjdk.java.net/browse/JDK-8152859
@malpani should I merge it as is or are you making changes suggested by @Bukhtawar? |
start gradle check |
long startingAllocatedBytes; | ||
long startingCPUTime; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these values are never assigned. Did you intend to initialize these to the initial measurements upon instance creation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for catching this. The constructor got removed after the recent refactor/reset removal! fixing it.
@dblock i will have one more revision to incorporate feedback from @andrross and @Bukhtawar . Also, we now have 2 similar PRs for resource tracking - one is for #1042 and the current one is for query observability. There is similar logic in both in terms of relying on resource usage deltas via |
closing this out in favor of a merged approach with #1042 |
Description
Part 1: This is first phase towards providing visibility into the most memory/compute heavy queries
This change measures memory and cpu-time used during query and fetch phases. I am leveraging the single threaded execution model to track resources consumed by the thread executing the query/fetch phase.
Signed-off-by: Ankit Malpani ankit.malpani@gmail.com
Issues Resolved
Search memory tracking
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.