-
Notifications
You must be signed in to change notification settings - Fork 439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Metrics][Kubernetes] Update proxy, scheduler and controller manager fields and dashboards #4948
[Metrics][Kubernetes] Update proxy, scheduler and controller manager fields and dashboards #4948
Conversation
🌐 Coverage report
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests are failing with the error:
found 0 hits in metrics-kubernetes.state_persistentvolumeclaim-ep data stream: index_not_found_exception: no such index [metrics-kubernetes.state_persistentvolumeclaim-ep] Status=404
found 0 hits in metrics-kubernetes.state_persistentvolumeclaim-ep data stream: search_phase_execution_exception: all shards failed Status=503
doesn't seem to be related to your change, could you try to retrigger tests?
did you try to run tests locally?
} | ||
}, | ||
"kubernetes": { | ||
"proxy": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this sample representative? it does not contain much kubernetes specific fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I will update the sample for the 3 datasets on the next commit.
Yes, I can try that, but I don't think it will work. From what I understand, the field |
/test |
"decimals": 1 | ||
} | ||
}, | ||
"formula": "differences(last_value(kubernetes.proxy.process.memory.resident.bytes))", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking on https://user-images.githubusercontent.com/113898685/210374382-e14518ad-ac4c-4683-9abb-51a00aefeb3e.png I am not sure that this is a clear representation of memory usage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no metric for the limit of resident memory usage, so having the average isn't really interesting. I used the rate because it will be easier to detect changes, just like with all the other metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so having the average isn't really interesting.
why?
I used the rate because it will be easier to detect changes, just like with all the other metrics
I wouldn't say that for monitoring purpose changes like increase/decrease in 50KB provide any value, but it gives a falls impression that smth is wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is very hard to detect spikes on average if we are taking into account many values, while the rate only cares about the last value, making it very easy. But we also have to think about the point of our visualization: is it to get an overview of the overall performance or to highlight problems that might require immediate action?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is very hard to detect spikes on average if we are taking into account many values
but are such spikes as on the screenshot - https://user-images.githubusercontent.com/113898685/210374382-e14518ad-ac4c-4683-9abb-51a00aefeb3e.png are important for either overall performance or to highlight problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spikes as the one on the screenshot are not important, but the node on the screenshot only has the essential pods + the metricbeat pod. But it is hard to make a case without giving examples. I will try to make it clear once I open the PR for elastic/observability-docs#2474. In the meantime, we can leave it as it is, or I can change to the average
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest average(kubernetes.proxy.process.memory.resident.bytes)
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will update to that. I am still a little skeptical on choosing average over rate, but it will maybe become more clear once we settle on which problems we want to highlight.
"type": "visualization", | ||
"version": "8.4.0-SNAPSHOT" | ||
"version": "8.5.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"version": "8.5.1" | |
"version": "8.6.0" |
packages/kubernetes/manifest.yml
Outdated
@@ -10,7 +10,7 @@ categories: | |||
- kubernetes | |||
release: ga | |||
conditions: | |||
kibana.version: "^8.5.0" | |||
kibana.version: "^8.7.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this also with 8.6.0 and works fine.
See comment on line https://github.com/elastic/integrations/pull/4948/files#diff-293e076906b24a178d0e026ab430e5289dabc6f425f2113bcbcf6125d68c2524R170 which I guess should be done 8.6.0 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It only works because some fields existed already on the previous version of metricbeat. But that doesn't happen for all fields
packages/kubernetes/kibana/dashboard/kubernetes-f5ab5510-9c94-11e9-94fd-c91206cd5249.json
Outdated
Show resolved
Hide resolved
packages/kubernetes/kibana/dashboard/kubernetes-bf9389f0-0c14-11ed-b760-5d1bccb47f56.json
Outdated
Show resolved
Hide resolved
packages/kubernetes/kibana/dashboard/kubernetes-5e649d60-9901-11e9-ba57-b7ab4e2d4b58.json
Outdated
Show resolved
Hide resolved
packages/kubernetes/kibana/dashboard/kubernetes-5e649d60-9901-11e9-ba57-b7ab4e2d4b58.json
Outdated
Show resolved
Hide resolved
packages/kubernetes/kibana/dashboard/kubernetes-5e649d60-9901-11e9-ba57-b7ab4e2d4b58.json
Outdated
Show resolved
Hide resolved
They have values @gizas on version 8.7.0. Partial intervals were dropped, so you have to wait a few seconds to see them |
@constanca-m great work overall and very detailed !!! You can update the sections of And also make sure that in your |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see 8.7 in the manifest.yml
.
Changing dashboards and visualizations using an unreleased version of Kibana is unsafe since the App Ex team is still actively making changes to the Kibana code and potentially the data models. In other words, you have no guarantee that your changes won't be broken by the time 8.7 is released.
If this PR can wait until 8.7 is about to be released, that's best. If there's some reason it really has to be merged now, please reach out.
@andrewctate
main reason of using 8.7 here, is because integration package must be compatible with beats version - elastic/beats#34161, where some new fields were introduced, that we rely on in this PR. What then would be the best approach in this case? |
I tried to backport the changes on metricbeat to 8.6 since the metrics as they are now are not correct. I am not sure if it worked, but if it did, I am planning on changing the condition to 8.6 instead of 8.7 @tetianakravchenko |
}, | ||
"query": { | ||
"match_phrase": { | ||
"datastream.dataset": "kubernetes.proxy" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't it be data_stream.dataset
? 🤔
comparing to https://github.com/elastic/integrations/pull/4972/files
the same for other dashboards in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thanks for pointing that out. I am going to test it now that the changes were backported on metricbeat and I will make the commit for version 8.6 accordingly
I understand. Your reasons for using 8.7 are valid. If @constanca-m is successful in making the changes to the dashboards on 8.6 instead, that is the most straightforward solution. If not, my suggestion is to wait on merging this PR until at least the day after 8.7 feature freeze (or even a few days before public release). That gives you a chance to verify that your dashboard changes still work before committing them to the main branch. Let me know if there's some reason this strategy won't work for you. |
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
…s' into fix-kubernetes-deprecated-metrics
Package kubernetes - 1.31.1 containing this change is available at https://epr.elastic.co/search?package=kubernetes |
What does this PR do?
This PR updates the dashboards and fields of proxy, scheduler and controller manager to match the ones mentioned in elastic/beats#34161.
Checklist
changelog.yml
file.How to test this PR locally
elastic-package build
inside Kubernetes package.elastic-package stack up
.Related issues
Screenshots
The dashboards are just like the ones on the screenshots of elastic/beats#34161.
The cluster overview dashboard was also updated to include a link to the proxy dashboard:

Problem
When using

metricbeat-*
index, fetching a metric likenode_collector_zone_size{zone=""} 1
would result in a missing value onkubernetes.controllermanager.zone
. The result would appear on the visualization like this:However, when using the

metrics-*
index, there is no missing value anymore as the metric does not even exist:An empty value like the label
zone
label would be ignored according to metricbeat code. However, the end results inmetricbeat-*
andmetric-*
indexes are different.