Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metrics][Kubernetes] Update proxy, scheduler and controller manager fields and dashboards #4948

Merged
merged 4 commits into from
Jan 30, 2023
Merged

[Metrics][Kubernetes] Update proxy, scheduler and controller manager fields and dashboards #4948

merged 4 commits into from
Jan 30, 2023

Conversation

constanca-m
Copy link
Contributor

@constanca-m constanca-m commented Jan 9, 2023

What does this PR do?

This PR updates the dashboards and fields of proxy, scheduler and controller manager to match the ones mentioned in elastic/beats#34161.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

How to test this PR locally

  1. Download the changes of this PR.
  2. Run command elastic-package build inside Kubernetes package.
  3. Run elastic-package stack up.
  4. Using Kibana, add Kubernetes integration.

Related issues

Screenshots

The dashboards are just like the ones on the screenshots of elastic/beats#34161.

The cluster overview dashboard was also updated to include a link to the proxy dashboard:
image

Problem

When using metricbeat-* index, fetching a metric like node_collector_zone_size{zone=""} 1 would result in a missing value on kubernetes.controllermanager.zone. The result would appear on the visualization like this:
image

However, when using the metrics-* index, there is no missing value anymore as the metric does not even exist:
Screenshot from 2023-01-09 11-37-28

An empty value like the label zone label would be ignored according to metricbeat code. However, the end results in metricbeat-* and metric-* indexes are different.

@constanca-m constanca-m requested review from a team as code owners January 9, 2023 11:49
@constanca-m constanca-m self-assigned this Jan 9, 2023
@constanca-m constanca-m closed this Jan 9, 2023
@constanca-m constanca-m reopened this Jan 9, 2023
@elastic elastic deleted a comment from elasticmachine Jan 9, 2023
@elastic elastic deleted a comment from elasticmachine Jan 9, 2023
@constanca-m constanca-m closed this Jan 9, 2023
@constanca-m constanca-m reopened this Jan 9, 2023
@elastic elastic deleted a comment from elasticmachine Jan 9, 2023
@elastic elastic deleted a comment from elasticmachine Jan 9, 2023
@elastic elastic deleted a comment from elasticmachine Jan 9, 2023
@elastic elastic deleted a comment from elasticmachine Jan 9, 2023
@elasticmachine
Copy link

elasticmachine commented Jan 9, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-01-27T09:11:58.008+0000

  • Duration: 28 min 55 sec

Test stats 🧪

Test Results
Failed 0
Passed 91
Skipped 0
Total 91

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine
Copy link

elasticmachine commented Jan 9, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 100.0% (0/0) 💚
Files 100.0% (0/0) 💚
Classes 100.0% (0/0) 💚
Methods 94.872% (74/78)
Lines 100.0% (0/0) 💚
Conditionals 100.0% (0/0) 💚

@elastic elastic deleted a comment from elasticmachine Jan 9, 2023
@elastic elastic deleted a comment from elasticmachine Jan 9, 2023
@elastic elastic deleted a comment from elasticmachine Jan 9, 2023
Copy link
Contributor

@tetianakravchenko tetianakravchenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests are failing with the error:

found 0 hits in metrics-kubernetes.state_persistentvolumeclaim-ep data stream: index_not_found_exception: no such index [metrics-kubernetes.state_persistentvolumeclaim-ep] Status=404

found 0 hits in metrics-kubernetes.state_persistentvolumeclaim-ep data stream: search_phase_execution_exception: all shards failed Status=503

doesn't seem to be related to your change, could you try to retrigger tests?
did you try to run tests locally?

}
},
"kubernetes": {
"proxy": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this sample representative? it does not contain much kubernetes specific fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I will update the sample for the 3 datasets on the next commit.

@constanca-m
Copy link
Contributor Author

doesn't seem to be related to your change, could you try to retrigger tests? did you try to run tests locally?

Yes, I can try that, but I don't think it will work. From what I understand, the field kubernetes.namespace_uid is undefinied for those datasets like the error says. It is true, it is not present on the base-fields.yml of those. The other datatsets have that. maybe that is why there is no complaint.

@constanca-m
Copy link
Contributor Author

/test

"decimals": 1
}
},
"formula": "differences(last_value(kubernetes.proxy.process.memory.resident.bytes))",
Copy link
Contributor

@tetianakravchenko tetianakravchenko Jan 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking on https://user-images.githubusercontent.com/113898685/210374382-e14518ad-ac4c-4683-9abb-51a00aefeb3e.png I am not sure that this is a clear representation of memory usage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no metric for the limit of resident memory usage, so having the average isn't really interesting. I used the rate because it will be easier to detect changes, just like with all the other metrics

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so having the average isn't really interesting.

why?

I used the rate because it will be easier to detect changes, just like with all the other metrics

I wouldn't say that for monitoring purpose changes like increase/decrease in 50KB provide any value, but it gives a falls impression that smth is wrong

Copy link
Contributor Author

@constanca-m constanca-m Jan 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is very hard to detect spikes on average if we are taking into account many values, while the rate only cares about the last value, making it very easy. But we also have to think about the point of our visualization: is it to get an overview of the overall performance or to highlight problems that might require immediate action?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is very hard to detect spikes on average if we are taking into account many values

but are such spikes as on the screenshot - https://user-images.githubusercontent.com/113898685/210374382-e14518ad-ac4c-4683-9abb-51a00aefeb3e.png are important for either overall performance or to highlight problems?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spikes as the one on the screenshot are not important, but the node on the screenshot only has the essential pods + the metricbeat pod. But it is hard to make a case without giving examples. I will try to make it clear once I open the PR for elastic/observability-docs#2474. In the meantime, we can leave it as it is, or I can change to the average

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest average(kubernetes.proxy.process.memory.resident.bytes)

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will update to that. I am still a little skeptical on choosing average over rate, but it will maybe become more clear once we settle on which problems we want to highlight.

"type": "visualization",
"version": "8.4.0-SNAPSHOT"
"version": "8.5.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"version": "8.5.1"
"version": "8.6.0"

@@ -10,7 +10,7 @@ categories:
- kubernetes
release: ga
conditions:
kibana.version: "^8.5.0"
kibana.version: "^8.7.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this also with 8.6.0 and works fine.

(with 8.6.0)
Screenshot 2023-01-11 at 4 17 05 PM

See comment on line https://github.com/elastic/integrations/pull/4948/files#diff-293e076906b24a178d0e026ab430e5289dabc6f425f2113bcbcf6125d68c2524R170 which I guess should be done 8.6.0 ?

Copy link
Contributor Author

@constanca-m constanca-m Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only works because some fields existed already on the previous version of metricbeat. But that doesn't happen for all fields

@gizas
Copy link
Contributor

gizas commented Jan 11, 2023

Also for scheduler I see top visualisations have no data by default.
Do you think we should update the notes for this?

Screenshot 2023-01-11 at 5 20 35 PM

@constanca-m
Copy link
Contributor Author

Also for scheduler I see top visualisations have no data by default. Do you think we should update the notes for this?

Screenshot 2023-01-11 at 5 20 35 PM

They have values @gizas on version 8.7.0. Partial intervals were dropped, so you have to wait a few seconds to see them

@gizas
Copy link
Contributor

gizas commented Jan 11, 2023

@constanca-m great work overall and very detailed !!!

You can update the sections of related stories with the one elastic/observability-docs#2474 you created (in order someone lets say that will see your PR after sometime to see continuation of work and to have the reference)

And also make sure that in your How to test section: You need to enable Proxy,Scheduler, Controller manager datasets in kubernetes policy.

Copy link
Contributor

@drewdaemon drewdaemon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see 8.7 in the manifest.yml.

Changing dashboards and visualizations using an unreleased version of Kibana is unsafe since the App Ex team is still actively making changes to the Kibana code and potentially the data models. In other words, you have no guarantee that your changes won't be broken by the time 8.7 is released.

If this PR can wait until 8.7 is about to be released, that's best. If there's some reason it really has to be merged now, please reach out.

@elastic elastic deleted a comment from mergify bot Jan 12, 2023
@elastic elastic deleted a comment from mergify bot Jan 12, 2023
@tetianakravchenko
Copy link
Contributor

@andrewctate

Changing dashboards and visualizations using an unreleased version of Kibana is unsafe since the App Ex team is still actively making changes to the Kibana code and potentially the data models. In other words, you have no guarantee that your changes won't be broken by the time 8.7 is released.

main reason of using 8.7 here, is because integration package must be compatible with beats version - elastic/beats#34161, where some new fields were introduced, that we rely on in this PR. What then would be the best approach in this case?

@constanca-m
Copy link
Contributor Author

main reason of using 8.7 here, is because integration package must be compatible with beats version - elastic/beats#34161, where some new fields were introduced, that we rely on in this PR. What then would be the best approach in this case?

I tried to backport the changes on metricbeat to 8.6 since the metrics as they are now are not correct. I am not sure if it worked, but if it did, I am planning on changing the condition to 8.6 instead of 8.7 @tetianakravchenko

},
"query": {
"match_phrase": {
"datastream.dataset": "kubernetes.proxy"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't it be data_stream.dataset ? 🤔

comparing to https://github.com/elastic/integrations/pull/4972/files

the same for other dashboards in this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks for pointing that out. I am going to test it now that the changes were backported on metricbeat and I will make the commit for version 8.6 accordingly

@drewdaemon
Copy link
Contributor

drewdaemon commented Jan 12, 2023

@tetianakravchenko

main reason of using 8.7 here, is because integration package must be compatible with beats version - elastic/beats#34161, where some new fields were introduced, that we rely on in this PR. What then would be the best approach in this case?

I understand. Your reasons for using 8.7 are valid.

If @constanca-m is successful in making the changes to the dashboards on 8.6 instead, that is the most straightforward solution. If not, my suggestion is to wait on merging this PR until at least the day after 8.7 feature freeze (or even a few days before public release). That gives you a chance to verify that your dashboard changes still work before committing them to the main branch. Let me know if there's some reason this strategy won't work for you.

Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
@constanca-m constanca-m reopened this Jan 24, 2023
constanca-m and others added 3 commits January 24, 2023 10:02
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
@constanca-m constanca-m dismissed drewdaemon’s stale review January 30, 2023 07:13

Change requested is fixed.

@constanca-m constanca-m merged commit 3d42ed0 into elastic:main Jan 30, 2023
@constanca-m constanca-m deleted the fix-kubernetes-deprecated-metrics branch January 30, 2023 07:14
@elasticmachine
Copy link

Package kubernetes - 1.31.1 containing this change is available at https://epr.elastic.co/search?package=kubernetes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Kubernetes][Dashboards] Proxy - Fixing empty visualisations
6 participants