Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add node status conditions metrics #155

Merged
merged 8 commits into from
Jul 20, 2021
Merged

Conversation

invidian
Copy link
Contributor

@invidian invidian commented Jul 13, 2021

Includes commits from #148. Proper diff to look at is here: invidian/improvements...invidian/add-node-status-metrics

Draft as the following things are missing:

  • Support for unknown value for conditions
  • Tests

@invidian invidian force-pushed the invidian/add-node-status-metrics branch from 3101680 to f9db301 Compare July 14, 2021 06:41
@invidian invidian changed the base branch from main to invidian/improvements July 14, 2021 12:47
src/prometheus/definition.go Outdated Show resolved Hide resolved
src/prometheus/definition.go Outdated Show resolved Hide resolved
src/prometheus/definition.go Show resolved Hide resolved
src/prometheus/definition.go Outdated Show resolved Hide resolved
src/prometheus/definition.go Outdated Show resolved Hide resolved
src/prometheus/definition.go Show resolved Hide resolved
@roobre
Copy link
Contributor

roobre commented Jul 14, 2021

I would also update the JsonSchema files for e2e test to capture this, with the basic kubelet conditions.

@invidian invidian force-pushed the invidian/improvements branch 2 times, most recently from 95269ca to 21d1242 Compare July 14, 2021 17:35
@invidian invidian force-pushed the invidian/add-node-status-metrics branch 3 times, most recently from 48f0d5b to 0d6b3d6 Compare July 14, 2021 20:20
@invidian
Copy link
Contributor Author

I would also update the JsonSchema files for e2e test to capture this, with the basic kubelet conditions.

Done, I think 😄

@invidian invidian marked this pull request as ready for review July 14, 2021 20:20
@invidian invidian force-pushed the invidian/add-node-status-metrics branch 2 times, most recently from f8c1a3a to b73df28 Compare July 14, 2021 22:11
@invidian invidian force-pushed the invidian/add-node-status-metrics branch from b73df28 to f5a5d0a Compare July 16, 2021 06:21
@invidian invidian force-pushed the invidian/improvements branch 2 times, most recently from 124dabe to 6885e7a Compare July 16, 2021 16:04
Base automatically changed from invidian/improvements to main July 16, 2021 16:17
@invidian invidian force-pushed the invidian/add-node-status-metrics branch from 344de7d to 6454a43 Compare July 16, 2021 19:01
src/metric/definition.go Show resolved Hide resolved
src/prometheus/definition.go Show resolved Hide resolved
src/metric/definition.go Outdated Show resolved Hide resolved
src/metric/definition.go Show resolved Hide resolved
src/metric/definition.go Outdated Show resolved Hide resolved
src/prometheus/definition.go Show resolved Hide resolved
@roobre
Copy link
Contributor

roobre commented Jul 19, 2021

E2E tests seem great to me, thanks!

Right now, if namespace metrics are processed, we still iterate over
pod-specific metrics, which will have "namespace" label. Those metrics
will be added to the namespace entity collection and will be overwritten
by next incoming metric of the same name.

To avoid putting unrelated data and overwritting it, this commit adds
conditions, which only puts relevant metrics for each raw group.

This means, "node" group won't have Pod-specific metrics and "namespace"
group won't have "pod", "daemonset" and others "more-specific" metrics.

This saves some computation time and makes it easier to debug if one
dumps entire raw groups struct.

Signed-off-by: Mateusz Gozdek <mgozdek@microsoft.com>
@invidian invidian force-pushed the invidian/add-node-status-metrics branch 2 times, most recently from ec17e3d to f49fbf1 Compare July 19, 2021 18:24
@invidian invidian requested a review from roobre July 19, 2021 21:29
@roobre
Copy link
Contributor

roobre commented Jul 20, 2021

Looking great, thanks @invidian!

@invidian invidian force-pushed the invidian/add-node-status-metrics branch from f49fbf1 to fddced0 Compare July 20, 2021 08:07
Right now, when processed metric family has more than one metric, it
will overwrite the content of the previous metric.

This commit changes this behavior and now if there is more then one
metric in a single metric family, value of the metric will be converted
to collection and appended, making all the metrics available for further
processing.

Right now, only user of this code is KSM metrics and no metrics are
duplicated with recently introduced filtering, however incoming node
state conditions metrics (#138) will require multiple metrics coming
from GroupMetricsBySpec for processing.

Signed-off-by: Mateusz Gozdek <mgozdek@microsoft.com>
This commit adds a
WithLabelValueInMetricNameAndWithLabelValueMappedToMetricValue function,
which allows to generate a set of metrics with label value in the metric
name. This will be used to generate node status conditions metrics (#138),
but may potentially be used also to generate HPA condition metrics.

Signed-off-by: Mateusz Gozdek <mgozdek@microsoft.com>
This will be required to test node status condition metrics.

Refs #138

Signed-off-by: Mateusz Gozdek <mgozdek@microsoft.com>
Right now integration creates multiple MetricSets from each job, which
makes it harder in e2e tests to validate those metrics against the
schema.

This commit improves that by merging multiple MetricSets into one before
validating against JSON schema.

In SDK version 3+ this will no longer be needed.

Refs #141

Signed-off-by: Mateusz Gozdek <mgozdek@microsoft.com>
For node condition metric, which may have "unknown" value, we agreed to
send -1 to the backend, so we do not lose this information.

Refs #138

Signed-off-by: Mateusz Gozdek <mgozdek@microsoft.com>
Closes #138

Signed-off-by: Mateusz Gozdek <mgozdek@microsoft.com>
Signed-off-by: Mateusz Gozdek <mgozdek@microsoft.com>
@invidian invidian force-pushed the invidian/add-node-status-metrics branch from fddced0 to 24b71d0 Compare July 20, 2021 08:11
@invidian
Copy link
Contributor Author

invidian commented Jul 20, 2021

Looking great, thanks @invidian!

@roobre thanks. I just pushed the change to e2e tests instead of how MetricSets are composed as we agreed. I think this is now ready to merge.

Copy link
Contributor

@gsanchezgavier gsanchezgavier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants