Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

koordlet: add external prom metrics collector #1772

Conversation

saintube
Copy link
Member

@saintube saintube commented Dec 7, 2023

Ⅰ. Describe what this PR does

  • koordlet: support the prom collector. With a YAML configuration, it scrapes the prom metrics of external Prometheus exporters and stores them into the koordlet metric cache.

Ⅱ. Does this pull request fix one issue?

fixes #1735.

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

V. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

@koordinator-bot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign yihuifeng after the PR has been reviewed.
You can assign the PR to them by writing /assign @yihuifeng in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@saintube saintube force-pushed the koordlet-add-external-metrics-collector branch from ede8520 to a4cb357 Compare December 8, 2023 04:23
Copy link

codecov bot commented Dec 8, 2023

Codecov Report

Attention: Patch coverage is 65.55423% with 289 lines in your changes are missing coverage. Please review.

Project coverage is 66.43%. Comparing base (7ca45d7) to head (38bb143).
Report is 106 commits behind head on main.

Files Patch % Lines
...koordlet/metricsadvisor/collectors/prom/scraper.go 73.66% 55 Missing and 29 partials ⚠️
...t/metricsadvisor/collectors/prom/prom_collector.go 55.81% 33 Missing and 5 partials ⚠️
...et/metricsadvisor/collectors/prom/scrape_client.go 71.59% 20 Missing and 5 partials ⚠️
...let/qosmanager/plugins/cpusuppress/cpu_suppress.go 20.00% 16 Missing ⚠️
pkg/koordlet/metrics/common.go 68.75% 10 Missing and 5 partials ⚠️
...let/metricsadvisor/collectors/prom/collect_rule.go 88.23% 8 Missing and 4 partials ⚠️
...collectors/noderesource/node_resource_collector.go 22.22% 5 Missing and 2 partials ⚠️
...visor/collectors/pagecache/page_cache_collector.go 46.15% 6 Missing and 1 partial ⚠️
...ollectors/sysresource/system_resource_collector.go 50.00% 5 Missing and 2 partials ⚠️
...let/qosmanager/plugins/memoryevict/memory_evict.go 25.00% 6 Missing ⚠️
... and 22 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1772      +/-   ##
==========================================
+ Coverage   66.42%   66.43%   +0.01%     
==========================================
  Files         396      401       +5     
  Lines       43721    44526     +805     
==========================================
+ Hits        29040    29581     +541     
- Misses      12529    12741     +212     
- Partials     2152     2204      +52     
Flag Coverage Δ
unittests 66.43% <65.55%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@saintube saintube force-pushed the koordlet-add-external-metrics-collector branch 2 times, most recently from 462c4fa to 7445733 Compare December 11, 2023 13:25
@saintube saintube force-pushed the koordlet-add-external-metrics-collector branch 2 times, most recently from 7d8f30d to 4a4e34f Compare December 12, 2023 08:17
pkg/koordlet/metrics/common.go Show resolved Hide resolved
pkg/koordlet/metricsadvisor/collectors/prom/scraper.go Outdated Show resolved Hide resolved
pkg/koordlet/metricsadvisor/framework/config.go Outdated Show resolved Hide resolved
if sample.Timestamp.UnixNano() <= 0 {
sample.Timestamp = timeNow
}
if sample.Timestamp.Sub(timeNow) > validTimeRange && sample.Timestamp.Sub(timeNow) < -validTimeRange {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check the meaning of sample.Timestamp, no need to check if the ts is scrap time but not origin collect time

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scrap time but not origin collect time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are possible. Though many prom exporters do not set the timestamp during collection, some exporters like the cadvisor expose the metrics with collecting timestamp via prometheus.NewMetricWithTimestamp.

pkg/koordlet/metrics/common.go Show resolved Hide resolved
@saintube saintube force-pushed the koordlet-add-external-metrics-collector branch 6 times, most recently from 38fd622 to 3c7d3ae Compare December 20, 2023 03:10
@saintube saintube changed the title [WIP] koordlet: add external prom metrics collector koordlet: add external prom metrics collector Dec 20, 2023
@saintube saintube force-pushed the koordlet-add-external-metrics-collector branch from 3c7d3ae to b017f8a Compare December 22, 2023 02:28
@@ -62,4 +64,16 @@ func (c *Config) InitFlags(fs *flag.FlagSet) {
fs.DurationVar(&c.CPICollectorTimeWindow, "collect-cpi-timewindow", c.CPICollectorTimeWindow, "Collect cpi time window. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h).")
fs.DurationVar(&c.ColdPageCollectorInterval, "coldpage-collector-interval", c.ColdPageCollectorInterval, "Collect cold page interval. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h).")
fs.BoolVar(&c.EnablePageCacheCollector, "enable-pagecache-collector", c.EnablePageCacheCollector, "Enable cache collector of node, pods and containers")
fs.StringVar(&c.CollectPromMetricRulePath, "collect-prom-metric-rule-path", c.CollectPromMetricRulePath, "Collect prometheus metrics rule path. The prometheus collector is disabled when the path is empty.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave an issue, consider get metrics rule from koord-manager HTTP API, so that no need to restart koordlet if config changed

pkg/koordlet/metriccache/metric_types.go Outdated Show resolved Hide resolved
if sample.Timestamp.UnixNano() <= 0 {
sample.Timestamp = timeNow
}
if sample.Timestamp.Sub(timeNow) > validTimeRange && sample.Timestamp.Sub(timeNow) < -validTimeRange {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scrap time but not origin collect time?

Signed-off-by: saintube <saintube@foxmail.com>
@saintube saintube force-pushed the koordlet-add-external-metrics-collector branch from b017f8a to 38bb143 Compare December 25, 2023 04:02
Copy link

stale bot commented Mar 24, 2024

This issue has been automatically marked as stale because it has not had recent activity.
This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed
    You can:
  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Close this issue or PR with /close
    Thank you for your contributions.

Copy link

stale bot commented Apr 23, 2024

This issue has been automatically closed because it has not had recent activity.
This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed
    You can:
  • Reopen this PR with /reopen
    Thank you for your contributions.

@stale stale bot closed this Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[proposal] koordlet supports external metrics
2 participants