Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added host metrics cpu scraper #862

Merged
merged 1 commit into from
Apr 30, 2020

Conversation

james-bebbington
Copy link
Member

@james-bebbington james-bebbington commented Apr 23, 2020

Link to tracking Issue:
#847

Description:
Added cpu scraper to the hostmetricsreceiver which uses gopsutil to scrape CPU times. This can be configured to report on total cpu times, or provide a per cpu breakdown.

Not included in this PR:
The next PR will add cpu utilization metrics obtained via windows performance counters

@james-bebbington james-bebbington changed the title Initial commit of host metrics cpu scraper using gopsutil to collect … Added host metrics cpu scraper Apr 23, 2020
@james-bebbington james-bebbington force-pushed the hostmetrics-cpu branch 2 times, most recently from fb9c2f5 to 012a7b6 Compare April 23, 2020 00:40
@codecov-io
Copy link

codecov-io commented Apr 23, 2020

Codecov Report

Merging #862 into master will increase coverage by 0.54%.
The diff coverage is 96.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #862      +/-   ##
==========================================
+ Coverage   85.37%   85.92%   +0.54%     
==========================================
  Files         162      168       +6     
  Lines       12467    13052     +585     
==========================================
+ Hits        10644    11215     +571     
- Misses       1404     1415      +11     
- Partials      419      422       +3     
Impacted Files Coverage Δ
component/componenttest/error_waiting_host.go 73.33% <0.00%> (ø)
component/componenttest/nop_host.go 42.85% <0.00%> (ø)
processor/resourceprocessor/factory.go 87.50% <0.00%> (ø)
receiver/hostmetricsreceiver/factory.go 67.30% <58.33%> (+12.95%) ⬆️
...icsreceiver/internal/scraper/cpuscraper/factory.go 60.00% <60.00%> (ø)
...eceiver/internal/scraper/cpuscraper/cpu_scraper.go 83.05% <83.05%> (ø)
config/example_factories.go 84.67% <85.71%> (ø)
component/factory_helpers.go 92.85% <100.00%> (ø)
config/config.go 99.74% <100.00%> (ø)
config/configmodels/configmodels.go 95.83% <100.00%> (ø)
... and 49 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7082daa...685311d. Read the comment docs.

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for also testing our new internal API, I think I found 1-2 improvements that we can do.

@james-bebbington
Copy link
Member Author

Thanks for also testing our new internal API, I think I found 1-2 improvements that we can do.

No worries, it was more by mistake rather than intentional 😉

In general, from my point of view, the two things that were a bit annoying when coding to the API were:

  1. The lack of an Append function for most slice objects. We should probably add that for all slice objects as mentioned in Add an AddMetric (or similar) function to pdata/pdatautil package #864

  2. The lack of Set functions for struct/slice properties, e.g. Metric.SetDescriptor(..) or Metric.SetDataPoints(..) (or something like Metric.DataPoints().AppendAll(..) would be roughly equivalent). This means initializing / updating everything in place which feels a little unnatural. I believe this is what you're planning to resolve with the CopyTo functions you mentioned: Add a CopyTo function for all generated objects to pdatautil package #865


receiver, err := NewHostMetricsReceiver(context.Background(), zap.NewNop(), config, factories, sink)

if runtime.GOOS != "windows" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an error? The CPU metric seems to work on linux as well, or not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code will work on any OS supported by gopsutil, but the set of "state" labels / values recorded is limited to ones available in Windows. In the next PR I'm intending to add a bunch more metrics that will only be available on Windows.

I figured it would be better to explicitly restrict this from running on non-Windows operating systems until functionality is properly built out for them rather than potentially confusing users by returning minimal / incomplete data. I don't mind too much either way though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about what state is missing.

Copy link
Member Author

@james-bebbington james-bebbington Apr 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://github.com/shirou/gopsutil/blob/master/cpu/cpu_linux.go#L251 vs https://github.com/shirou/gopsutil/blob/master/cpu/cpu_windows.go#L160

Linux has other cpu states that are recorded such as Idle, Irq, Softirq, Guest .. that don't necessarily have direct equivalents on Windows (at least not provided by gopsutil). If we added these for Windows they would all have zero values.

It'll be a very minor change to add these extra states for non-Windows OSs only, but I've left that for a future PR as I'm focusing on Windows for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do this in a followup PR as discussed.

receiver/hostmetricsreceiver/example_config.yaml Outdated Show resolved Hide resolved
IowaitStateLabelValue = "iowait"
)

var MetricCPUSecondsDescriptor = metricCPUSecondsDescriptor()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be exported?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's exported for the host metrics receiver TestGatherMetrics_EndToEnd test to prevent having to hardcode this in two places.

It's not ideal, but I'm not sure if there's a better way to handle that? (other than redefining the descriptor for the test)

receiver/hostmetricsreceiver/hostmetrics_receiver.go Outdated Show resolved Hide resolved
select {
case <-ticker.C:
c.scrapeMetrics(ctx)
case <-ctx.Done():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not how we intend to close the pipeline. It is not guaranteed that we mark the start Context as done before we end. Can you please describe how you expect this Scraper to be closed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was loosely following the pattern Jay referenced here: #848 (comment) (see my comment below), and per this example: k8sclusterreceiver example, but given that the scrapers run within the context of the host metrics receiver, I extended it to so that the parent context (that is in the scope of the hostmetrics receiver) controls when the context is cancelled.

i.e. the hostmetrics receiver creates a new context.WithCancel(ctx) in its Start function (line 63) and calls the returned context's cancel function in its Shutdown function (line 78). The Scrapers are always started in the context of the host metrics receiver so are guaranteed to receive this signal.

Maybe creating that dependency between the parent context is not considered great practice in general? (although that seems to be what its inteded for). Regardless, it definitely adds confusion (and could potentially cause some kind of race condition) to both: a) listen to the parent context Done() channel, and b) have a Close() function that the parent is expected to call.

I've removed the creation of the cancelable context from host metrics receiver, and moved this down to the cpu scraper itself which uses ctx.Done() to signal the termination of the ticker (I could change it to just using a regular channel, but figure its basically the same)


receiver, err := NewHostMetricsReceiver(context.Background(), zap.NewNop(), config, factories, sink)

if runtime.GOOS != "windows" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about what state is missing.

Comment on lines 28 to 33
var (
UserStateLabelValue = "user"
SystemStateLabelValue = "system"
IdleStateLabelValue = "idle"
InterruptStateLabelValue = "interrupt"
IowaitStateLabelValue = "iowait"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have somewhere documented why do we go with labels for state vs a different metric per state?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for that choice isn't explicitly documented, but we're working off requirements in this doc: Host Metrics - feel free to add comments.

I figure these are essentially different categories of cpu time, so make sense to be labels.

@james-bebbington james-bebbington force-pushed the hostmetrics-cpu branch 5 times, most recently from 9dffd72 to 685311d Compare April 26, 2020 03:02
go.mod Outdated Show resolved Hide resolved

receiver, err := NewHostMetricsReceiver(context.Background(), zap.NewNop(), config, factories, sink)

if runtime.GOOS != "windows" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do this in a followup PR as discussed.

@bogdandrutu bogdandrutu merged commit 81b1f53 into open-telemetry:master Apr 30, 2020
@flands flands added this to the Beta 0.4 milestone Jun 4, 2020
wyTrivail pushed a commit to mxiamxia/opentelemetry-collector that referenced this pull request Jul 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants