Introduce an interface to translate hostmetrics and add tests #2

lahsivjar · 2024-05-13T13:28:37Z

Hostmetrics remappers remap metrics produced by hostmetricsreceiver to Elastic system metrics. This allows powering Elastic's curated UIs based on OTel metrics.

The code for the remappers is adapted from the PoC done here.

remappers/const.go

remappers/hostmetrics/config.go

axw

Just a high-level review. Looks good overall. The main questions are whether we should be resilient to changes in the set of host metrics scraped, and whether this should implement the processor API directly rather than create a new "remapper" abstraction.

remappers/hostmetrics/process.go

remappers/hostmetrics/hostmetrics.go

axw · 2024-05-14T06:08:03Z

remappers/hostmetrics/hostmetrics.go

+// remapped metrics could be trivially converted into Elastic system metrics.
+// The current remapping logic assumes that each Metric in the ScopeMetric
+// will have datapoints for a single timestamp only.
+func (r *Remapper) Remap(src pmetric.ScopeMetrics, out pmetric.MetricSlice) {


What is the reason for not implementing an opentelemetry-collector processor directly here?

Do you mean the full processor interface (ref)? I didn't do this because we didn't need to. The component.Component felt like unnecessary complexity for our use case.

The consumer interface (ref) was a better alternative, but I decided against it because it would be too restrictive. My goal was an interface that can be used in multiple ways and easily adapted to implement a consumer interface if needed, which is why I separated the output pmetric.MetricSlice from the source's pmetric.ScopeMetrics. Let me know if this doesn't make sense; I can adopt the consumer interface.
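To illustrate the design described in this comment, here is a minimal sketch of a remapper whose output slice is separate from its source. The types `Metric`, `MetricSlice`, and `ScopeMetrics` below are simplified stand-ins, not the real pmetric API, and `renamer`, `remapInPlace`, and the metric names in the lookup table are hypothetical and exist only for this example.

```go
package main

import "fmt"

// Metric, MetricSlice and ScopeMetrics are simplified stand-ins for the
// pmetric types; they are NOT the real pdata API.
type Metric struct {
	Name  string
	Value float64
}

type MetricSlice struct{ items []Metric }

func (s *MetricSlice) Append(m Metric) { s.items = append(s.items, m) }
func (s *MetricSlice) Len() int        { return len(s.items) }

type ScopeMetrics struct {
	Scope   string
	Metrics MetricSlice
}

// Remapper reads metrics from src and appends remapped metrics to out.
// Keeping out separate from src lets the caller decide where results go.
type Remapper interface {
	Remap(src ScopeMetrics, out *MetricSlice)
}

// renamer is a hypothetical remapper that renames metrics via a lookup table
// and ignores metrics it does not know.
type renamer struct{ names map[string]string }

func (r renamer) Remap(src ScopeMetrics, out *MetricSlice) {
	for _, m := range src.Metrics.items {
		if to, ok := r.names[m.Name]; ok {
			out.Append(Metric{Name: to, Value: m.Value})
		}
	}
}

// remapInPlace is one possible adaptation: the remapped metrics are appended
// to the source's own slice, as a consumer-style wrapper might do.
func remapInPlace(r Remapper, sm *ScopeMetrics) {
	r.Remap(*sm, &sm.Metrics)
}

func main() {
	// The name mapping here is illustrative only.
	r := renamer{names: map[string]string{"system.cpu.load_average.1m": "system.load.1"}}
	src := ScopeMetrics{Scope: "hostmetrics"}
	src.Metrics.Append(Metric{Name: "system.cpu.load_average.1m", Value: 0.42})

	// Option 1: collect remapped metrics in a separate slice.
	var out MetricSlice
	r.Remap(src, &out)
	fmt.Println("separate output:", out.Len(), out.items[0].Name)

	// Option 2: append remapped metrics next to the originals.
	remapInPlace(r, &src)
	fmt.Println("in place:", src.Metrics.Len())
}
```

The same `Remap` call serves both callers; only the choice of `out` differs, which is the flexibility the comment argues for.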

What triggered me to ask is the assumption of using zap for logging. That is currently the case in opentelemetry-collector, but what if it changed? Then what if we wanted to add metrics (i.e. metrics measuring remapping)? So I was thinking maybe we should use the telemetry config struct and its logger.

It's not a big deal, and we can refactor as needed. I was wondering if there was something fundamentally preventing us - sounds like no.

Yeah, no blockers in doing that.

lahsivjar · 2024-05-14T11:47:52Z

Putting this PR on draft after talking with @ishleenk17. Once #4 is merged I will rebase the PR and update the code.

lahsivjar · 2024-05-14T14:18:16Z

Apologies for the force push, I have updated the PR after the original code for the remappers was merged. It is ready for review now.

ishleenk17

Thanks Vishal!
I've left my initial comments on the PR. I'll be doing a more thorough review soon.

Also, let's change the title of the PR to something more appropriate.

remappers/hostmetrics/cpu.go

remappers/hostmetrics/hostmetrics.go

ishleenk17 · 2024-05-15T10:41:55Z

remappers/hostmetrics/hostmetrics.go

+	if !r.Valid(src) {
+		return
+	}
+


The metricset.period calculation from the migrated code has been left out here. Was that intentional?

Not intentional actually. Can you elaborate on how the metricset.period is used?

I perused the actual usage here and I think this piece of code would better live outside the remappers (in the processor or in APM/MIS if required there) so that the interfaces don't get bloated. Let me know what you think.

i added metricset.period because it's used in some of the UIs as a scale factor. i agree it's kinda clunky here - ideally you'd be able to read it from the receiver config (the scrape period) directly, but there's no way i could find to do it.

we could add it in a downstream component, but i figured as it was a direct requirement of some of the elastic host UIs, it was ok to stick it here for now.

Hmm, I think the issue with doing it in the library (or even in APM/MIS) would be that we would need to maintain a cache for all the hosts we see and calculate the metricset.period for each of them. @axw WDYT? Do you see a better way here?

"but i figured as it was a direct requirement of some of the elastic host UIs, it was ok to stick it here for now"

Do you have a list of UIs that break due to this? I checked the main inventory UI and that seems to work fine.

in the poc implementation we just stored a single value for the whole processor. this worked fine - it assumes that we only have one pmetrics struct per scrape (across all the datasets), which is currently true. could you think of a configuration which would break it?

@tommyers-elastic But in APM/MIS we would get data from more than one collector instance, so we would need to calculate the metricset.period for each instance, even if it is one value per instance. Did I answer your question?

@lahsivjar: Let's open a ticket to discuss metricset.period, as this should not block the PR.

We are dealing with cumulative sums (in network and process remappers) :(

Right, that's a pain. One option, which is a little brittle but probably not worse than other options, would be to extract the value from any of the non-cumulative sum metrics. All of the host metrics are scraped on the same interval IIANM.

@axw: I have created a separate issue for this discussion to unblock this PR.
Let's take it forward there.

The ticket for metricset.period.

remappers/hostmetrics/metric.go

remappers/hostmetrics/network.go

tommyers-elastic

looking good

remappers/hostmetrics/hostmetrics.go

axw · 2024-05-16T10:04:32Z

remappers/hostmetrics/hostmetrics.go

+	if !r.Valid(src) {
+		return
+	}
+


Could we use the data point's timestamp - start_timestamp to calculate metricset.period? For gauges and delta sums, the start timestamp should be the beginning of the collection interval. It doesn't look like we're dealing with cumulative sums in this code.

ishleenk17

Looks good!

shmsr

Left some minor comments.

shmsr · 2024-05-17T07:10:00Z

remappers/hostmetrics/metric.go

+		m.SetName(metric.name)
+
+		var dp pmetric.NumberDataPoint
+		switch metric.dataType {


Shouldn't there be a default case to skip the metric if a dataType other than MetricTypeGauge/MetricTypeSum comes in?

The code shows up as a diff here, but it is the same as what we have in main. IMO, the best approach would be to return an error (or even panic) here, since skipping would mean we have appended an empty metric to the slice with a name but didn't handle it. This would only occur when there is a bug in the library and the tests for both cases don't catch it. I will leave it as is for now and will open another PR to fix it.

We will take this up as part of the next refactoring, Subham.

remappers/hostmetrics/network.go

Makefile

remappers/hostmetrics/network.go

remappers/hostmetrics/hostmetrics.go

lahsivjar requested a review from a team as a code owner May 13, 2024 13:28

lahsivjar force-pushed the hostmetrics branch from e4ce7bc to 7b107fd May 13, 2024 13:33

lalit-satapathy requested review from ishleenk17 and tommyers-elastic May 13, 2024 13:44

lahsivjar commented May 13, 2024

remappers/const.go (outdated)

lahsivjar commented May 13, 2024

remappers/hostmetrics/config.go

axw reviewed May 14, 2024

lahsivjar marked this pull request as draft May 14, 2024 11:47

lahsivjar force-pushed the hostmetrics branch from 7b107fd to 1ba7acb May 14, 2024 13:57

Move hostmetrics to remappers

e7bf03c

lahsivjar force-pushed the hostmetrics branch from 1ba7acb to cc8244d May 14, 2024 14:09

Introduce a remapping interface and refactor hostmetrics

91cb86d

lahsivjar force-pushed the hostmetrics branch from cc8244d to 91cb86d May 14, 2024 14:16

lahsivjar marked this pull request as ready for review May 14, 2024 14:16

Fix comment

7126b01

lahsivjar closed this May 14, 2024

lahsivjar reopened this May 14, 2024

Fix mutator and test for processes

3799ab5

lahsivjar requested a review from axw May 15, 2024 10:24

ishleenk17 reviewed May 15, 2024

lahsivjar changed the title ~~Add hostmetrics remappers~~ Introduce an interface to translate hostmetrics and add tests May 15, 2024

lahsivjar added 5 commits May 15, 2024 12:19

Fix network scraper and add tests

7605f63

Use ecs event.provider instead of a custom label

b5a67ee

Add network to benchmarks

4a70866

Update docs

b7c3c37

Remove TODO

2301fab

tommyers-elastic approved these changes May 16, 2024

axw reviewed May 16, 2024

And comment for accepting resource attributes in remap

840065e

ishleenk17 approved these changes May 17, 2024

axw approved these changes May 17, 2024

This was referenced May 17, 2024

Process and Network Metrics: Cumulative Sum Temporality #8

Closed

Set the metricset.period #9

Open

shmsr reviewed May 17, 2024

Makefile (outdated)

shmsr reviewed May 17, 2024

remappers/hostmetrics/network.go (outdated)

shmsr reviewed May 17, 2024

remappers/hostmetrics/hostmetrics.go (outdated)

lahsivjar added 4 commits May 17, 2024 09:44

Refactor hostmetrics Remap

c8924e6

Update tool versions and fix typo

b02413e

Refactor network for better readability

d786b98

Refactor network direction attr handling

9ab6660

lahsivjar requested a review from shmsr May 17, 2024 09:00

ishleenk17 merged commit 161b0f7 into elastic:main May 17, 2024

lahsivjar deleted the hostmetrics branch May 17, 2024 09:03
