-
Notifications
You must be signed in to change notification settings - Fork 16
Introduce an interface to translate hostmetrics and add tests #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
axw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a high level review - looks good overall, main questions are whether we should be resilient to changes in the set of host metrics scraped, and whether this should be implementing the processor API directly rather than creating a new "remapper" abstraction.
remappers/hostmetrics/hostmetrics.go
Outdated
| // remapped metrics could be trivially converted into Elastic system metrics. | ||
| // The current remapping logic assumes that each Metric in the ScopeMetric | ||
| // will have datapoints for a single timestamp only. | ||
| func (r *Remapper) Remap(src pmetric.ScopeMetrics, out pmetric.MetricSlice) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for not implementing an opentelemetry-collector processor directly here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean the full processor interface (ref)? -- I didn't do this because we didn't need to. The component.Component felt unnecessary complexity for our use-case.
The consumer interface (ref) was a better alternative but I decided against it because it would be too restrictive. My goal was to have an interface that can be used in a multiple ways and easily adapted to implement a consumer interface if needed - the reason why I separated the output pmetric.MetricSlice from the source's pmetric.ScopeMetrics. Let me know if it doesn't make sense, I can adopt the consumer interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What triggered me to ask is the assumption of using zap for logging. That is currently the case in opentelemetry-collector, but what if it changed? Then what if we wanted to add metrics (i.e. metrics measuring remapping)? So I was thinking maybe we should use the telemetry config struct and its logger.
It's not a big deal, and we can refactor as needed. I was wondering if there was something fundamentally preventing us - sounds like no.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a big deal, and we can refactor as needed. I was wondering if there was something fundamentally preventing us - sounds like no.
Yeah, no blockers in doing that.
|
Putting this PR on draft after talking with @ishleenk17. Once #4 is merged I will rebase the PR and update the code. |
|
Apologies for the force push, I have updated the PR after the original code for the remappers was merged. It is ready for review now. |
ishleenk17
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Vishal!
I've left my initial comments on the PR. I'll be doing a more thorough review soon.
Also, let's change the title of the PR to something more appropriate.
| if !r.Valid(src) { | ||
| return | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metricset.period calculation from the migrated code has been left out here. Was that intentional ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not intentional actually. Can you elaborate on how the metricset.period is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I perused the actual usage here and I think this piece of code would better live outside the remappers (in the processor or in APM/MIS if required there) so that the interfaces don't get bloated. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i added metricset.period because it's used in some of the UIs as a scale factor. i agree it's kinda clunky here - ideally you'd be able to read it from the receiver config (the scrape period) directly, but there's no way i could find to it.
we could add it in a downstream component, but i figured as it was a direct requirement of some of the elastic host UIs, it was ok to stick it here for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think the issue with doing it in library (or even in APM/MIS) would be that we would need to maintain a cache for all the hosts we see and calculate the metricset.period for each of it. @axw WDYT? Do you see a better way here?
but i figured as it was a direct requirement of some of the elastic host UIs, it was ok to stick it here for now
Do you have a list of UIs that break due to this? I checked the main inventory UI and that seems to work fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the poc implementation we just stored a single value for the whole processor. this worked fine - it assumes that we only have one pmetrics struct per scrape (across all the datasets), which is currently true. could you think of a configuration which would break it?
@tommyers-elastic But in APM/MIS we would get data from more than one collector instances, so we would need to calculate the metricset.period for all such instances even if it is 1 value per instance. Did I answer your question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lahsivjar : Lets open a ticket to discuss about metricset.period, as this should not block the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are dealing with cumulative sums (in network and process remappers) :(
Right, that's a pain. One option, which is a little brittle but probably not worse than other options, would be to extract the value from any of the non-cumulative sum metrics. All of the host metrics are scraped on the same interval IIANM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lahsivjar : Lets open a ticket to discuss about metricset.period, as this should not block the PR
The ticket for metricset.period.
tommyers-elastic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good
| if !r.Valid(src) { | ||
| return | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use the data point's timestamp - start_timestamp to calculate metricset.period? For gauges and delta sums, the start timestamp should be the beginning of the collection interval. It doesn't look like we're dealing with cumulative sums in this code.
ishleenk17
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
shmsr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor comments.
| m.SetName(metric.name) | ||
|
|
||
| var dp pmetric.NumberDataPoint | ||
| switch metric.dataType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't there be a default case to skip the metric if any other dataType comes other than MetricTypeGauge/ MetricTypeSum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code shows as diff here but it is same as what we have in main. IMO, the best approach would be to throw an error (or even panic here) since skipping would mean we have appended an empty metric to the slice with a name but didn't handle it -- this would only occur when there is a bug in the library and the tests don't catch it for both the cases. I will leave it as is for now and will open another PR to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will take it as part of next refactoring Subham.
Hostmetrics remappers remap metrics produced by hostmetricsreceiver to Elastic system metrics. This allows powering Elastic's curated UIs based on OTel metrics.
The code from the remappers is adopted from the PoC done here.