Flow: use Prometheus directly instead of abstraction #2037

rfratto · 2022-08-14T15:19:54Z

Our Prometheus components (prometheus.scrape, prometheus.mutate, prometheus.remote_write) should behave exactly as Prometheus does.

Currently, Flow uses an abstraction for its metrics pipeline:

A prometheus.Receiver receives samples at a given timestamp
A prometheus.FlowMetric represents an individual sample

This abstraction is insufficient in its current state:

It does yet not support writing exemplars
It will not immediately support the upcoming work to write metadata to the WAL
It will not immediately support the upcoming sparse histogram work

It also has behavioral differences:

Flow: Discussion on scraper success/error #2035 notes that failing to append metrics don't fail the scrape, which may cause up != 1 alerts to no longer fire.

The list of behavioral differences may or may not be complete: abstractions introduce uncertainty, where it is harder to know if there is any other documented or undocumented behaviors of the underlying code which the abstraction does not satisfy.

For all of the issues above to be resolved, the abstraction would have to be changed to become increasingly Prometheus-like. Instead of maintaining an abstraction over time to match the behavior of a moving target, it would be easier to remove the abstraction and use Prometheus code directly.

As we've seen with operators and Helm charts, good abstractions are hard to make, and should only be attempted after accumulating a ton of experience with all the things it is abstracting over. I would recommend we write components to use their native APIs instead of trying to introduce abstractions. This means that Prometheus components should use Prometheus code, and OpenTelemetry Collector components should use OpenTelemetry Collector code.

NOTE: I don't mean to say that we can't wrap around Prometheus code to reduce boilerplate. Rather, we should not create an intermediate representation for metrics through a pipeline which has to be translated to/from native APIs.

If components use their native APIs, then a shim is needed to move between APIs. This has similar problems to an abstraction, but it reduces the surface area of where bugs can be introduced, allowing the native code path to always work fully as intended. How these shims would be exposed to users is out of scope of this proposal.

The text was updated successfully, but these errors were encountered:

mattdurham · 2022-08-24T16:25:44Z

I am more concerned with the behavior of the datatypes than the actual implementation. I agree that I haven't pointed out explicitly where the current behavior deviates from the prometheus scraping path. I believe as we do more components and functionality, the pipeline behavior will diverge. This divergence should be explicit instead of implicit though.

For instance, reusing the fanout appender is likely not a good appender to reuse. It has the concept of returning early and primary vs secondary. We would likely need to create our own that would run all the children and then return errors. I also believe the calling of children should in general be asynchronous deeply nested, or wide fan outs have a significant performance penalty.

I do think there is value in having abstractions to prevent mistakes and if we need to convert between formats. But this can be solved in other ways.

mattdurham · 2022-08-24T17:51:55Z

Discussion around this topic leads towards removing the FlowMetric concept and replacing it with stock prometheus, but transforming the metrics.receiver into an appendable/appender interface. Noting the non-obvious change of returning an error to the scrape components append/commit/rollback only if all descendants return an error.

The reason for this is if you have two remote_writes and one is succeeding, and one is failing, you want the metrics still flowing to the good remote_write. The current scrape would break the scrape loop on the first error.

mattdurham · 2022-08-25T19:42:01Z

I started implementing this today, and I encountered one of the reasons why we went with the payload approach. Each component gets more complex because it has to handle multiple concurrent appenders. One alternative is components are a shim over the structs that are appenders themselves. Ie mutate.metrics's only job is to spawn a sub-mutate appenders and each sub-mutate does the actual work.

Each component also needs to implement storage.appendable interface also. I will continue the change and once done we can evaluate the differences.

rfratto added proposal Proposal or RFC flow labels Aug 14, 2022

rfratto mentioned this issue Aug 23, 2022

Flow: naming convention for components #2059

Closed

rfratto added this to Grafana Agent (Public) Oct 3, 2022

rfratto added area/signals and removed flow labels Oct 3, 2022

rfratto added this to the v0.30.0 milestone Oct 4, 2022

rfratto mentioned this issue Oct 24, 2022

Grafana Agent Flow doesn't support Prometheus exemplars #2420

Closed

rfratto mentioned this issue Oct 31, 2022

Convert to using prometheus appendable #2431

Merged

3 tasks

mattdurham closed this as completed in #2431 Nov 1, 2022

mattdurham moved this to Done in Grafana Agent (Public) Nov 1, 2022

rfratto modified the milestones: v0.30.0, v0.29.0 Nov 3, 2022

github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label Feb 22, 2024

github-actions bot locked as resolved and limited conversation to collaborators Feb 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flow: use Prometheus directly instead of abstraction #2037

Flow: use Prometheus directly instead of abstraction #2037

rfratto commented Aug 14, 2022 •

edited

Loading

mattdurham commented Aug 24, 2022

mattdurham commented Aug 24, 2022

mattdurham commented Aug 25, 2022

Flow: use Prometheus directly instead of abstraction #2037

Flow: use Prometheus directly instead of abstraction #2037

Comments

rfratto commented Aug 14, 2022 • edited Loading

mattdurham commented Aug 24, 2022

mattdurham commented Aug 24, 2022

mattdurham commented Aug 25, 2022

rfratto commented Aug 14, 2022 •

edited

Loading