-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(rfcs): Add RFC for Apache HTTP Server metrics source #3519
Conversation
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
- Do we want to apply any metric labels based on the other information | ||
available via the status page? I could see labeling the `url` at least |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But I assume #2660 would solve this for both logs and metrics, so it's probably blocked on that. Agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think that is just for internal metrics, yes? It looks like the prometheus source adds labels (looks like we call them tags):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I do think many of these event attributes would be better represented as trace context, but we can defer that for a later discussion.
- Are there preferences between `apache` or `httpd` for the nomenclature? I | ||
feel like `apache` is more well-known though `httpd` is more accurate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer httpd
or apache_httpd
? Apache feels a little generic since I think of the entire foundation. If we go with the apache_
prefix we should think about apache_kafka
and so on :/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I had the same angst about using apache
, but it does match how the other two collectors I looked at, datadog and telegraf, do it so people may be more familiar with that keyword as the component name if they are coming from those tools.
I'm open to either of those options though. I guess I prefer httpd
over apache_httpd
given that kafka
is just kafka
and not apache_kafka
.
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
later if this does not end up being the case and they really are all the same. | ||
|
||
One downside of this is that I think it'd be less discoverable than a | ||
first-class source for each type of endpoint we support scraping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree with this.
I can also see us (at some point) having "aliased" components, that merely defer to some other generic component with a default set of configurations.
In this case, we'd have a http_scrape
source that is highly configurable, and a apache
alias source, that makes it more discoverable and easier to configure but defers its actual operations to the http_scrape
source.
Having said that; all of this is irrelevant for this initial source implementation, which I like 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is actually what you described in "Future Work", isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The future work is just to do this internally, but I do like your idea of eventually adding an http_scrape
source that would be more flexible and allow people to scrape arbitrary endpoints. I imagine it might require more configuration which the "aliases" would simply default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can also see us (at some point) having "aliased" components, that merely defer to some other generic component with a default set of configurations.
We do this currently. Sources and sinks can wrap other sources and sinks. For example, the syslog
source wraps the socket
source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this precedence, do we want to create a generic http_scrape
source and the apache
source based on this RFC, or do we want to start with the latter, and transition it to the former as soon as we need another source based on HTTP scraping?
I assume we'll just start with the singular apache
source, but want to make sure we're on the same page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd like to defer the http_scrape
source as I think it'll require more thought and discussion about how to parse the incoming requests (formats? codecs? etc.).
I guess I figured I'd start with this one and the nginx one (#3091) for now and refactor them internally to share a component (along with the prometheus
source).
later if this does not end up being the case and they really are all the same. | ||
|
||
One downside of this is that I think it'd be less discoverable than a | ||
first-class source for each type of endpoint we support scraping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is actually what you described in "Future Work", isn't it?
- `response_timeout`; to cap request lengths | ||
- `tls`: settings to allow setting specific chains of trust and client certs | ||
|
||
But I chose to leave those out for now given the prometheous source doesn't |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the big missing piece here is Basic auth - a lot of people still use this as security for mod_status. Prom just adds it to the scrape URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I think we could support this in the same way (via the URL), but that it'd be useful to add basic auth options for all HTTP-based client sources. I propose deferring that until after we refactor them as mentioned in "Future Work".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - missed that. +1.
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
@jamtur01 @JeanMertz @binarylogic I believe I've addressed all of the comments here. Given you all three have commented extensively, I figured you'd be the best to target for formal RFC approval reviews (step 3) so I rerequested reviews from you three 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Answer: yes, the config has been updated to allow multiple endpoints. | ||
- Are there preferences between `apache` or `httpd` for the nomenclature? I | ||
feel like `apache` is more well-known though `httpd` is more accurate. Answer: | ||
standardize on `apache`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting, we'll want apache_metrics
for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think that calling it apache_metrics
rather than just apache
, while making it clearer that the source ingests metrics, also makes it stand out against the other sources which are not suffixed with their data type. For example, we have statsd
rather than statsd_metrics
and docker
rather than docker_logs
. One could argue that statsd
is less ambiguously metrics, but docker
, in my mind, could be ambiguous: it could be either docker metrics or docker logs (or both).
This is actually standing out to me as something where the precedent will stick around for a while so it's probably worth having some explicit guidance for source naming. Sources could be one of, more than one of, or all of metrics, logs, and traces. It seems like the vision of vector
is to treat all observability data (logs, metrics, and traces) as first-class and so it seems like we should reflect that in the component naming and terminology.
I actually kinda like Datadog's model where they just have one apache
integration that ingests both metrics and logs from Apache. They configure it via a separate log
section: https://github.com/DataDog/integrations-core/blob/master/apache/datadog_checks/apache/data/conf.yaml.example#L326-L347 . Given, Datadag started with metrics and moved to logs, which I think is reflected in the fact that logs is a separate subsection rather than metrics being so. For us, we might want to have metrics/log/trace configuration at the same level. For users forwarding to datadog, they may want to have similar functionality as what dd-agent currently supports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points.
or example, we have statsd rather than statsd_metrics and docker rather than docker_logs
Then we should rename these sources to statsd_metrics
and docker_logs
or change our model.
Sources could be one of, more than one of, or all of metrics logs and traces. It seems like the vision of vector is to treat all observability data (logs, metrics, and traces) as first-class and so it seems like we should reflect that in the component naming and terminology.
It's a good point. From a UX perspective, drawing lines around components based on types makes it easier to build pipelines. For example, what happens if you connected a multi-type docker
source to regex_parser
that only operates on logs? Setting aside the fact that we can't technically do this, I think it complicates the UX. I am very happy to discuss this change in a separate issue, and then an RFC if we deem it worthy. Would also like other people's thoughts on the matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been going back and forth on this all afternoon. On one hand, having a single source name, docker/apache/etc, references that single point of observability. But I also started thinking "are there circumstances in which I'd only collect metrics from a service and I'd find apache
on its own confusing?" Also, I wonder whether having differently named sources might make pipeline configurations more clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I wonder whether having differently named sources might make pipeline configurations more clear?
This is the main concern for me. Combined sources may seem convenient, but how often do you want to route logs and metrics through the same pipeline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main concern for me. Combined sources may seem convenient, but how often do you want to route logs and metrics through the same pipeline?
It is a good question. I note that we do have a one source (vector
), a few sinks (vector
, console
, blackhole
), and a few transforms (filter
, lua
remap
(soon) that can handle both logs and metrics, but I can imagine it would be common to need to transform them differently.
I realize this is going off on quite a tangent; we could break this discussion off to a separate issue. I think it is important though.
I currently see 3 approaches:
- Call this
apache_metrics
and move towards renaming all sources / sinks that only handle a single type to have a suffix (likesematext_logs
). If/when we addtrace
as another type, we'd add things likesematex_traces
. - Move towards one component for each "integration type' for sinks/sources and do what I describe below to control the flow of event types through the pipeline when needed.
- Keep doing what we are doing which seems like a mix of strategies with respect to a type handling; defining some sort of "test" to determine when the type suffix should be applied.
I find the current state to be a little confusing. Some sources/sinks integrations can handle multiple metric types (vector
, console
) and other ones are split into _metrics
and _logs
(like datadog_logs
/datadog_metrics
). The clearest rule I can pull out for deciding is "can the integration handle pulling or pushing both logs and metrics with the same configuration? If yes, no suffix, if no, two components with _logs
and _metrics
". On issue with that rule is that downstream support for observaliblity types can change, making it difficult to predict if they will add metrics
or logs
support in the future.
I'd personally would appreciate some guidance laid out for component naming to avoid this question cropping up in the future.
As a thought experiment, maybe it is useful to imagining we add metrics support for clickhouse (#3435). Where would that go? A new component (clickhouse_metrics
)? If so, would we rename clickhouse
to clickhouse_logs
? Or would we add it as part of the existing clickhouse
sink?
Expanding on approach 2:
One way we could approach that with a single source, is allow transforms and sinks to ask for only certain datatypes with something like:
[transforms.regex]
type = "regex_parser"
inputs = ["mysource.logs"]
Where it would only receive logs from the source and not metrics (or traces). If there was no suffix, it would simply receive both. In the case of a transform that only works with a subset of "observability types" we could error at start-up if there is an invalid pipeline.
Another way to limit the event types if you only want certain types from the source would be something like:
[sources.apache]
type = "apache"
event_types = ["log", "metric"]
Perhaps with syntax sugar for one type:
[sources.apache]
type = "apache.logs"
I think there is something attractive about one component to observe one "integration" (like apache, nginx, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The clearest rule I can pull out for deciding is "can the integration handle pulling or pushing both logs and metrics with the same configuration?
I agree we should have a stronger convention here and this is a good start. Right now, we handle multi-type sources by automatically inserting a type filter where necessary in front of downstream components. We should make sure that doesn't lead to confusing scenarios if we start using it more widely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, we handle multi-type sources by automatically inserting a type filter where necessary in front of downstream components.
Interesting, I wasn't aware of that. I could see that be a bit confusing if it is implicit.
As for the convention, I'm ok with starting with that one, but it does mean we'll need to deprecate sources/sinks occasionally. I'm imagining, had vector existed a few years ago, before Datadog had support for logs, we may have just called the sink datadog
and then had to deprecated it later in-lieu of datadog_metrics
when datadog_logs
was added.
I'm ok with just calling this apache_metrics
for now since, prompted by this discussion, I've observed we already have a mix of suffixed and unsuffixed sinks and sources so this doesn't seem to make the situation noticeably worse and my concern about precedent is unfounded. I can create a new issue to come up with a convention for component naming. I'm still curious to hear thoughts on separate vs. integrated sources/sinks as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can create a new issue to come up with a convention for component naming. I'm still curious to hear thoughts on separate vs. integrated sources/sinks as well.
👍 it's worth a discussion. I'm not leaning strongly in either direction, but as you can see there are scenarios we should discuss before making a decision. I like the simplicity of single, multi-type components from a documentation standpoint, but I want to make sure it doesn't over complicate the actual UX.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from the apache_metrics
name, everything looks good.
Answer: yes, the config has been updated to allow multiple endpoints. | ||
- Are there preferences between `apache` or `httpd` for the nomenclature? I | ||
feel like `apache` is more well-known though `httpd` is more accurate. Answer: | ||
standardize on `apache`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I wonder whether having differently named sources might make pipeline configurations more clear?
This is the main concern for me. Combined sources may seem convenient, but how often do you want to route logs and metrics through the same pipeline?
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Before we merge this, I think it's worth, one last time, considering the Prometheus style naming (#3581 (comment)) since this will be very difficult to change later. Calling this "Prometheus style" is a little misleading since it's just good metric naming practices. From what I can tell there is nothing Prometheus specific about it. |
I should have included my opinon 😄 . I think we should follow the Prometheus style for a number of reasons:
|
Notably, he is matching the node exporter naming, but it looks like Telegraf actually exposes apache metrics with a different style (which I assume most people pushing metrics into prometheus would be using):
So to get the benefit of users being able to use preconfigured dashboards, we'd have to match that. |
I know this looks a lot like bikeshedding, but I think it has much broader implications than we think, especially as we set precedence, so it is worth discussing and obtaining consensus.
Sure. I don't think we should try to solve this at the source-level though. We should shift this concern to the sink-level. Our logging components have the same problem, it's why I opened https://github.com/timberio/vector-product/issues/34. In my opinion, this is more about the style we want to adopt.
What makes you think that instead of the Promtheus apache_exporter? I still lean towards Prometheus' style. It's cleaner, semantically better, and proven by a large community. |
I agree, what we do here will set a precedent. I appreciate you pushing on this as I do think it is important to attempt to get right to avoid an annoying shift later. I had missed https://github.com/Lusitaniae/apache_exporter ; I guess that is semiofficial since it is the only one listed on https://prometheus.io/docs/instrumenting/exporters/ . If I were picking, I'd probably have gone with telegraf though due to broader support.
Do want to exactly match the interface of https://github.com/Lusitaniae/apache_exporter ? Or just the general naming conventions? If we do wanto to match, a corollary question: do you want to try to match the prometheus exporters (if one exists) for all metric ingestion in I do actually like that they rely on labels more heavily; I think I'd like to adopt that pattern regardless. Example export from
Some notes on theirs:
I guess my inclination would be to adopt prometheus naming conventions, but not necessarily exactly match existing prometheus exporters unless they are "Official" exporters or we find that we particularly like the existing one as the non-official ones seem to have varying levels of support. |
👍 Agree. Again, we're going to have to solve mapping at some point. Given that everyone is different, we should focus on building our own quality catalog with consistency and provide migration ramps later. The ultimate fallback is #3463, where users can remap the metrics right in the source to their own liking. |
👍 I'll update this RFC with prometheus-style naming and ping for another round of 👀 |
And prefer labels over separate metrics for related metrics. Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
I pushed 7426b37 which switches this to use the prometheus metric naming conventions. Other differences are:
This did raise the question for me: how this will play with the @jamtur01 @binarylogic @lukesteensen (and anyone else!) do you mind giving 7426b37 a look? |
Nice work
Agree. #3609 |
|
||
I figure we probably don't want metrics for: | ||
|
||
- Load (should be handled by a `cpu` or similar metrics source) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have this metric listed above. Do you still want to keep it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That one is actually different, I believe it is just the CPU load of the httpd process as opposed to the entire system load (which is also available via mod_status
). I think the general system load would be better handled by the host_metrics
source.
I wish they had better docs for mod_status
, but that was my interpretation based on experiments and looking at https://github.com/apache/httpd/blob/0a172a81ec8b9d280349da1dec455a2dba287ace/modules/generators/mod_status.c#L476-L508
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct from memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(perhaps rename this to System Load
to make this less ambiguous)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work. This will help set precedence for future integrations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work @jszwedko 👍
I'm also happy to see some other discussions this triggered, such as consistent naming of metrics and components.
|
||
I figure we probably don't want metrics for: | ||
|
||
- Load (should be handled by a `cpu` or similar metrics source) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(perhaps rename this to System Load
to make this less ambiguous)
later if this does not end up being the case and they really are all the same. | ||
|
||
One downside of this is that I think it'd be less discoverable than a | ||
first-class source for each type of endpoint we support scraping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this precedence, do we want to create a generic http_scrape
source and the apache
source based on this RFC, or do we want to start with the latter, and transition it to the former as soon as we need another source based on HTTP scraping?
I assume we'll just start with the singular apache
source, but want to make sure we're on the same page.
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
…dev#3519) * chore(rfcs): Add RFC for Apache HTTP Server metrics source Signed-off-by: Jesse Szwedko <jesse@szwedko.me> Signed-off-by: Brian Menges <brian.menges@anaplan.com>
#3092
Signed-off-by: Jesse Szwedko jesse@szwedko.me