feat: add distributed tracing support #4745
base: main
Conversation
Force-pushed 178f982 to 22bfc0c
Force-pushed 22bfc0c to c1d9a3c
I'll implement the Prometheus notifier changes in a new draft PR based on prometheus/prometheus#16355
thompson-tomo
left a comment
Some feedback on attribute naming based on how OpenTelemetry exposes this in semconv. It might be worthwhile to add an alerting.platform.name attribute to the spans that stores the platform name (Alertmanager).
Is there interest in having these signals defined in the OpenTelemetry semantic conventions registry?
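For illustration, a minimal Go sketch of how the proposed attribute could be attached when starting a span; the `alerting.platform.name` key is only the suggestion above (not an existing semantic convention), and the tracer and function names here are hypothetical:

```go
package notify

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

// startNotifySpan is a hypothetical helper showing where the suggested
// alerting.platform.name attribute could sit alongside the attributes
// already used in this PR.
func startNotifySpan(ctx context.Context, integration string, alertCount int) (context.Context, trace.Span) {
	tracer := otel.Tracer("github.com/prometheus/alertmanager/notify")
	return tracer.Start(ctx, "notify",
		trace.WithAttributes(
			// Proposed attribute identifying the alerting platform.
			attribute.String("alerting.platform.name", "alertmanager"),
			attribute.String("alerting.notify.integration.name", integration),
			attribute.Int("alerting.alerts.count", alertCount),
		),
	)
}
```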
config/config.go
Outdated
```go
	return json.Marshal(result)
}

// TODO: probably move these into prometheus/common since they're copied from
```
This is probably a good idea, what do you think @ArthurSens?
I can prepare a PR for common later this week.
I would not be against it :)
Force-pushed c1d9a3c to 7b93382
Force-pushed 14b79bd to 47747ad
```go
trace.WithAttributes(attribute.String("alerting.notify.integration.name", i.name)),
trace.WithAttributes(attribute.Int("alerting.alerts.count", len(alerts))),
trace.WithSpanKind(trace.SpanKindClient),
```
Is there an easy way to add additional integration details, potentially some from https://opentelemetry.io/docs/specs/semconv/http/http-spans/#http-client-span, in particular server.*? Note: I assume this span represents the outbound call to a notification system, hence the client span kind.
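As a rough sketch of that direction, the server.* attributes could be derived from the notifier's target URL via the semconv helpers; the semconv version, helper function, and URL plumbing below are assumptions, not code from this PR:

```go
package notify

import (
	"context"
	"net/url"
	"strconv"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
	"go.opentelemetry.io/otel/trace"
)

// startClientSpan is a hypothetical helper that enriches the CLIENT span
// with the server.address / server.port attributes from the HTTP client
// span conventions, derived from the notifier's target URL.
func startClientSpan(ctx context.Context, integration string, target *url.URL) (context.Context, trace.Span) {
	attrs := []attribute.KeyValue{
		attribute.String("alerting.notify.integration.name", integration),
		semconv.ServerAddress(target.Hostname()),
	}
	if p := target.Port(); p != "" {
		if port, err := strconv.Atoi(p); err == nil {
			attrs = append(attrs, semconv.ServerPort(port))
		}
	}
	tracer := otel.Tracer("github.com/prometheus/alertmanager/notify")
	return tracer.Start(ctx, integration+" notify",
		trace.WithAttributes(attrs...),
		trace.WithSpanKind(trace.SpanKindClient),
	)
}
```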
If we are going to also have an HTTP span, then perhaps this one should be internal.
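A minimal sketch of that split, assuming the outbound HTTP call is instrumented separately with otelhttp (the client wiring shown is hypothetical): the integration span would stay internal while otelhttp emits the CLIENT span for the request itself.

```go
package notify

import (
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

// newTracedHTTPClient is a hypothetical constructor: wrapping the transport
// with otelhttp makes every outbound notification request emit its own HTTP
// CLIENT span (with the standard semconv attributes), so the integration
// span itself could use SpanKindInternal instead.
func newTracedHTTPClient() *http.Client {
	return &http.Client{
		Transport: otelhttp.NewTransport(http.DefaultTransport),
	}
}
```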
dispatch/dispatch.go
Outdated
| attribute.String("alerting.alert.name", alert.Name()), | ||
| attribute.String("alerting.alert.fingerprint", alert.Fingerprint().String()), | ||
| ), | ||
| trace.WithSpanKind(trace.SpanKindConsumer), |
Check this one:
```diff
- trace.WithSpanKind(trace.SpanKindConsumer),
+ trace.WithSpanKind(trace.SpanKindInternal),
```
Missed that one. Are these outgoing calls, as per the test mentioned in #4745 (comment)?
No, it is this case:
> deferred execution (PRODUCER and CONSUMER spans).
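For context, a minimal sketch of that deferred-execution pattern: a PRODUCER span where the alert is queued and a CONSUMER span where the aggregation group later flushes it, connected by a span link. The function names and context plumbing are illustrative assumptions, not this PR's implementation.

```go
package dispatch

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/trace"
)

var tracer = otel.Tracer("github.com/prometheus/alertmanager/dispatch")

// enqueueAlert is a hypothetical producer side: it records where the alert
// enters the aggregation group and hands back its span context so the later
// flush can link to it.
func enqueueAlert(ctx context.Context) trace.SpanContext {
	_, span := tracer.Start(ctx, "alert.enqueue",
		trace.WithSpanKind(trace.SpanKindProducer),
	)
	defer span.End()
	return span.SpanContext()
}

// flushAlert is the matching consumer side: the flush runs later, so the
// span is linked to the producer span rather than parented under it.
func flushAlert(ctx context.Context, producer trace.SpanContext) {
	_, span := tracer.Start(ctx, "alert.flush",
		trace.WithSpanKind(trace.SpanKindConsumer),
		trace.WithLinks(trace.Link{SpanContext: producer}),
	)
	defer span.End()
	// Actual flush/notify work would happen here.
}
```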
Yes, but it still needs to be outgoing based on other information in the spec. Anyway, I have raised open-telemetry/opentelemetry-specification#4758 to get further clarification.
Force-pushed c1482c7 to 5882778
CI failures will be fixed after #4761 is merged.
Force-pushed 6ba9be4 to 37231f9
OGKevin
left a comment
Some minor ✨ comments
Force-pushed 37231f9 to bea0dba
Force-pushed bea0dba to b65bdbf

Add tracing support using otel to the following components:
- api: extract trace and span IDs from request context (see the propagation sketch below)
- provider: mem put
- dispatch: split logic and use better naming
- inhibit: source and target traces, mutes, etc.; drop metrics
- silence: query, expire, mutes
- notify: add distributed tracing support to stages and all http requests
Note: inhibitor metrics are dropped since we have tracing now and they are not needed. We have not released any version with these metrics, so we can drop them safely; this is not a breaking change.
This change borrows part of the implementation from #3673
Fixes #3670
Signed-off-by: Dave Henderson <dhenderson@gmail.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
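As a rough illustration of the api item above, trace context could be extracted from incoming request headers roughly as in the sketch below; the middleware shape and names are assumptions, not the code in this PR:

```go
package api

import (
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
)

// withTraceContext is a hypothetical middleware: it extracts the caller's
// trace and span IDs (e.g. from a W3C traceparent header) into the request
// context so downstream spans in dispatch/notify join the same trace.
func withTraceContext(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))
		// trace.SpanContextFromContext(ctx) now yields the remote trace and
		// span IDs when a valid traceparent header was present.
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}
```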
TODO list
- tracing package: make changes/improvements if necessary
- webhook notifier and a test webhook receiver which supports distributed tracing

Demo
Alertmanager receiving alerts from a "sender" and tracing it all the way to an Aggregation Group inside Dispatcher
Note that Sender here is a custom Cloudflare proxy which acts as a limiter; it is included as an example to show distributed tracing in action.

Trace of Dispatcher flushing alerts and notifications failing to be sent to PagerDuty