Add ActivitySource to start producer and consumer activities (spans) to support OpenTelemetry observability #2460
base: main
@odinserj, should I target this at the main or dev branch? Thanks.
Hi @sgryphon! I like the idea, but the number of changes is really high. How does it compare with the following gist, which is implemented as a job filter? https://gist.github.com/odinserj/06e18950b5bf3083a5aed0ed06d3d18a
Thanks for reviewing. I am currently using a similar (but limited) filter, as discussed in #2017. I do think that activity tracing, like metrics, is better as part of the core solution, turned on by default and with a standard name. I based some of that on MassTransit, where you just subscribe to a known event source; the .NET components (e.g. HttpClient) are similar in that they can easily be enabled. Decoupling from the flow is a good idea, however, so I'm happy with a different approach. My main thoughts are:
Apart from that, some specific feedback on the gist:
If you can answer the first two items, I'd be happy to work on a revised solution using the filter.
I will work on the alternative approach. It is nicer, as it neatly decouples the work. I also like putting the activity in the context (this doesn't work for job parameters, which may be serialized to another machine, but I presume the start and end events run in the same process).
I've rewritten the code as a global filter. I think it is much nicer to decouple it like that, and the fact that it is so easy to do shows the good architecture and structure of Hangfire. I used your filter code as a base, then merged in my code (and applied my suggestions). I added it to the default global filters, then checked that the samples run okay (and updated the readme and samples for the new location of the constant). Some notes:
The one thing I haven't updated yet is the unit tests: the previous code was built into Worker, etc., so I had tests for those. I'll look at the tests for other filters and see if I can add anything valuable.
New unit tests for the filter have now been added.
Addresses the tracing aspects of HangfireIO#2408 for integration with Aspire, as well as with any other OpenTelemetry-based diagnostics, and addresses HangfireIO#2017. Add a default filter to start producer activities (spans) when jobs are created, and consumer activities when jobs are performed. Pass the creation context through as TraceParent and TraceState job parameters, so that distributed tracing works across job scheduling. Note that activity support is only available from netstandard2.0 onwards, and activities are only created if there is a configured listener.
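To illustrate what travels in the TraceParent job parameter, here is a minimal Python sketch of serializing and recovering a creation context in the W3C Trace Context `traceparent` format (version 00: `version-traceid-parentid-flags`). This is not Hangfire's .NET code: the job parameters are modelled as a plain dict, and `on_creating` / `on_performing` are hypothetical hooks that only loosely mirror the filter's creating and performing stages. Only the parameter names TraceParent and TraceState come from the commit message above.

```python
import re
import secrets
from typing import Optional, Tuple

# W3C Trace Context "traceparent"; this sketch accepts version 00 only.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def format_traceparent(trace_id: str, span_id: str, sampled: bool) -> str:
    """Serialize a span context as a version-00 traceparent value."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(value: str) -> Optional[Tuple[str, str, bool]]:
    """Return (trace_id, parent_span_id, sampled), or None if the value is invalid."""
    match = _TRACEPARENT.fullmatch(value)
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # The specification treats all-zero trace and parent IDs as invalid.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return trace_id, span_id, (int(flags, 16) & 0x01) == 0x01


def on_creating(params: dict, trace_id: str, span_id: str,
                sampled: bool = True, tracestate: str = "") -> None:
    """Producer side (hypothetical hook): persist the creation context with the job."""
    params["TraceParent"] = format_traceparent(trace_id, span_id, sampled)
    if tracestate:
        params["TraceState"] = tracestate


def on_performing(params: dict) -> Optional[Tuple[str, str, bool]]:
    """Consumer side (hypothetical hook): recover the parent context, possibly on another server."""
    value = params.get("TraceParent")
    return parse_traceparent(value) if value else None


if __name__ == "__main__":
    job_params: dict = {}
    trace_id, producer_span_id = secrets.token_hex(16), secrets.token_hex(8)
    on_creating(job_params, trace_id, producer_span_id, tracestate="vendor=value")
    parent = on_performing(job_params)
    # The consumer sees the same trace ID the producer stored.
    assert parent is not None and parent[0] == trace_id
    print(job_params["TraceParent"])
```

Because the context is stored as plain strings alongside the job, it survives serialization and can be read back by whichever server eventually processes the job.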
Update NetCoreSample with an OpenTelemetry based listener to enable the Hangfire activity source. Add initial activity creation along with logging of TraceId to show job correlation in log output. Add background job examples, including error examples.
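The sample only produces activities because it registers a listener. In .NET, `ActivitySource.StartActivity` returns null when no listener is interested in the source, which is why instrumentation code typically guards calls with `activity?.`. The Python sketch below models that gating. `ActivitySourceSketch`, `Activity`, and the source name `example.jobs` are hypothetical stand-ins, not the .NET API, and the sketch ignores sampling decisions beyond "listening or not".

```python
from typing import Callable, List, Optional


class Activity:
    """Minimal stand-in for a span: which source created it, its name and kind."""

    def __init__(self, source: str, name: str, kind: str) -> None:
        self.source = source
        self.name = name
        self.kind = kind


class ActivitySourceSketch:
    """Hypothetical model of listener-gated span creation; not the .NET API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._listeners: List[Callable[[str], bool]] = []

    def add_listener(self, should_listen_to: Callable[[str], bool]) -> None:
        """Register a predicate that decides, by source name, whether to listen."""
        self._listeners.append(should_listen_to)

    def start_activity(self, name: str, kind: str) -> Optional[Activity]:
        # With no interested listener nothing is allocated and None is returned,
        # so instrumented code must tolerate a missing activity.
        if not any(listen(self.name) for listen in self._listeners):
            return None
        return Activity(self.name, name, kind)


if __name__ == "__main__":
    source = ActivitySourceSketch("example.jobs")
    assert source.start_activity("create", "producer") is None  # nobody listening yet
    source.add_listener(lambda name: name == "example.jobs")
    activity = source.start_activity("create", "producer")
    print(activity.kind if activity else "no activity")
```

The design keeps tracing effectively free when nobody is listening, which is what makes it reasonable to register the filter by default.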
For activity tracing.
Thanks @sgryphon, looks really good! One caveat here relates to a new dependency included in Hangfire.Core: its version is deprecated according to NuGet. Given that .NET 9.0 has become strict about transitive dependencies (as in #2468), I'm thinking of the following steps:
So on modern platforms we'll have out-of-the-box support for telemetry without deprecation-related problems, and on other platforms it will still be possible to use https://www.nuget.org/packages/OpenTelemetry.Instrumentation.Hangfire. What do you think?
OpenTelemetry is a cross-platform open-source standard for distributed tracing, which allows you to collect and analyze data about the performance of your systems.
OpenTelemetry is now the default for new .NET applications: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/observability-with-otel
These changes add a Hangfire ActivitySource to start a Producer activity (span) when a job (either background or recurring) is created, and then a Consumer activity when it is processed.
It addresses part of the Aspire integration requested in #2408.
It also provides a built-in core solution, as an alternative to the filter-based approach discussed in #2017.
The creation context information is persisted with the job, so that the distributed activities can be correlated, even if processed on a different server.
No library user involvement or changes are needed except to register a listener for the ActivitySource, such as via OpenTelemetry. The Producer activity will pick up any existing context (such as from an ASP.NET request), or create a new one as needed.
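The "pick up any existing context or create a new one" behaviour, and the way the consumer continues the same trace, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the PR's .NET implementation: `SpanContext`, `start_producer`, `to_traceparent`, and `start_consumer` are hypothetical names (in .NET the ambient context would typically be `Activity.Current`), the consumer parses the stored value without validation, and the consumer span takes the producer span as its parent, whereas a real implementation could instead use a span link.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    trace_id: str                  # 32 lowercase hex characters
    span_id: str                   # 16 lowercase hex characters
    parent_span_id: Optional[str]  # None for a root span
    kind: str


def start_producer(ambient: Optional[SpanContext]) -> SpanContext:
    """Join the ambient trace (e.g. an incoming web request) if one exists, else start a new trace."""
    if ambient is not None:
        return SpanContext(ambient.trace_id, secrets.token_hex(8), ambient.span_id, "producer")
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8), None, "producer")


def to_traceparent(span: SpanContext) -> str:
    """What would be persisted with the job (sampled flag assumed set)."""
    return f"00-{span.trace_id}-{span.span_id}-01"


def start_consumer(stored: Optional[str]) -> SpanContext:
    """Continue the producer's trace from the persisted value; no validation in this sketch."""
    if stored:
        _version, trace_id, parent_id, _flags = stored.split("-")
        return SpanContext(trace_id, secrets.token_hex(8), parent_id, "consumer")
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8), None, "consumer")


if __name__ == "__main__":
    request = SpanContext(secrets.token_hex(16), secrets.token_hex(8), None, "server")
    producer = start_producer(request)
    consumer = start_consumer(to_traceparent(producer))
    # One TraceId from the web request through job creation to job execution.
    assert request.trace_id == producer.trace_id == consumer.trace_id
    assert consumer.parent_span_id == producer.span_id
    print(consumer.trace_id)
```

This is the correlation the NetCore example demonstrates by logging TraceId at both job creation and job execution.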
The NetCore example has been updated to show how TraceId is correlated between job creation and execution. The PR also includes unit tests and a brief mention in the Readme.
Note: Some of the work has been based on the MassTransit implementation of ActivitySource.