Add support for reporting application metrics via logging #708
Comments
PR of initial implementation coming very soon! |
Thanks for the ping @anurse! Looks good from a Serilog perspective, AFAICT. I've done some work with InfluxDB in this area; the single numeric value sounds fine, but the lack of support for key/value tags seems like it will limit the API's usefulness with Influx (not sure about other platforms). In scenario terms -
|
Yeah, I came to the same conclusion in my explorations with Influx. I think the pattern I'm leaning towards is allowing key-value pairs in … |
Properties/Scopes are a tricky one to solve well.
I'd lean towards supporting arbitrary properties, but discouraging providers from reading Scope data (we can't stop them from reading it ;)). It does mean that if I want to attach some arbitrary property to an EF metric (for example), I can't do it unless EF exposes a way to do it. Perhaps that's OK for now. |
Are metrics really a logging concern? If I were interested in tracking application metrics would it make sense to have to take a dependency on Microsoft.Extensions.Logging? Have you guys looked at App Metrics for example? This reminds me of when I noticed Health Checking functionality being added. Once I noticed Health Checks being implemented by Microsoft I went to efforts to abstract that out of App Metrics, I don't see any progress on https://github.com/dotnet-architecture/HealthChecks anymore? |
@anurse a while ago, I added Telemetry APIs to Orleans (dotnet/orleans#992) which essentially add support for an abstraction covering both trace/logging and metrics tracking, mostly inspired by the AI and NewRelic APIs. Recently we removed the log/tracing from those APIs in favor of using the Microsoft.Extensions.Logging abstractions, so we could leverage most of the packages (like Serilog) that already had plugins for those abstractions. Now I'm going to revamp the telemetry APIs v2, and someone from the Orleans core team pointed this issue out to me. It would be good if we could use these metrics abstractions, but I have some concerns about the way it is being implemented:
I would be interested in using it in Orleans, but the way it is now, it would limit us a lot. |
And here I am asked to remove my dependency on … I'm not sure why Microsoft is interested in including their own implementation to report metrics to OSS TSDBs like InfluxDB when they already have AppInsights? There doesn't seem to be much support from Microsoft in terms of OSS; wasn't this the idea of the dotnet foundation, rather than including everything possible in the dotnet framework? App Metrics is my first attempt at an OSS project (other than contributing to other projects); rather than support from Microsoft, equivalent implementations are underway, Metrics and Health Checks. There are also other examples of this type of thing that .NET developers have raised recently and in the past. Microsoft I think has done amazing things all-round recently, very impressed, but the above makes me lose motivation in contributing to the dotnet community :( |
@alhardy we haven't decided on anything just yet! That's why we've logged this issue: to invite community discussion! |
Here are my thoughts on this as someone who works in a very large corporate environment that has large logging as well as large telegraf/influxdb setups.
Make it so you can fire any metric and make the metric name a "namespace" effectively. So your metric may be something like
Or something like that. Then it's really up to the adapters to make it efficient and smart. In fact if you make the "key" just opaque bytes then you can deal with that however and cache those values. Anyway just my 2p, but this is a very important topic and one that I see as a weak point of c# aspnetcore apps in a big mixed Java/Go/C# ecosystem, so I am happy this is being discussed! |
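A purely hypothetical illustration of the namespaced naming being described; the metric name, the tag, and the DefineMetric/RecordValue shape are drawn from later in this thread, not from any shipped API.

// Hypothetical: dotted "namespace" metric names plus key/value tags; it is up to
// the adapter to map these onto statsd/Influx/Graphite conventions and to cache
// whatever per-name state it needs. Assumes the DefineMetric extension discussed below.
void ReportConnectionCount(ILogger logger, int count)
{
    var metric = logger.DefineMetric("myapp.signalr.connections.current",
        ("host", Environment.MachineName));
    metric.RecordValue(count);
}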
@Drawaes Yeah OK good points, in particular 1 and 3 👍 |
As always, I'm adding my rant that you should sit together with Vance and the corefx team so as not to come up with yet another solution for very similar things. E.g. metrics for operation-based things (like incoming / outgoing HTTP requests) are already covered by Activity/DiagnosticSource. Don't make us implement yet another abstraction in our frameworks/libraries - 283 lines of instrumentation code (as in the incoming ASP.NET Core request Diagnostics class) is enough. :) See https://github.com/dotnet/corefx/issues/18114 for more context. Give … People who need other kinds of metrics can always use e.g. AppMetrics, which offers much more functionality anyway. By introducing such a simple metrics abstraction in Microsoft.Extensions.*, you're just creating more Microsoft-internal competition and you're probably killing a great OSS project. |
@cwe1ss how would you suggest we implement exposing the current connection count in SignalR and the number of websocket messages sent per second? |
We're best buddies with @vancem 😄 |
I don't know how that's implemented but if you had one activity per connection, subscribers could maybe track the Start and Stop events and do the calculation themselves?!
… On 20 Sep 2017 at 16:27, David Fowler ***@***.***> wrote:
@cwe1ss how would you suggest we implement exposing the current connection count in SignalR and the number of websocket messages sent per second?
|
(This would only cover the Connection count though). |
@cwe1ss an activity to count the number of concurrent connections? I understand you want a single abstraction but that's just a hack, right? The reason that metric systems take numbers is because they are counting things. Trying to shoehorn everything into the same API makes a mediocre experience for producers and consumers, as we'll end up inventing hidden semantics that can't be expressed in the API. |
no, the Activity is used for the instrumentation side to indicate an operation - nothing more, nothing less. Why should I as a framework/library developer care if you want to use something like Zipkin vs something like Prometheus? It's up to the consumers to decide what to do with it (in the DiagnosticSource events) - e.g. Zipkin might create a trace/span, Prometheus might create multiple counters (total, per country, whatever, ...), ... if creating an activity and running through the subscribers is too much overhead (maybe for msgs in a connection, but surely not for the connection itself, right?) then why not use existing stuff like EventCounter/PerfCounter. I'm not saying that I like Activity/EventCounter/... 100% but it's what's there in corefx - and it's brand new - so why invent something else?? We've had this discussion before so we probably just have to agree to disagree - just wanted to place my rant again so that I can tell my grandkids one day that it's not my fault that they have to use 20 different Diagnostics things in .NET 😀 |
PS: I'm not familiar with web sockets. Is the connection duration also an interesting metric? By instrumenting it with Activity you would get that automatically as well. |
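As a rough sketch of what that consumer-side calculation could look like: the Start/Stop event names below are hypothetical, only the IObserver plumbing is standard DiagnosticSource usage.

using System;
using System.Collections.Generic;
using System.Threading;

// Derives a "concurrent connections" count by observing hypothetical Start/Stop
// events emitted by the instrumented framework; the subscriber computes the number
// itself rather than the framework exposing a dedicated counter API.
public sealed class ConnectionCountObserver : IObserver<KeyValuePair<string, object>>
{
    private long _current;

    public long Current => Interlocked.Read(ref _current);

    public void OnNext(KeyValuePair<string, object> evt)
    {
        // e.g. "MyFramework.Connection.Start" / "MyFramework.Connection.Stop"
        if (evt.Key.EndsWith(".Start", StringComparison.Ordinal))
            Interlocked.Increment(ref _current);
        else if (evt.Key.EndsWith(".Stop", StringComparison.Ordinal))
            Interlocked.Decrement(ref _current);
    }

    public void OnCompleted() { }
    public void OnError(Exception error) { }
}

Such an observer would be wired up via DiagnosticListener.AllListeners / DiagnosticListener.Subscribe; whether this is an acceptable substitute for a real counter API is exactly what the rest of the thread debates.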
It's not at all clear why … I can see that it would make sense for the Microsoft Application Insights® … Yes, there is a degree of commonality regarding points 1 and 2 in @anurse's list (type/namespace bits and so on) but there's a world of difference outside of there. |
@anurse |
Would a Counter metric work by doing …? Related: is there a case to be made for having an on-interface … |
I don't like the define... what if I define it 2x, what if I don't define it before I send a metric? Also I tend to agree that logging and metrics are separate... like @markrendle called it, we are Kafka/splunk - telegraf/influxdb |
I'm guessing that once the metric is defined the first time, it hangs around and gets re-used like Loggers, although that appears to be up to the implementer of IMetricLogger. And I guess that the baked-in meta-implementation of Logger would do that? Something like

private readonly ConcurrentDictionary<string, IMetric> _metrics = new ConcurrentDictionary<string, IMetric>();
public IMetric DefineMetric(string name, params (string key, string value)[] tags)
{
return _metrics.GetOrAdd(name, n => CreateNewMetric(n, tags));
}
private IMetric CreateNewMetric(string name, params (string key, string value)[] tags)
{
var metric = new MetaMetric();
foreach (var loggerInfo in Loggers)
{
if (loggerInfo.Logger is IMetricLogger metricLogger)
{
metric.Add(metricLogger.DefineMetric(name, tags));
}
}
return metric;
}

So if the meta-implementation does that, do the actual implementations even need to cache them internally? Of course now there's a question whether the tags would need to be part of the key that uniquely identifies the metric, and the performance overhead of turning a name plus an arbitrary array of … |
Sure, I get what it will do. My question is why leak the "create it and cache it" abstraction. Why can't I send any arbitrary key? If it's the first time it's been seen, then create it. |
Well, if you've got a type that's already designed to be a singleton (like an API client wrapper), it's better to define the metrics in the constructor into readonly IMetric instances and then just use them on every call. |
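For illustration, a minimal sketch of that pattern, assuming the DefineMetric/RecordValue shape discussed in this thread; the client and metric names are made up.

using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

// Sketch: a singleton-style API client defines its metrics once in the constructor
// and reuses the readonly IMetric instances on every call.
public class WeatherApiClient
{
    private readonly IMetric _requestDurationMs;

    public WeatherApiClient(ILogger<WeatherApiClient> logger)
    {
        // DefineMetric is the (relatively) expensive part; do it once.
        _requestDurationMs = logger.DefineMetric("WeatherApi.RequestDurationMs");
    }

    public async Task GetForecastAsync()
    {
        var stopwatch = Stopwatch.StartNew();
        await Task.Delay(10); // placeholder for the real HTTP call
        _requestDurationMs.RecordValue(stopwatch.Elapsed.TotalMilliseconds);
    }
}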
What if I have a metric for every time a certain HTTP code is returned? Do I have to register, say, all 1000 numbers or whatever in case I send one? That is my problem with forcing preregistration. Now you might say I could track whether I created it or not, and I certainly could, but why should the client app do that or care? One big difference between metrics and logging is that metrics should be a lot more fire and forget... (hence UDP and arbitrary keys in statsd) |
This is what I plan to do for Orleans telemetry APIs v2 in case we don't find the proper abstractions here. Just for context, the facet system is, roughly, a way to configure some of the services that are injected into other types.

public abstract class MetricFacet : Attribute
{
public string Name { get; set; }
public bool IsShared { get; set; }
public bool AutoFlush { get; set; }
}
// Each type of metric would have a facet attribute that would describe how it can be configured
// Since Facets system from was designed with the purpose of offer a common standardized way to
// inject framework services, I thought it would be a good fit for telemetry/metrics
public class CounterMetricFacet : MetricFacet
{
// Should aggregate "BatchCount" events before flush (even if AutoFlush == true)
public bool Aggregate { get; set; }
public int BatchCount { get; set; } = 10;
}
public class TimeSpanMetricFacet : MetricFacet
{
// Similar to batch, but only report data when a window is expired.
public TimeSpan Window { get; set; }
}
// Basic metric interface which all metrics should implement
// Regardless of the type of the metric, they all need to flush it to the registered telemetry consumers
public interface IMetric<T>
{
Task Flush();
}
// The CounterMetric metric behaves similar to a performance counter, counting an arbitrary number.
// Based on the Facet configuration, it can aggregate a number of events and only flush once.
public interface ICounterMetric<T> : IMetric<T>
{
Task Increment(Dictionary<string, object> meta = null);
Task Decrement(Dictionary<string, object> meta = null);
Task Increment(T amount, Dictionary<string, object> meta = null);
Task Decrement(T amount, Dictionary<string, object> meta = null);
}
// Another example of metric which would be used to track down some execution times
public interface ITimeSpanMetric : IMetric<TimeSpan>
{
Task BeginRecording(Dictionary<string, object> meta = null);
Task EndRecording(Dictionary<string, object> meta = null);
}
// A regular grain which receives the metrics injected as Facets
public class MyGrainWithCounter : Grain, IMyGrainWithCounter
{
private ICounterMetric<int> _myIntCounter;
private ITimeSpanMetric _myTSMetric;
public MyGrainWithCounter(
[CounterMetricFacet(Name = "MyIntCounter", IsShared = true, Aggregate = true, BatchCount = 100)] myIntCounter,
[TimeSpanMetricFacet(Name = "MyTSMetric", AutoFlush = true, Window = TimeSpan.FromMinutes(1))] myTSMetric)
{
_myIntCounter = myIntCounter;
_myTSMetric = myTSMetric;
}
public async Task DoSomethingLong()
{
await _myTSMetric.BeginRecording();
await Task.Delay(100000); //Do something really useful and that takes time...
await _myTSMetric.EndRecording();
}
public async Task AddSomethingForMe(int amount)
{
//Do some business logic for the grain method and, after we are done, increment the counter with the value passed in
await _myIntCounter.Increment(amount);
// We need the APM tool to receive the metric data right away, so flush it!
await _myIntCounter.Flush();
}
}
//The flushing, aggregating, etc, are all configurations that would be specified at the facet attribute,
//so it can be created and injected before the grain is created. That way we would achieve granularity
//of the configuration per metric, per grain type.
//The Telemetry consumers would be almost the same as they are today.
//Depending on the metric type, if they are shared, they could leverage some system target that would be
//counting that particular metric and behaving as a backend for it, also making use of the turn-based concurrency model (single-thread guarantee per grain),
//so no metric writes would be lost.

Not all the aspects of this implementation would fit a general-purpose metric abstraction, but they give a sense of the freedom we'd like in terms of public developer APIs. For example, the Facets system used in Orleans could easily be replaced by a … On the other hand, what we call a Telemetry Consumer is nothing more than another abstraction where logging/metric framework developers plug in their backend implementations (i.e. AI, NewRelic, App Metrics, etc.). That way we decouple metric publishing from consuming in the framework. The … |
The aggregation doesn't happen on the publisher side; it is just a configuration that tells the … Looking at the Orleans actor implementation (aka Grain) you have the following:

public class MyGrainWithCounter : Grain, IMyGrainWithCounter
{
private ICounterMetric<int> _myIntCounter;
private ITimeSpanMetric _myTSMetric;
public MyGrainWithCounter(
[CounterMetricFacet(Name = "MyIntCounter", IsShared = true, Aggregate = true, BatchCount = 100)] myIntCounter,
[TimeSpanMetricFacet(Name = "MyTSMetric", AutoFlush = true, Window = TimeSpan.FromMinutes(1))] myTSMetric)
{
_myIntCounter = myIntCounter;
_myTSMetric = myTSMetric;
}
public async Task DoSomethingLong()
{
await _myTSMetric.BeginRecording();
await Task.Delay(100000); //Do something really useful and that takes time...
await _myTSMetric.EndRecording();
}
public async Task AddSomethingForMe(int amount)
{
//Do some business logic for the grain method and, after we are done, increment the counter with the value passed in
await _myIntCounter.Increment(amount);
// We need the APM tool to receive the metric data right away, so flush it!
await _myIntCounter.Flush();
}
}

In this particular case, since … That flexibility allows multiple kinds of metrics to be collected regardless of whether they are shared or not, and regardless of whether they are batched or not. So, to answer that first question, the aggregation logic can happen in two places: at the object level and at the consumer level. At the object level (non-shared metrics) you can aggregate data from a particular instance of the object. At the consumer level, there can be many different implementations of logic to aggregate and batch it.
Yeah, that was my point. Whether those key-value pairs are going to be used or not is up to the consumer implementation.
People tend to think that metrics can be achieved with just numbers. I would say that if we compare with logs, those are just strings. Today there is a trend to mix logs with contextual data, and you have Structured Logs. As much as I don't like to rely on those structured string parsers, I think having the flexibility to declare whatever data structure you want to represent your metrics can have its uses. Telemetry (both logs and metrics) can have structured data on it. If you want to create your own consumer that handles a strongly-typed data structure holding all the payload content you want for your telemetry, why should we block it? I know that there is a key-value on each pair, but sometimes it would be better if we had strong typing to describe the events. If you look at the AI SDK you have … IMHO, I think people don't give Telemetry much attention and reduce it to just "PerformanceCounters" and "Plain Logs". |
Do you have an example of a string that is a metric and wouldn't otherwise be represented as an Event? All the possible consumers I've explored (Influx, Statsd, App Insights) only support numeric types for metrics. Metrics are values which are viewed almost entirely as aggregates (yes, as @Drawaes points out you may want to view the current value of the metric, but in general you aggregate). We want to support offloading that aggregation to later processing and thus want to avoid aggregating in-process, but it's still a value that is designed to be aggregated.
I'm pretty sure @benaadams will hunt me down if I make him allocate objects to write a metric ;). Metrics are often written in the hot paths of an app. Being able to emit a single … |
Yeah I have only seen numbers. Anything else is a log or it's a number against a new key. IMHO ;) |
Then we should kill all services like NewRelic, Application Insights, DataDog HQ, AppDynamics, and all other pretty established APM tools/services, as they all have object allocation for complex (not just a number) metrics. :)
Agreed. |
You don't wanna open that can of worms.. cause yeah I wouldn't have most of those in my hotpath ... ;) |
And they become your bottleneck if you try to log metrics at very high speed; the documentation for Application Insights even says:
|
Again, I think the issue here is conflation... I am thinking ... I want to measure stock ticks per second, latency through a system, or say bullets fired per second, and map it to CPU use, or memory, or GC collections. I want to know when a system's latency is climbing and map it in realtime (or near realtime with small amounts of aggregation) to stuff going on. If I want higher level stats, I can use methods discussed previously, say structured logging or app insights or something. |
FWIW, the last fairly large system I implemented metrics in with InfluxDB/Grafana used a combination of those approaches - some important or high-cardinality metrics went over as raw points, and high-frequency counters were mostly pre-aggregated (point volumes were an ongoing problem for quite a while). |
Hi everybody, I am excited to see such a lively thread on this. I will read it in detail and respond more thoroughly in a couple of days when I return from an ongoing industry conference. But to inform the discussion, let me say that we are currently working on a Metrics Library that we hope to release as a public Beta within the next few months. You can see the work in progress in our Applications Insights SDK Labs repo in the "metrics/master" branch under the "Metrics" directory: https://github.com/Microsoft/ApplicationInsights-SDK-Labs/tree/metrics/master/Metrics While many details are already set for the initial version, I am looking forward to a discussion to help us iterate and improve. |
Guys, again, I'm not saying you should allocate those objects for all metrics. Also, I'm not saying you should collect metrics right to AI (or whatever APM tool) from your hot path. You can collect simple metrics at the publisher side, which should be fast, but at the consumer side (which is where you create those objects) you can afford to be slow and capture more context, make aggregations, batching, etc. For example, in an Online Travel Agency/Airline system, speed of the return of a trip query is all that matters. However, you still need to know how much time was spent in a query on each source (Amadeus, Sabre, LiveTrip, etc.), how many of those queries were converted to a booking, and what the average ticket is. All those are business metrics. If at a given point in time the booking rate drops and the time spent in queries rises, you will need to correlate them, and having just a number giving the count of purchases is not rich enough to give me that information (even worse if you rely on simple log strings!), so I'll only know that "we sell fewer flights" but will never know "why?". My point is, APM is way more than simple counters. If it has a performance cost to use it, then it is up to the person designing the system to take into consideration whether or not this information is relevant to them. IMHO, it is a tradeoff, like anything in CS. |
BTW, my feeling here is that we are conceptually changing subjects. I guess people here want a new Windows Performance Counters v2. If that is the case, I'm out of the discussion as I'm talking about metrics in the context of APM/Telemetry. |
The primary requirement is to provide a metrics abstraction that the ASP.NET framework can consume and report to.
This sentence is so sad - it makes me want to cry.
I don't understand why the ASP.NET team thinks it lives in a bubble and needs to reinvent the wheel for everything. The end result of this will be the same as with EventSource+ILogger, we as application developers will have to pay the price of understanding and combining both to get a good overall output.
Here it will be EventCounter (or PerfCounter or whatever corefx will end up with) vs this API.
Now you might say just stick to one thing, but that's not even possible for server-side applications as the Azure team will probably not use this (they are not using Microsoft.Extensions).
Please move all metrics-related efforts to corefx and make sure the whole stack can and will use it! Everything else will be bad for the whole .NET ecosystem.
… On 21 Sep 2017 at 00:21, Andrew Stanton-Nurse ***@***.***> wrote:
The feedback on this thread has been really great so far. I want to echo what @Eilon said above though. This is like a first prototype concept. We have a PR with some code, but we all know that code changes and code gets deleted :). Things aren't set in stone and we want to iterate on this design a lot (and sometimes iterating means deleting and starting again ;)).
The primary requirement is to provide a metrics abstraction that the ASP.NET framework can consume and report to. We also want to provide a metrics system that co-operates with the DI system in order to serve the app hosting models used by the Microsoft.Extensions.* systems. EventCounters being tied to EventSources, which are a process-global concept, don't fit well into that as a framework-level (specifically ASP.NET framework) abstraction. So @vancem is right here in that our requirements don't match exactly with the core EventCounter pattern. Having said all that, changing to a design that doesn't mandate a connection with ILogger is certainly valid feedback and I think we'll iterate a bit on that.
That is when telemetry points are hit, counts are updated, and then from time to time (sampling), you send out the aggregated number. - @vancem
This is unfortunately not always the case and the source of a mismatch between EventCounters and the APM tools we're seeing in the wild. Systems like statsd and InfluxDB tend to encourage transmitting unaggregated metric data and allow the consumer to perform the aggregations later, filtering and grouping on specific properties as necessary (which gets to @Drawaes' point on metrics with metadata, which is something we're going to look at integrating into the design). Other systems do expect in-process aggregation, but since it's not consistent how users do this, we need an abstraction that allows for both to happen so that frameworks and libraries can use that abstraction and the decision about where to aggregate can remain with the application developer. This is even more important with the idea that different metrics require different aggregations (various percentiles are important, etc. it's not just a constant set of count, mean, max, min, etc.). We want to build a metrics abstraction that does not impose our way of monitoring, but rather enables others.
In particular the ILogger pipe is designed specifically for use in ASP.NET's dependency injection system. - @vancem
This is precisely the goal of this abstraction. As a producer abstraction, the goal here is to build something that enables ASP.NET (and other Microsoft.Extensions-based) frameworks and applications to emit metrics in a way that matches with the eventing system they already use. The use of ILogger reflects this. I've definitely heard the feedback that it feels weird to use ILogger as the place to hang metrics production, and we'll do a little more thinking about this and keep that discussion open. The benefit of having it there is that most components that would want to emit metrics already have an ILogger. And in fact, most metrics are tracking things that are already written as log events. Having said that, I see the argument that while it's useful, it could also be restrictive to tie this to Loggers. Perhaps a system where ILogger can be used to emit metrics, but other abstractions exist as well would be a better design, we'll iterate a bit on that.
This means any metric system that is built on top of ILogger will not have instrumentation points deep in the framework. - @vancem
As we've learned with other diagnostics integrations, this is really not a goal we can reasonably achieve. The Microsoft.Extensions frameworks are very anti-globals as an explicit design goal. The underlying .NET runtimes/framework stuff can't rely on things like DI and thus has more global abstractions (and rightly so). This is why our goal is not to produce an abstraction that the underlying .NET runtimes/framework can use. EventCounter already exists for that. However, EventCounter is not something that fits into our layering because of how global it is, so we need something that abstracts things into a model that does fit.
My question is why leak the "create it and cache it" abstraction - @Drawaes
This is a performance-guided pattern and you may be correct that it's a bit overkill. The idea here is that you want to create and allocate the necessary buffers and structures for the metric in advance so that recording values (which will likely occur in hot paths) can be done very cheaply. Perhaps this is leaking too much abstraction, we can look at iterating a bit on that. I'm a little nervous about requiring metrics providers to have a dictionary lookup on every call to RecordValue though. Perhaps requiring providers that aren't just fire-and-forget (like statsd) to shuffle that work off to background threads is sufficient though.
As to double-calling Define, @markrendle has got it right, we make sure two calls to Define with the same name get the same object. So it would be technically safe and correct to always use .DefineMetric("foo").RecordValue(42) (and you could make an extension method to do that for you).
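A sketch of what such a convenience extension might look like; the helper name is hypothetical, and it relies on DefineMetric returning the same cached IMetric for a given name.

using Microsoft.Extensions.Logging;

public static class MetricShortcutExtensions
{
    // Hypothetical one-shot helper: fine for occasional use, but hot paths should
    // still cache the IMetric returned by DefineMetric instead of looking it up each time.
    public static void RecordMetricValue(this ILogger logger, string name, double value)
        => logger.DefineMetric(name).RecordValue(value);
}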
This is what I plan to do for Orleans telemetry APIs v2 in case we don't find the proper abstractions here - @galvesribeiro
It looks like, in your suggestion, aggregation and consumption parameters are specified at the place where you write the metrics. From what I've seen, there's enough variation in how users want to aggregate that it feels cleaner to push the aggregation logic out to the consumption end. Is that not something you've seen in your scenarios? That would be valuable feedback for us :)
Property bags is definitively something that you will want to have support on all methods tracking metrics. - @galvesribeiro
Feedback received ;). I had noticed the same in my investigations into various TSDBs and metrics forwarders. Some systems don't support recording those arbitrary parameters (EventCounters, for example, and App Insights discourages it by expecting pre-aggregated data), but we also realized that a pattern where the producer can emit key-value pairs and consumers decide whether they support them would work.
Why don't make IMetric where T can be int, float, decimal, long? I think using double only is way restrictive. - @galvesribeiro
I'd hesitate to support IMetric<T> as I think many providers would have a lot of trouble with aggregating say a string ;). Providing the ability to use any numeric type and propagate that type down to the provider (so the provider can apply any necessary optimizations) does seem reasonable though. Just have to think of the way to do it (since you can't do where T: int | double | decimal | ... ... yet?)
We also need to dive a bit in to the idea of Metric Types. A number of existing systems provide them, mostly those descended from the Java Metrics library. Things like "Gauges", "Counters", "Meters", etc. What I'm less clear on is the value of specifying these types up-front at the point of production vs. having the consumer control the aggregations they want to perform. For example, with a metric like "Requests per Second", I may want to be able to view both the latest value and the aggregations, as well as histograms. Requiring the producer to define the type up-front seems to limit that ability.
|
@aliostad Do you know the new Activity type in System.Diagnostics.DiagnosticSource? It is based on OpenTracing's Span construct (it lacks most of the other constructs though). In terms of requirements I would also like to know what the actual usages will be in the asp.net core framework. |
No I have not seen, @cwe1ss thanks for the ping. It is great to see that .NET has adopted OpenTracing. |
@aliostad <https://github.com/aliostad> Do you know the new Activity type
in System.Diagnostics.DiagnosticSource? It is based on OpenTracing's Span
construct (it lacks most of the other constructs though)
For those watching from home, here's the issue that brought in the Activity
api (which indeed was cited as inspired by OT)
https://github.com/dotnet/corefx/issues/15023
|
This proposal does not, no.
It only does so partially. It e.g. doesn't have support for inject/extract, and it doesn't have a 'tracer' component (the closest is its integration with DiagnosticListener). |
@adriancole yes, that is great to see we finally started to have good things. |
Agree with conflicting points here, and it's been an awesome insight reading through this. For a top-notch, world class performing web framework, agree on the points raised around no allocations and not forcing aggregations in-process. I would like to mention that for some applications a simple metrics reporting approach is ideal, and having aggregated metrics available in-process opens up some interesting use cases. Things like statsd and telegraf are great, however if the cost of reporting and aggregating in-process is pretty much negligible for particular applications I don't see why one would bother with maintaining and configuring an aggregator, especially when the cost of this is much less than typical things an application has to do to be useful, HTTP/DB calls for example. Similarly re logging for example, some prefer to batch and push directly to elasticsearch, others prefer to use logstash or beats.
That ^ makes a lot of sense IMHO. I don't however see why I would need an … |
@alhardy I think both your scenarios are doable with the proposal. In fact I think basically everyone's are. If there is a simple lightweight metrics collection, then it's up to the "collector" how the metric behaves. If you need "low touch" you can have a collector that just forwards to statsd/telegraf etc. If you don't want to go to those lengths there is scope for the community to make a solid inproc collector that does simple and fancy aggregation. Finally, if the metrics are available, then in the richer tracing/telemetry scenario you can use existing methods with say the Activity type, and just use the metrics to enrich those Activities with extra information. I don't think the addition of a common way (if designed correctly and available for low down framework metrics, or at least those adaptable to it) precludes anyone's wishes so far stated. As for how the "internal framework" things can be reported easily: maybe a list of all the instrumentation that is currently available could be made with community/cross MS team help. Then both internally and from the community we could attack making "adapters" for these events that you can wire up to the metrics concept. I could imagine something like …
Which could attach a low alloc EventSource => IMetric adapter |
This has been a really active thread and we've gotten a lot of feedback. I think the main thing we've discovered is that we just aren't really ready to dive deep into a rich abstraction for Metrics in 2.1. We still want to provide metrics within the ASP.NET Platform though (replacing the old ASP.NET 4.x Performance Counters) so we're going to be closing this item for now and work on providing our own platform metrics via the EventCounter type in .NET Standard. Metrics emitted by that type are available in-process via EventListener and we can build components off that to forward metrics to data stores. When it comes time to revisit a broader metrics abstraction, we'll revisit all this feedback. |
Awesome! Thank you for listening to our feedback!! |
Hey @anurse, What's the current state of getting metrics about .NET Core internals with Event Counters? Thanks! |
There are no event counters in .NET Core as yet. This is something we plan to tackle more broadly in 3.0.
We're doing work in 2.2 (dotnet/coreclr#18649) to expose the runtime events via an EventListener so that you can write code in the application to forward to something like Prometheus or influxdb (or anything else).
Today there are a couple of things you might be able to use but the fidelity of information may not be the greatest ATM. @brianrob can give more details here. |
With dotnet/coreclr#18649, we now have some very basic support for consuming runtime events via EventListener. You can find an example of how to use it here: https://github.com/dotnet/coreclr/blob/master/tests/src/tracing/runtimeeventsource/RuntimeEventSourceTest.cs. Note that at this time, the fidelity is definitely not perfect and there is still more work to be done before I would consider the feature ready for use. That said, feel free to give it a try and provide feedback. It should work on all supported platforms. |
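For reference, a minimal sketch of that in-process consumption; the event source name and keyword value are assumptions about the CoreCLR runtime provider, while the EventListener API itself is standard System.Diagnostics.Tracing.

using System;
using System.Diagnostics.Tracing;

// Enables a subset of runtime events in-process and shows the hook point where a
// forwarder to Prometheus/InfluxDB/etc. would translate and ship them.
internal sealed class RuntimeEventForwarder : EventListener
{
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Name == "Microsoft-Windows-DotNETRuntime")
        {
            // 0x1 is assumed to be the GC keyword; enable whichever keywords you care about.
            EnableEvents(eventSource, EventLevel.Informational, (EventKeywords)0x1);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        // Replace with real forwarding logic; this just proves events arrive in-process.
        Console.WriteLine($"{eventData.EventSource.Name}/{eventData.EventName}");
    }
}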
Hey @davidfowl @brianrob, |
Overview
We want to enable applications to emit arbitrary named metrics and have them captured by telemetry/logging systems (App Insights, Elasticsearch/ELK, etc.) as well as other metrics systems (Statsd, InfluxDB, etc.). An abstraction is desired in order to allow applications to emit metrics without being coupled to the destination of the metrics, particularly with features like App Insights auto-lightup where the app doesn't directly depend upon App Insights.
Proposal
We will add support for metrics reporting to ILogger. It will be done via a new companion interface, IMetricsLogger, which loggers may optionally implement. Applications can emit metrics through this interface (see API below), and logger providers which support metrics can receive and process those metrics. Metrics are (name, value) tuples, where value is always a floating-point number. Metrics are expected to be viewed in aggregate, hence it doesn't make sense to use a generic data type like object for the metric, so metric values are always doubles. Supporting arbitrary types of metric values is a non-goal. Reporting metrics should be as efficient as possible, especially in the case where no logger provider is registered to listen for them, or they are filtered out.
Why add to ILogger?
Initially, I considered developing a new abstraction for Metrics. However, I encountered a few concerns, chiefly that a separate abstraction would need its own version of infrastructure the Logging pipeline already provides: the provider and filtering model, and the DI integration and generic helper types (ILogger<T>, IMetricReporter<T>, etc.). The fact that these already exist in the Logging pipeline, and many metric sinks are also logging sinks, led to the idea of integrating metrics into Logging rather than creating a completely new abstraction.
The main reason why we would not want to add this to ILogger would be a risk …
The API
IMetricLogger
Optional companion interface for ILogger implementations to indicate their support for metrics. The ILogger instances returned by ILoggerProvider are expected to implement this interface if and only if they wish to receive metrics.
IMetric
Interface that allows metric data to be reported. Metric values …
LoggerMetricsExtensions
Provides an extension method on ILogger to enable consumers to emit metrics, even if the underlying logger provider doesn't support it (of course, the metrics will be ignored in that case).
MetricValueExtensions
Extension methods for IMetric to support other common metric value types (as mentioned above, the underlying type is still double, so the value must be representable as a double).
Suggestion: We could have a .Time() API that returns an IDisposable which uses a Stopwatch to do the timing for you as well. That would be easy to add later though.
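To make the shape above concrete, here is a minimal sketch of what these pieces might look like, inferred only from the descriptions in this proposal; the exact signatures are assumptions, not the final API.

using System;
using Microsoft.Extensions.Logging;

// Sketch only: shapes inferred from the proposal text above.
public interface IMetric
{
    void RecordValue(double value);
}

public interface IMetricLogger
{
    // DefineMetric does the bulk of the setup work so that RecordValue stays cheap.
    IMetric DefineMetric(string name);
}

public static class LoggerMetricsExtensions
{
    private sealed class NullMetric : IMetric
    {
        public static readonly NullMetric Instance = new NullMetric();
        public void RecordValue(double value) { /* no metrics-capable provider; drop */ }
    }

    // Usable on any ILogger; metrics are silently ignored when the underlying
    // provider doesn't implement IMetricLogger.
    public static IMetric DefineMetric(this ILogger logger, string name)
        => logger is IMetricLogger metricLogger ? metricLogger.DefineMetric(name) : NullMetric.Instance;
}

public static class MetricValueExtensions
{
    // Convenience overloads; the underlying value type is still double.
    public static void RecordValue(this IMetric metric, int value) => metric.RecordValue(value);
    public static void RecordValue(this IMetric metric, TimeSpan value) => metric.RecordValue(value.TotalMilliseconds);
}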
Logger updates
The aggregate Logger class we return from LoggerFactory.CreateLogger would be updated to support IMetricsLogger, even if none of the ILoggers it aggregates support it. It would only send metrics to the loggers that are known to support the interface.
Define/Record pattern
In order to keep the performance cost as low as possible for recording the actual metric value, the API uses a Define/Record pattern. Consumers should expect to call DefineMetric("foo") once for a particular value of "foo" and store the result as globally as possible (in singleton types, etc.). It is DefineMetric that does the bulk of the work to set up the metric recording. After calling DefineMetric, the RecordValue call on the IMetric interface can choose how to store the value based on the provider's needs. For example, a provider that plans to send pre-aggregated metrics could simply update rolling aggregate values in RecordValue and then drop the actual recording (thus having constant storage requirements for each metric). If a provider needs to report individual metrics, it can use dynamically-allocated buffers or ring buffers, depending upon the configuration and the needs of the provider. As an example of an IMetric implementation based on pre-aggregating data, see https://gist.github.com/anurse/03c6746204317bd3616c372b4df9dbba
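For readers who don't follow the gist link, a very rough sketch of what such a pre-aggregating IMetric could look like; this is an illustration in the spirit of the proposal, not the actual implementation.

// Keeps constant storage per metric: only running aggregates are stored,
// and individual recordings are dropped after being folded in.
internal sealed class AggregatingMetric : IMetric
{
    private readonly object _lock = new object();
    private long _count;
    private double _sum;
    private double _min = double.MaxValue;
    private double _max = double.MinValue;

    public void RecordValue(double value)
    {
        lock (_lock)
        {
            _count++;
            _sum += value;
            if (value < _min) _min = value;
            if (value > _max) _max = value;
        }
    }

    // A provider would periodically read (and reset) these aggregates when flushing
    // to its backend.
    public (long Count, double Sum, double Min, double Max) Snapshot()
    {
        lock (_lock)
        {
            return (_count, _sum, _min, _max);
        }
    }
}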
Filtering Metrics
Metrics should be filterable, and they should participate in the existing Logger filtering pipeline. My current proposal is that Metrics are treated like LogLevel.Critical messages, in that if the category is enabled at all, the metrics are written. We could consider capturing a LogLevel in .DefineMetric and using that to provide different "levels" of metrics. Since the existing filters only let you filter by Provider, Category, and Level, it is very difficult to filter out specific metrics by name (as that would require a new Filter structure). This may make LogLevel support for metrics more important.
@pakrym and I talked and came up with a specific suggestion: Metrics can be filtered at the provider and category name, but there is no concept of a LogLevel for metrics. However, filters can disable all metrics for a particular provider/category combination. To do this, we'll add an AllowMetrics boolean to LoggerFilterRule which defaults to true. When writing a metric, we'll check the provider and category name against the filter options as well as checking this boolean.
Scopes and Property Bags
FEEDBACK REQUESTED
My initial thought is that Metrics exist outside of scopes, as they are generally aggregated and Scopes imply per-event data. Providers could, in theory, give their IMetric instances access to read active scope data, and attach them to metrics if they choose (for example, systems that report unaggregated metrics may want to). For similar reasons, property bags (i.e. Log Values/Structured Logging, etc.) don't really make sense for metrics in general as they are usually aggregated. Having said that, many providers do support arbitrary properties (InfluxDB, App Insights), so some level of support may be useful. We could easily add an optional parameter to .RecordValue to allow adding arbitrary properties to each metric, and providers could choose to disregard it if it doesn't make sense for that provider.
Impact on existing Providers
There is minimal impact on existing Providers. Because the interface provides an opt-in model for a provider to receive metrics, existing providers are unaffected. I'd imagine providers like Serilog would not participate in this system as they don't have an infrastructure for metrics (/cc @nblumhardt , I may be wrong!). Providers like App Insights may choose to either add metrics support to their existing providers or add new providers that only handle metrics (and ignore log messages) that are co-registered.