Emit OpenTelemetry metrics #2211
Comments
Forget my comment. I just noticed the difference. |
Thanks @gfoidl, corrected the link. |
SqlClient already emits some metrics, but their reporting is a little spotty. On .NET Framework on Windows, every process which uses the library has a set of performance counters (listed here). On .NET Core and .NET Standard 2.1 and above, Microsoft.Data.SqlClient.EventSource has a number of event counters (listed here). Although the counter names are slightly different, their meanings are the same:
The SoftConnectsPerSecond, SoftDisconnectsPerSecond, NumberOfActiveConnections and NumberOfFreeConnections performance counters are only created if the ConnectionPoolPerformanceCounterDetail trace switch is set to Verbose. Having a consistent interface to increment/decrement the metrics would be helpful for code merging, but the real value would probably come from making System.Diagnostics.Metrics the primary metrics source. I think the right thing to do would be to also add an AppContext switch to control the legacy metrics sources, but I'm not sure which schedule would be better: have 6.0 disable the legacy metrics sources by default and drop them in 7.0, or have 6.0 implement the new metrics source, 7.0 disable the legacy ones and 8.0 drop them. For the current metrics collection, perhaps OpenTelemetry could listen to these event counters or performance counters? |
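As a rough illustration of the "listen to these event counters" idea, here is a minimal in-process sketch using `EventListener`. The class name `SqlClientCounterListener` and the one-second interval are assumptions made for this example; only the event source name comes from the comment above. Treat it as a sketch of one possible bridge, not as how OpenTelemetry or SqlClient actually does it.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;

// Hypothetical listener class, named for this sketch only.
sealed class SqlClientCounterListener : EventListener
{
    // Event source name as quoted in the comment above.
    private const string SourceName = "Microsoft.Data.SqlClient.EventSource";

    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == SourceName)
        {
            // Request counter updates once per second (interval is an assumption).
            EnableEvents(source, EventLevel.LogAlways, EventKeywords.All,
                new Dictionary<string, string?> { ["EventCounterIntervalSec"] = "1" });
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        // Counter updates arrive as "EventCounters" events with dictionary payloads.
        if (e.EventName != "EventCounters" || e.Payload is null) return;

        foreach (object? item in e.Payload)
        {
            if (item is IDictionary<string, object> counter &&
                counter.TryGetValue("Name", out object? name))
            {
                // Plain counters report "Mean"; rate counters report "Increment".
                object? value =
                    counter.TryGetValue("Mean", out object? mean) ? mean :
                    counter.TryGetValue("Increment", out object? inc) ? inc : null;
                Console.WriteLine($"{name} = {value}");
            }
        }
    }
}
```

Creating one instance at startup (`using var listener = new SqlClientCounterListener();`) and keeping it alive is enough to start receiving updates; a real bridge would forward the values to a metrics pipeline rather than writing to the console.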
@edwardneal there's generally no specific need to drop the old metrics or introduce an app context switch for them; as long as they're not listened to, their perf overhead is negligible. So this really is about emitting OpenTelemetry metrics via System.Diagnostics.Metrics. |
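For readers unfamiliar with System.Diagnostics.Metrics, a minimal sketch of what emitting a counter through it looks like follows. The meter name, instrument name, tag key and the `OnSoftConnect` hook are all hypothetical, chosen for illustration; they are not SqlClient's actual names or the names in the OTel specification.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

static class SqlClientMetricsSketch
{
    // Hypothetical meter name and version, for illustration only.
    private static readonly Meter s_meter = new("Example.SqlClient.Sketch", "0.1.0");

    // Hypothetical instrument loosely mirroring the SoftConnectsPerSecond counter.
    private static readonly Counter<long> s_softConnects =
        s_meter.CreateCounter<long>(
            "example.sqlclient.soft_connects",
            unit: "{connection}",
            description: "Connections handed out from the pool.");

    // Hypothetical hook the pool would call whenever it hands out a connection.
    public static void OnSoftConnect(string poolName) =>
        s_softConnects.Add(1, new KeyValuePair<string, object?>("pool.name", poolName));
}
```

The same "negligible when nobody listens" property applies here: when no `MeterListener` (or OpenTelemetry SDK) has enabled the instrument, `Add` does very little work, and `s_softConnects.Enabled` can be checked before computing expensive tag values.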
That makes sense, thanks. There's only one mandatory counter ( Besides the mandatory, other standard metrics are:
Outside the standard metrics, the rest of the existing counters should also be mapped. I'd add these as:
Finally, there are some interesting fields in SqlConnection.RetrieveStatistics():
I started looking to see what other metrics might be available or tracked within the library for exposure, and this response ballooned. If the metrics in the OTel specification can be checked for correctness, there's at least a place to start. |
The definition in the specs seems pretty clear and unambiguous to me: "The maximum number of open connections allowed". Max/min connections is a really standard thing across DB connection pools; I certainly implemented it as MaxPoolSize in the Npgsql driver. In general, metrics on commands are currently missing from the specs. I went ahead and implemented some in Npgsql (code). If implementing something similar, I'd call these out as experimental and subject to change, since I'm assuming some command-related metrics will make it into the final specs. |
I was thinking of a situation where a connection could have been allocated by the client and added to the connection pool (to meet MinPoolSize's constraint) but not yet opened, because it's either not yet been used or it's still in the process of opening. In that situation, it would be neither idle nor used, but its existence in the pool would mean that MaxPoolSize wasn't really the maximum number of open connections, because this closed (or still-opening) connection would reduce the number of slots available for completely opened connections. From re-checking the logic which creates a connection, it looks like a background thread synchronously opens the connection and adds it to the connection pool pre-opened (rather than being created closed and opened dynamically or asynchronously), so my confusion's cleared now: MaxPoolSize is the correct approach. If that creation methodology changes in the future, I think the ambiguity would return though. I agree on the point around command-related metrics. It'd be useful to be able to assign some kind of tag (or dictionary of tags) to a SqlCommand and aggregate at that tag level, but that's probably a better fit for DbCommand. I've prefixed the existing metrics with dotnet because it seems to fit the pattern here, but this'll no doubt need to change when the specs are finalised. |
I'm not aware of such a state, and I'm also not sure what it would mean for a connection to be "added but not idle". If a physical connection has been opened (which it must have been) and is in the pool in order to meet MinPoolSize, then it seems to me that it's just idle. Also, if there were such a special opened-for-MinPoolSize state, and the connection is then rented out by the application and returned, does it become just regular-idle? That would seem very weird, since the pool is in exactly the same state as it originally was. In any case, I don't think any legitimate usage scenario out there would care about (or want to deal with the added complexity of) an additional third state. The important things to track with metrics are how many total physical connections the application is using, and a breakdown of how many of those are actually in use vs. not in use (to track application efficiency with regards to connection management etc.). |
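The two-state model described above (every open physical connection is either idle or in use) can be sketched with a single up/down counter tagged by state. All names here, including the `state` tag key and the four hook methods, are assumptions for illustration and not the pool's real API; `UpDownCounter<T>` requires .NET 7 or the System.Diagnostics.DiagnosticSource 7.0+ package.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

static class PoolUsageSketch
{
    // Hypothetical meter and instrument names, for illustration only.
    private static readonly Meter s_meter = new("Example.Pool.Sketch");
    private static readonly UpDownCounter<long> s_usage =
        s_meter.CreateUpDownCounter<long>("example.pool.connections.usage", unit: "{connection}");

    // Only two states exist: a pooled connection is either idle or used.
    private static readonly KeyValuePair<string, object?> Idle = new("state", "idle");
    private static readonly KeyValuePair<string, object?> Used = new("state", "used");

    // A physical connection was opened and pushed onto the pool.
    public static void OnOpened() => s_usage.Add(1, Idle);

    // The application rented a connection: it moves from idle to used.
    public static void OnRented() { s_usage.Add(-1, Idle); s_usage.Add(1, Used); }

    // The application returned it: it moves back from used to idle.
    public static void OnReturned() { s_usage.Add(-1, Used); s_usage.Add(1, Idle); }

    // An idle physical connection was closed and removed from the pool.
    public static void OnClosed() => s_usage.Add(-1, Idle);
}
```

With this shape, the total number of physical connections is the sum over both `state` values, and a connection opened to satisfy MinPoolSize is indistinguishable from one that was rented and returned, which matches the argument above.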
No, I fully agree - a background thread synchronously opens the connections, pushing them onto the pool. By the time the pool knows about them, the connections are in exactly the state you're describing. For what it's worth though, I was thinking about two scenarios:
Neither of these scenarios is supported by the pool at the moment, so the point's academic. I can imagine situations where someone might want them though, so it's worth noting that the metrics might need some adjustment if that's implemented in the future. I think we could match most of the command metrics in Npgsql in advance of the specs. By way of comparison, I think the full list of metrics should be:
Most of these are pretty generic across different implementations of DbCommand, so might be a useful model for other providers to follow. There's nothing stopping them from being implemented in SqlCommand and being moved into core later. They might also be informative to the rest of the specification's development. A few points remain:
If you and someone from the SqlClient team are happy with the approach, I can start work on the basic infrastructure, implement the first mandatory metric and then start trickling the rest of the metrics in piecemeal. |
FWIW I'd prefix the non-standard metrics with Other than that, I'd recommend concentrating on the basic support as a first step and iterating from there, rather than thinking too much up-front about e.g. a new PoolName in the connection string or command groupings. There's definitely a lot that's possible with OpenTelemetry instrumentation, but it's probably a good idea to design the advanced stuff separately, as it becomes relevant. |
OpenTelemetry has become the de-facto standard for distributed, cross-platform tracing; it is being rapidly adopted both inside the .NET ecosystem and outside. OTel defines semantic conventions for how a database client emits metrics, which are aggregated numeric counters about the state and performance of the driver (e.g. number of commands executed, idle/busy connections in the pool, etc.). These specifications are currently in an experimental state, but stabilization is likely to start soon.
Note that unlike with tracing, AFAIK there's no way to get data like this out of SqlClient at the moment. This sort of data can be invaluable in diagnosing performance problems (e.g. by inspecting real-time connection pool state) and tracking what's going on.