Add new (hopefully easier) telemetry mechanism #68261
Conversation
@@ -122,6 +123,7 @@ static ProjectCodeRefactoringProvider.ExtensionInfo GetExtensionInfo(ExportCodeR
    Func<string, IDisposable?> addOperationScope,
    CancellationToken cancellationToken)
{
    using (TelemetryLogging.LogBlockTimeAggregated(FunctionId.CodeRefactoring_Summary, "Pri" + (int)priority))
So, I like the fact that you have added a lot of different fine-grained aggregated log blocks for different parameter combinations (such as specific priority, fix vs. refactoring, specific provider, etc.). The tricky part for someone new looking at this telemetry might be figuring out which ones to dig into when investigating their specific scenario. I think eventually adding a document, or even a brief summary somewhere, about the different types of events logged in each space (diagnostics, quick actions, etc.) with a small description will be helpful. Basically, documentation from an investigation perspective, not the actual implementation perspective. #Resolved
I don't disagree, but it seems to me that such documentation would be very easy to get out of date. I think the dashboards are a nice piece of "documentation" from this perspective as they give the actual rundowns of the telemetry events that exist and are being fired.
var (allDiagnostics, upToDate) = await _diagnosticService.TryGetDiagnosticsForSpanAsync(
    document, range, GetShouldIncludeDiagnosticPredicate(document, priorityProvider),
    includeSuppressedDiagnostics: false, priorityProvider, DiagnosticKind.All, isExplicit: false, cancellationToken).ConfigureAwait(false);
var telemetryContext = "Pri" + (int)priorityProvider.Priority + "." + nameof(GetMostSevereFixAsync);
I see that you added name of the method here in the telemetry context, but not at other places. I am fine with either approach of appending the containing method name or not, but we probably want to be consistent? #Resolved
Definitely agree that consistency is good, but from my viewpoint the most important thing is that the telemetry data, when viewed, is informative and can be reasoned about. I tried to name the events accordingly in this PR, but I don't have the context you have in this area, so I'm definitely willing to change any of the event names if you think different names would be more informative. Not sure if you have looked at the dashboard with this information, as that might give a more holistic view of how the events relate.
same comment about interpolated strings and interpolated string handlers.
@@ -216,6 +217,9 @@ public async Task<bool> TryGetAsync(ArrayBuilder<DiagnosticData> list, Cancellat
using var _2 = ArrayBuilder<AnalyzerWithState>.GetInstance(out var semanticSpanBasedAnalyzers);
using var _3 = ArrayBuilder<AnalyzerWithState>.GetInstance(out var semanticDocumentBasedAnalyzers);

var telemetryContext = "Pri" + (int)_priorityProvider.Priority;
using var _4 = TelemetryLogging.LogBlockTimeAggregated(FunctionId.RequestDiagnostics_Summary, telemetryContext);
I am wondering if this will lead to duplication with the telemetry logged in code fix service around the single call to diagnosticService.GetDiagnosticsForSpanAsync. The log block there is dedicated to the lightbulb Ctrl + Dot path, while the block here is for all of the following paths: lightbulb Ctrl + Dot, margin lightbulb background analysis, tagger, and all LSP pull diagnostics calls. If we want to log the telemetry here, you definitely need to add more parameterization to the context so we have different blocks for these code paths. The parameters to include would be _range, _incrementalAnalysis, and _diagnosticKind (maybe we want to log separate aggregated blocks per-diagnostic kind and also one accumulated block which is agnostic of the diagnostic kind). #Resolved
Oh, I am not sure of the performance penalty we pay by logging multiple aggregated blocks here. If it is expensive, then a single summary block per each code path would be fine, but we definitely need more context than just the priority here.
I think I'd like to leave this to a follow-up PR where we can discuss what you would like to see here. Note that every piece of data that we use to build up the event name multiplies the potential number of event names, so including something like range in the name isn't desirable.
@@ -61,6 +62,37 @@ private PerformanceQueue GetQueue(bool forSpanAnalysis)

public void AddSnapshot(IEnumerable<AnalyzerPerformanceInfo> snapshot, int unitCount, bool forSpanAnalysis)
{
    foreach (var perfInfo in snapshot)
I presume the intent for this PR is to add aggregate logging for these snapshots, and a later PR will then delete our existing code that does custom aggregation of these snapshots and logs them in the background? #Resolved
Our existing logging support in this area was a bit too complicated for me to tackle to start with. I feel like the old support can be mostly removed, but it's something I would like to discuss further with you outside this PR.
src/Workspaces/Remote/ServiceHub/Services/DiagnosticAnalyzer/PerformanceTrackerService.cs (resolved)
{
    private TelemetryLogger? _telemetryLogger;
Just to confirm: The fact that the new telemetry logger is only hooked up in the remote workspace and not host workspace is not a limitation of the logger but just the need for this PR, correct? #Resolved
I'm not clear on this question, can you clarify? TelemetryLogger.Create is called both from RemoteWorkspaceTelemetryService and from VisualStudioWorkspaceTelemetryService.
src/Workspaces/SharedUtilitiesAndExtensions/Compiler/Core/Log/FunctionId.cs (outdated, resolved)
src/Workspaces/Remote/ServiceHub/Services/ProcessTelemetry/RemoteProcessTelemetryService.cs (outdated, resolved)
src/VisualStudio/Core/Def/Telemetry/AggregatingTelemetryLogManager.cs (outdated, resolved)
_isDisposed = false;
_lock = new object();

_ = PostCollectedTelemetryAsync();
❗ Asynchronous operations must be tracked either by the life of the calling method or through asynchronous operation listeners. Fire-and-forget methods always fall into the latter category. However, since this is a long-lived operation that won't terminate as expected by our tracking mechanism, the preferred strategy is to use an AsyncBatchingWorkQueue, which will report that it's complete whenever the queue is empty. #Resolved
Didn't know about this, thanks! Switched to use this in commit 2.
src/VisualStudio/Core/Def/Telemetry/AggregatingTelemetryLogManager.cs (outdated, resolved)
src/EditorFeatures/Core.Wpf/Suggestions/SuggestedActionsSource_Async.cs (outdated, resolved)
src/EditorFeatures/Core.Wpf/Suggestions/SuggestedActionsSource_Async.cs (outdated, resolved)
src/Features/LanguageServer/Protocol/Features/CodeFixes/CodeFixService.cs (outdated, resolved)
@@ -112,6 +113,8 @@ private async Task InvokeAsync()
{
    try
    {
        using var _ = TelemetryLogging.LogBlockTimeAggregated(FunctionId.SuggestedAction_Application_Summary, "Total");
how would we know what suggested action this adds up to? or are we just wanting to get top level P numbers of invocation in general? #Resolved
Aggregated collection won't tell us anything about what is being tracked other than a histogram of times and what is captured in the event name. So, the intent here is to capture a histogram of all "Total" costs of invoking a suggested action. If we find out that we are having perf issues in this area, then I could see us adding a TelemetryLogging.LogBlockTime call with a threshold time for firing individual events.
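To make the distinction concrete, here is a hedged sketch of the two call shapes being discussed. LogBlockTimeAggregated appears in this PR's diff; the LogBlockTime threshold parameter name (minThresholdMs) and the ApplySuggestedActionAsync helper are illustrative assumptions, not the actual API.

```csharp
// Aggregated: each disposal contributes one duration sample to a histogram
// keyed by (FunctionId, "Total"); histograms are flushed periodically.
using (TelemetryLogging.LogBlockTimeAggregated(FunctionId.SuggestedAction_Application_Summary, "Total"))
{
    // ApplySuggestedActionAsync is a hypothetical stand-in for the work being timed.
    await ApplySuggestedActionAsync(cancellationToken).ConfigureAwait(false);
}

// Non-aggregated: fires an individual telemetry event, but only when the block
// exceeds a minimum time threshold, so slow outliers can be inspected one by one.
// (The threshold parameter name below is assumed for illustration.)
using (TelemetryLogging.LogBlockTime(FunctionId.SuggestedAction_Application_Summary, "Total", minThresholdMs: 250))
{
    await ApplySuggestedActionAsync(cancellationToken).ConfigureAwait(false);
}
```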
@@ -110,17 +111,22 @@ private partial class SuggestedActionsSource : IAsyncSuggestedActionsSource
// then pass it continuously from one priority group to the next.
var lowPriorityAnalyzers = new ConcurrentSet<DiagnosticAnalyzer>();

using var _2 = TelemetryLogging.LogBlockTimeAggregated(FunctionId.SuggestedAction_Summary, "Total");
what's the diff between this and SuggestedAction_Application_Summary? Is it so we can measure what time is spent in the source, prior to actually doing computation at each pri level? #Resolved
SuggestedAction_Summary is intended to measure the time to calculate the actions at each priority, whereas SuggestedAction_Application_Summary is intended to measure the time to apply the changes for an invoked suggested action.
Keeping at 30 minutes and removed disposal code.
hrm, I seem to have done something wrong rebasing against main (I did so as this branch was based off a main from a couple weeks ago)
Fixed the rebase via a force push. Not sure what went wrong, but that problem is solved. The tests were failing because some test code requires AsyncBatchingWorkQueue to be completely empty to progress. I now no longer automatically queue a new item once I've processed the old one, instead waiting to do so until the log manager's GetLog method is called.
/// <summary>
/// Indicates whether <see cref="_postTelemetryQueue"/> has work already added to it.
/// </summary>
private int _workAdded = 0;
not sure why this is needed. you can always just add more work to the queue. you're using the no-type-arg version, so it will be a no-op if there's already an item in the queue.
// The queue doesn't have any work pending. Add an item so PostCollectedTelemetryAsync
// will get fired after the collection period.
_postTelemetryQueue.AddWork();
}
this doesn't seem necessary.
PostCollectedTelemetry();

// Reset to indicate the queue doesn't have any work pending.
_workAdded = 0;
also not necessary. this is how the queue works to begin with. in the no-type-arg case you can think of it just as a beacon that pulses after the time limit has happened as long as at least one work item was added to it.
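A sketch of the usage being described, assuming Roslyn's internal non-generic AsyncBatchingWorkQueue; the constructor parameter order shown is approximate rather than verified against the internal signature:

```csharp
// Pulses at most once per collection period, and only if work was added.
_postTelemetryQueue = new AsyncBatchingWorkQueue(
    TimeSpan.FromMinutes(30),          // delay between batches (the "pulse")
    PostCollectedTelemetryAsync,       // Func<CancellationToken, ValueTask> batch callback
    asyncListener,                     // IAsynchronousOperationListener for test tracking
    cancellationToken);

// Safe to call on every log: a second AddWork while an item is already pending
// is a no-op, so no separate _workAdded guard is required.
_postTelemetryQueue.AddWork();
```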
if (elapsed >= _minThresholdMs)
{
    const string Name = nameof(Name);
    const string Value = nameof(Value);
second time i've seen these constants. Place in common location?
return;

const string Name = nameof(Name);
const string Value = nameof(Value);
another copy :)
namespace Microsoft.CodeAnalysis.Telemetry
{
    [InterpolatedStringHandler]
    internal readonly struct TelemetryLoggingInterpolatedStringHandler
nice.
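For readers unfamiliar with the pattern, here is a minimal self-contained sketch of an interpolated-string handler (not Roslyn's actual TelemetryLoggingInterpolatedStringHandler): the compiler lowers a `$"..."` argument into calls on the handler, and when the `out bool shouldAppend` flag is false, the append calls and the interpolation-hole expressions are skipped entirely, so no formatting cost is paid when logging is disabled.

```csharp
using System.Runtime.CompilerServices;
using System.Text;

[InterpolatedStringHandler]
public struct SkippableLogHandler
{
    private readonly StringBuilder? _builder;

    // The compiler calls this; when 'enabled' is false, shouldAppend = false
    // tells it to skip every AppendLiteral/AppendFormatted call, so the
    // interpolation holes are never evaluated.
    public SkippableLogHandler(int literalLength, int formattedCount, bool enabled, out bool shouldAppend)
    {
        shouldAppend = enabled;
        _builder = enabled ? new StringBuilder(literalLength) : null;
    }

    public void AppendLiteral(string s) => _builder!.Append(s);
    public void AppendFormatted<T>(T value) => _builder!.Append(value);

    public string GetText() => _builder?.ToString() ?? string.Empty;
}

public static class Log
{
    // The handler's 'enabled' constructor argument is wired up from this
    // method's 'enabled' parameter via InterpolatedStringHandlerArgument.
    public static string? Write(bool enabled, [InterpolatedStringHandlerArgument("enabled")] SkippableLogHandler message)
        => enabled ? message.GetText() : null;
}
```

Usage: `Log.Write(isEnabled, $"Pri{priority}.{name}")` formats the string only when isEnabled is true.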
PerformAnalysis_Summary = 721,
RequestDiagnostics_Delay = 730,
RequestDiagnostics_Summary = 731,
SuggestedAction_Delay = 740,
blank lines between the sections.
minor suggestions/requests. None of them really blocking. Ok for this to go in once you're satisfied the suggestions are handled to your liking.
/azp run
Azure Pipelines successfully started running 4 pipeline(s).
/azp run
Azure Pipelines successfully started running 4 pipeline(s).
/azp run
Pull request contains merge conflicts.
/azp run
Azure Pipelines successfully started running 4 pipeline(s).
The intent here is to provide a telemetry mechanism that is easy to use, and hopefully provides enough functionality that most telemetry users can use it directly.

The most common mechanism through which Roslyn currently uses telemetry (that I've found) is the Logger.Log call. This call aggregates multiple loggers through the ILogger interface and funnels the logging call to each implementation (TelemetryLogger being one). I wanted to separate the telemetry calls out from this interface as I needed to add a minimum time threshold for some of the telemetry logging. Separating these calls out also makes it explicit that these calls are logging telemetry.

Additionally, the existing telemetry calls through ILogger don't support aggregating values. The new implementation supports this through the VSTelemetry histogram support (based on OTel). There is yet another telemetry mechanism in Roslyn that does support histogram/count calls. The intent is to switch callers of that implementation over to this new one (I can't yet, though, as I didn't add support for the Count part of the API).

The aggregated support will send histogram telemetry events every 30 minutes or upon process exit. For targeted notification support, non-aggregated telemetry calls are added upon a minimum time threshold being met.

Telemetry dashboards consuming these telemetry events are here: https://dataexplorer.azure.com/dashboards/31bfd6e9-daa4-4273-acdb-e02cb156c289 (work in progress)