
Add new workspace event that gives handlers the opportunity to be processed immediately #76932

Open · wants to merge 11 commits into main

Conversation

@ToddGrun (Contributor) commented Jan 26, 2025

This is in response to allocations seen when the CA (OOP code analysis) process attempts to operate on a SourceText version that has not yet been synchronized over. In those cases, the OOP process requests a full deserialization of the source text, causing full buffer allocations in both the VS and CA processes.

There are several points of asynchronicity in the current system around sending over buffer changes, reducing any of which would reduce the likelihood of needing this full serialization.

This PR removes one of those points of asynchronicity, specifically in the workspace eventing layer. Previously, all notifications were delivered on a delayed basis, with each notification wrapped in a Workspace.ScheduleTask call from the RaiseWorkspaceChangedEventAsync method. The new event allows handlers to indicate that they need to be invoked immediately upon the workspace change. As noted by a doc comment in the PR, these handlers should be very fast.
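
To make the shape of the change concrete, here is a minimal, self-contained sketch of the idea. The names below (MiniWorkspace, WorkspaceChangedImmediate) are illustrative assumptions for this write-up, not the actual API added in the PR:

```csharp
using System;
using System.Threading.Tasks;

// Illustrative sketch only; not the real Roslyn Workspace type.
public sealed class MiniWorkspace
{
    // Existing behavior: handlers run later, on a scheduled task.
    public event EventHandler<string>? WorkspaceChanged;

    // What this PR enables: handlers that opt in run synchronously with the change,
    // so they must be very fast.
    public event EventHandler<string>? WorkspaceChangedImmediate;

    public Task RaiseWorkspaceChangedEventAsync(string change)
    {
        // Immediate handlers see the change with no queuing delay, letting the
        // text-synchronization layer start pushing deltas to OOP right away.
        WorkspaceChangedImmediate?.Invoke(this, change);

        // Delayed handlers keep the previous scheduled-task semantics.
        var delayed = WorkspaceChanged;
        return delayed is null
            ? Task.CompletedTask
            : Task.Run(() => delayed(this, change));
    }
}
```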

The numbers from speedometer came back looking pretty good (below). Another point of asynchronicity could be investigated around the usage of the _textChangeQueue ABWQ (AsyncBatchingWorkQueue), but it might not be worth it, as it looks like this change removes the majority of the serialization during typing. Other alternatives within the text syncing system, such as allowing OOP to request a text change instead of the full buffer contents, are also possible, but previous investigations into that ended up complicated and incomplete.

*** before allocations during typing in VS process ***
[profiler screenshot]

*** commit 3 allocations during typing in VS process ***
[profiler screenshot]

*** commit 5 allocations during typing in VS process ***
No allocations under SerializableSourceText.Serialize during typing period in profile

*** before allocations during typing in CA process ***
[profiler screenshot]

*** commit 3 allocations during typing in CA process ***
[profiler screenshot]

*** commit 5 allocations during typing in CA process ***
No allocations under SerializableSourceText.Deserialize during typing period in profile

…e processed immediately

Going to mark this as draft and get speedometer numbers off this to see if this helps enough, or if instead the other points of asynchronicity should be investigated (specifically, the usage of the _textChangeQueue ABWQ). There are other alternatives within the text syncing system such as allowing OOP to request a text change instead of the full buffer contents, but previous investigations into that ended up complicated and incomplete.
The dotnet-issue-labeler bot added the Area-IDE and untriaged (Issues and PRs which have not yet been triaged by a lead) labels on Jan 26, 2025

@ToddGrun (Contributor Author):

Moving out of draft mode as the numbers look pretty good. Still some serialization happening during typing, but it's a lot less than before and this change was pretty trivial. If code reviewers think adding this to the workspace level is not desirable, can look into other approaches.

@ToddGrun ToddGrun marked this pull request as ready for review January 27, 2025 03:36
@ToddGrun ToddGrun requested a review from a team as a code owner January 27, 2025 03:36
@ToddGrun changed the title from "WIP: Add new workspace event that gives handlers the opportunity to be processed immediately" to "Add new workspace event that gives handlers the opportunity to be processed immediately" on Jan 27, 2025
…r reduce the amount of serialization and deserialization
@ToddGrun (Contributor Author) commented Jan 27, 2025

commit 5 PR validation insertion: https://dev.azure.com/devdiv/DevDiv/_git/VS/pullrequest/606025

…ositions to allow composition to include an IThreadingContext
…ecksumUpdater.DispatchSynchronizeTextChanges and RemoteAssetSynchronizationService.SynchronizeTextChangesAsync
using var _ = ArrayBuilder<(DocumentId id, Checksum textChecksum, ImmutableArray<TextChange> changes, Checksum newTextChecksum)>.GetInstance(out var builder);

foreach (var (oldDocument, newDocument) in values)
_ = _threadingContext.JoinableTaskFactory.RunAsync(async () =>
Member:

we should def doc the design here. make it clear why we've written it this way.

note: you can also use an IAsyncToken here to track the work, allowing you to properly unit test this as well.
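
For readers unfamiliar with that suggestion, here is a rough, self-contained stand-in for the pattern. Roslyn's real types are IAsynchronousOperationListener/IAsyncToken; the WorkTracker below is purely illustrative:

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Purely illustrative stand-in for the IAsyncToken idea: register fire-and-forget work
// so that a unit test can deterministically wait for it to complete.
internal sealed class WorkTracker
{
    private readonly ConcurrentDictionary<Task, byte> _inFlight = new();

    public Task Track(Task work)
    {
        _inFlight.TryAdd(work, 0);
        work.ContinueWith(t => _inFlight.TryRemove(t, out _), TaskScheduler.Default);
        return work;
    }

    // Tests await this instead of sleeping or polling.
    public Task WaitForPendingWorkAsync() => Task.WhenAll(_inFlight.Keys.ToArray());
}
```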

Member:

Is the intent here that this is fire-and-forget (and will it still always yield the thread or not)? Because there's absolutely a chance in VS this might be on the UI thread, and we're calling those GetChangeRanges methods and such, which might (?) be expensive. Or maybe not.

This indeed might be worth documenting, since it's otherwise unclear to me why we're wrapping this in a JTF.RunAsync().

Contributor Author:

The intent is to fire-and-forget, but only yield the current thread if an async operation necessitates it.

Is GetChangeRanges expensive? It doesn't look to be to me, but maybe I'm missing an override? If it is expensive, the two conditions that use it could probably be removed, as they are very minor and specialized optimizations.

Will add some doc here
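
A small, self-contained sketch of the behavior described here (not the PR's code; it only demonstrates that an async delegate, like one passed to JoinableTaskFactory.RunAsync, runs synchronously up to its first incomplete await):

```csharp
using System;
using System.Threading.Tasks;

internal static class FireAndForgetDemo
{
    public static void NotifyChanged()
    {
        // Discarding the task makes this fire-and-forget; exceptions must be handled inside the body.
        _ = ProcessChangeAsync();
        Console.WriteLine("NotifyChanged returned");
    }

    private static async Task ProcessChangeAsync()
    {
        // This part runs inline on the caller's thread, so it must stay cheap
        // (the GetChangeRanges-style checks discussed above fall into this category).
        Console.WriteLine("synchronous prologue on the calling thread");

        // Only here is the calling thread actually released.
        await Task.Delay(10).ConfigureAwait(false);
        Console.WriteLine("continuation on a thread-pool thread");
    }

    public static void Main()
    {
        NotifyChanged();
        Task.Delay(100).Wait(); // give the continuation time to run in this tiny demo
    }
}
```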

using (Logger.LogBlock(FunctionId.Workspace_EventsImmediate, (s, p, d, k) => $"{s.Id} - {p} - {d} {kind.ToString()}", newSolution, projectId, documentId, kind, CancellationToken.None))
{
args = new WorkspaceChangeEventArgs(kind, oldSolution, newSolution, projectId, documentId);
ev.RaiseEvent(static (handler, arg) => handler(arg.self, arg.args), (self: this, args));
Member:

aside, are these exception safe if the attached handler throws an exception? i ask because the old system might have been ok if the Task went into a faulted state. i'm not sure about the new system.

Member:

Oops, asked this same question here: #76932 (comment). I think this might cause an issue now if this were to throw, so probably best to handle that better.

Contributor Author:

Any concern with me just changing RaiseEvent from using FatalError.ReportAndPropagate to instead use FatalError.ReportAndCatch (also, would it make sense to have the try/catch inside the foreach in that method?)

Contributor Author:

@jasonmalinowski -- any thoughts on changing RaiseEvent to instead use FatalError.ReportAndCatch (and to move the try/catch inside the loop)?

Contributor Author:

I went ahead and just made the change to switch to ReportAndCatch

src/Workspaces/Core/Portable/Workspace/Workspace_Events.cs (outdated review thread, resolved)
src/Workspaces/Core/Portable/Workspace/Workspace_Events.cs (outdated review thread, resolved)
@@ -639,4 +639,9 @@ internal enum FunctionId
VSCode_LanguageServer_Started = 860,
VSCode_Project_Load_Started = 861,
VSCode_Projects_Load_Completed = 862,

// 900-999 for items that don't fit into other categories.
Member:

Suggested change:
- // 900-999 for items that don't fit into other categories.
+ // 900-999 things related to workspace and OOP solution sync

We're not restricted to three digits, so no reason to make this the final catch-all group.

Contributor Author:

I wanted a catch-all group as I was tired of scanning through trying to find the next available slot with all the other groups. Is the request here to change the starting range / size of the catch-all group, or just not to have it?

using (Logger.LogBlock(FunctionId.Workspace_EventsImmediate, (s, p, d, k) => $"{s.Id} - {p} - {d} {kind.ToString()}", newSolution, projectId, documentId, kind, CancellationToken.None))
{
args = new WorkspaceChangeEventArgs(kind, oldSolution, newSolution, projectId, documentId);
ev.RaiseEvent(static (handler, arg) => handler(arg.self, arg.args), (self: this, args));
Member:

It appears RaiseEvent will report any exception being thrown but will still propagate it out. That might be dangerous here, because it now means any subscriber's mistake will now completely break the workspace and put the user in a really broken state. I admit I'm not sure why we aren't catching exceptions in RaiseEvent, so maybe that should be changed (or we have a variant that does catch).
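
A hedged sketch of the "report and keep going" variant being discussed. FatalError.ReportAndCatch is the existing Roslyn helper referenced above, but the RaiseEvent shape below is illustrative, not the actual internal signature; the point is the try/catch inside the loop so one faulting subscriber cannot stop the rest:

```csharp
using System;
using System.Collections.Immutable;

internal static class EventRaisingSketch
{
    public static void RaiseEventCatching<TArgs>(ImmutableArray<Action<TArgs>> handlers, TArgs args)
    {
        foreach (var handler in handlers)
        {
            try
            {
                handler(args);
            }
            catch (Exception ex)
            {
                // Stand-in for FatalError.ReportAndCatch: record the subscriber's failure,
                // but keep notifying the remaining handlers so workspace eventing survives.
                ReportNonFatal(ex);
            }
        }
    }

    private static void ReportNonFatal(Exception ex) =>
        Console.Error.WriteLine($"Non-fatal workspace event handler exception: {ex.Message}");
}
```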

use some local functions for cleanup
Update some comments
@CyrusNajmabadi (Member) left a comment:

Signing mostly off. I do want Jason to approve as well. And there look to be a few places we can clean up. I would like exception resilience thought about. Thanks!

public SolutionChecksumUpdater(
Workspace workspace,
IAsynchronousOperationListenerProvider listenerProvider,
IThreadingContext threadingContext,
CancellationToken shutdownToken)
Member:

I'm curious if the shutdown token is needed, instead of just using the disposal token on the threading context.

Contributor Author:

I think they are different. VisualStudioWorkspaceServiceHubConnector has StartListening/StopListening methods, which drive the shutdownToken. However, I think the threadingContext's DisposalToken is driven by MEF, and I don't think it is disposed until VS shutdown.

Contributor Author:

Just saw your other comment about VisualStudioWorkspaceServiceHubConnector. Let me think about that.

Contributor Author:

It looks like right now, VisualStudioWorkspaceServiceHubConnector.StopListening is called on Workspace dispose. If I just send over the threadingContext and use its cancellation token, are we ok with essentially just using a cancellation token from MEF dispose and not workspace dispose?

Member:

yeah. let's keep this as is. i've looked at the code and i'm not sure how to reconcile the different Disposes/CTs. So let's not change that now.
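
Purely for context (the PR keeps the existing token, per the comment above): when two disposal lifetimes do need to be honored together, the usual tool is a linked CancellationTokenSource. A minimal sketch:

```csharp
using System;
using System.Threading;

// Minimal sketch, not part of this PR: a token canceled when either the MEF/threading-context
// disposal token or the workspace-level shutdown token is canceled.
internal sealed class CombinedShutdown : IDisposable
{
    private readonly CancellationTokenSource _linked;

    public CombinedShutdown(CancellationToken mefDisposalToken, CancellationToken workspaceShutdownToken)
        => _linked = CancellationTokenSource.CreateLinkedTokenSource(mefDisposalToken, workspaceShutdownToken);

    public CancellationToken Token => _linked.Token;

    public void Dispose() => _linked.Dispose();
}
```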

src/EditorFeatures/Core/Remote/SolutionChecksumUpdater.cs (outdated review thread, resolved)
@@ -38,7 +40,7 @@ public void StartListening(Workspace workspace, object serviceOpt)
}

// only push solution snapshot from primary (VS) workspace:
- _checksumUpdater = new SolutionChecksumUpdater(workspace, _listenerProvider, _disposalCancellationSource.Token);
+ _checksumUpdater = new SolutionChecksumUpdater(workspace, _listenerProvider, _threadingContext, _disposalCancellationSource.Token);
Member:

If we have the threading context, we might be able to get rid of the disposal logic here and the need to pass that CT along.

src/Workspaces/Core/Portable/Workspace/Workspace_Events.cs (outdated review thread, resolved)

using (Logger.LogBlock(functionId, (s, p, d, k) => $"{s.Id} - {p} - {d} {args.Kind.ToString()}", args.NewSolution, args.ProjectId, args.DocumentId, args.Kind, CancellationToken.None))
{
handlers.RaiseEvent(static (handler, args) => handler(args.NewSolution.Workspace, args), args);
Member:

I def have concerns about exceptions. Can we try/catch with a fatalerror reporter?

Contributor Author:

From the other thread:

Any concern with me just changing RaiseEvent from using FatalError.ReportAndPropagate to instead use FatalError.ReportAndCatch (also, would it make sense to have the try/catch inside the foreach in that method?)

Member:

yes. please ReportAndCatch. We really do not want our eventing to ever get into a failed state (or corrupt the WS).

Update comments
Use string constants instead of allocating
@ToddGrun (Contributor Author):

@jasonmalinowski -- would love to get your feedback on the changes and any remaining concerns you may have that haven't been addressed as desired

@sharwell (Member):

For this specific performance improvement, I'm curious about the viability of an alternative approach:

  1. Attempt to produce the solution for an incoming call
  2. If any document is not already available, wait for queued synchronization events to be processed
  3. Produce the solution for the incoming call under the original approach

@CyrusNajmabadi (Member):

@sharwell def feel free to explore that idea. The approach here is very simple and effective and sits on a model we have good understanding of. All this is doing is allowing the current approach to operate without unnecessary delays that impact its efficacy.

var metricName = wasSynchronized ? "SucceededCount" : "FailedCount";
TelemetryLogging.LogAggregatedCounter(FunctionId.ChecksumUpdater_SynchronizeTextChangesStatus, KeyValueLogMessage.Create(m =>
{
m[TelemetryLogging.KeyName] = nameof(SolutionChecksumUpdater) + "." + metricName;
Member:

it feels like this will continually allocate. can we instead make these into constants?

Contributor Author:

Oops, I changed the other one to constants, but not this one. Will change

TelemetryLogging.LogAggregatedCounter(FunctionId.ChecksumUpdater_SynchronizeTextChangesStatus, KeyValueLogMessage.Create(m =>
{
m[TelemetryLogging.KeyName] = nameof(SolutionChecksumUpdater) + "." + metricName;
m[TelemetryLogging.KeyValue] = 1L;
Member:

is the idea that this will add 1 each time this happens?

Contributor Author:

Yes, the LogAggregatedCounter will add the specified amount to the counter
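
As a self-contained illustration of that aggregation behavior (this is the concept only, not the Roslyn TelemetryLogging implementation): each call adds its value to a named counter, and a single total per name is reported when the aggregate is flushed.

```csharp
using System;
using System.Collections.Concurrent;

// Conceptual stand-in for an aggregated counter: repeated Add calls accumulate,
// and one total per name is emitted at flush time.
internal sealed class AggregatedCounters
{
    private readonly ConcurrentDictionary<string, long> _totals = new();

    public void Add(string name, long value) =>
        _totals.AddOrUpdate(name, value, (_, existing) => existing + value);

    public void Flush()
    {
        foreach (var (name, total) in _totals)
            Console.WriteLine($"{name}: {total}");
        _totals.Clear();
    }
}

// Usage mirroring the snippet above: one increment per synchronization attempt, e.g.
//     counters.Add("SolutionChecksumUpdater.SucceededCount", 1L);
```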

{
m[TelemetryLogging.KeyName] = nameof(SolutionChecksumUpdater) + "." + metricName;
m[TelemetryLogging.KeyValue] = 1L;
m[TelemetryLogging.KeyMetricName] = metricName;
Member:

why do we need this here, if it is encoded in KeyName above? is one of these redundant?

{
var client = await RemoteHostClient.TryGetClientAsync(_workspace, _shutdownToken).ConfigureAwait(false);
if (client == null)
return false;
Member:

slightly odd that this would show up as 'failure'. this means a customer who has OOP off will show as massively failing this op. i think this should maybe count as 'success' or perhaps NA so we don't get the wrong idea about being in a bad state.

{
m[TelemetryLogging.KeyName] = keyName;
m[TelemetryLogging.KeyValue] = 1L;
m[TelemetryLogging.KeyMetricName] = metricName;
Member:

same comment: do we need both the keyname and metric name, esp. if the keyname contains the metric name?

@CyrusNajmabadi (Member):

with your latest changes, i feel fine with this going in (even without Jason signing off). he will eventually get around to looking at this.

Labels: Area-IDE, untriaged (Issues and PRs which have not yet been triaged by a lead), VSCode

4 participants