
Add new workspace event that gives handlers the opportunity to be processed immediately #76932

Open · wants to merge 11 commits into main

Conversation

@ToddGrun (Contributor) commented Jan 26, 2025

This is in response to allocations seen when the CA (OOP code analysis) process attempts to operate on a SourceText version that has not yet been synchronized over. In those cases, the OOP process requests a full deserialization of the source text, causing full buffer allocations in both the VS and CA processes.

There are several points of asynchronicity in the current system around sending over buffer changes, reducing any of which would reduce the likelihood of needing this full serialization.

This PR removes one of those points of asynchronicity, specifically in the workspace eventing layer. Previously, all notifications were delivered on a delayed basis, with each notification wrapped in a Workspace.ScheduleTask call from the RaiseWorkspaceChangedEventAsync method. The new event allows handlers to indicate that they need to be invoked immediately upon the workspace change. As noted by a doc comment in the PR, these handlers should be very fast.
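
To make the shape of the change concrete, here is a minimal, self-contained sketch of the idea. The names below (MiniWorkspace, WorkspaceChangedImmediate) are illustrative assumptions for this write-up, not the actual API added in the PR:

```csharp
using System;
using System.Threading.Tasks;

// Illustrative sketch only; not the real Roslyn Workspace type.
public sealed class MiniWorkspace
{
    // Existing behavior: handlers run later, on a scheduled task.
    public event EventHandler<string>? WorkspaceChanged;

    // What this PR enables: handlers that opt in run synchronously with the change,
    // so they must be very fast.
    public event EventHandler<string>? WorkspaceChangedImmediate;

    public Task RaiseWorkspaceChangedEventAsync(string change)
    {
        // Immediate handlers see the change with no queuing delay, letting the
        // text-synchronization layer start pushing deltas to OOP right away.
        WorkspaceChangedImmediate?.Invoke(this, change);

        // Delayed handlers keep the previous scheduled-task semantics.
        var delayed = WorkspaceChanged;
        return delayed is null
            ? Task.CompletedTask
            : Task.Run(() => delayed(this, change));
    }
}
```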

The numbers from speedometer came back looking pretty good (below). Another point of asynchronicity could be investigated around the usage of the _textChangeQueue ABWQ (AsyncBatchingWorkQueue), but it might not be worth it, as it looks like this change removes the majority of the serialization during typing. Other alternatives within the text syncing system, such as allowing OOP to request a text change instead of the full buffer contents, are also possible, but previous investigations into that ended up complicated and incomplete.

*** before allocations during typing in VS process ***
[profiler screenshot]

*** commit 3 allocations during typing in VS process ***
[profiler screenshot]

*** commit 5 allocations during typing in VS process ***
No allocations under SerializableSourceText.Serialize during typing period in profile

*** before allocations during typing in CA process ***
[profiler screenshot]

*** commit 3 allocations during typing in CA process ***
[profiler screenshot]

*** commit 5 allocations during typing in CA process ***
No allocations under SerializableSourceText.Deserialize during typing period in profile

…e processed immediately

Going to mark this as draft and get speedometer numbers off this to see if this helps enough, or if instead the other points of asynchronicity should be investigated (specifically, the usage of the _textChangeQueue ABWQ). There are other alternatives within the text syncing system such as allowing OOP to request a text change instead of the full buffer contents, but previous investigations into that ended up complicated and incomplete.
The dotnet-issue-labeler bot added the Area-IDE and untriaged (Issues and PRs which have not yet been triaged by a lead) labels on Jan 26, 2025

@ToddGrun (Contributor Author):

Moving out of draft mode as the numbers look pretty good. Still some serialization happening during typing, but it's a lot less than before and this change was pretty trivial. If code reviewers think adding this to the workspace level is not desirable, can look into other approaches.

@ToddGrun ToddGrun marked this pull request as ready for review January 27, 2025 03:36
@ToddGrun ToddGrun requested a review from a team as a code owner January 27, 2025 03:36
@ToddGrun changed the title from "WIP: Add new workspace event that gives handlers the opportunity to be processed immediately" to "Add new workspace event that gives handlers the opportunity to be processed immediately" on Jan 27, 2025
…r reduce the amount of serialization and deserialization
@ToddGrun (Contributor Author) commented Jan 27, 2025

commit 5 PR validation insertion: https://dev.azure.com/devdiv/DevDiv/_git/VS/pullrequest/606025

…ositions to allow composition to include an IThreadingContext
…ecksumUpdater.DispatchSynchronizeTextChanges and RemoteAssetSynchronizationService.SynchronizeTextChangesAsync
using var _ = ArrayBuilder<(DocumentId id, Checksum textChecksum, ImmutableArray<TextChange> changes, Checksum newTextChecksum)>.GetInstance(out var builder);

foreach (var (oldDocument, newDocument) in values)
_ = _threadingContext.JoinableTaskFactory.RunAsync(async () =>
Member:

we should def doc the design here. make it clear why we've written it this way.

note: you can also use an IAsyncToken here to track the work, allowing you to properly unit test this as well.
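
For readers unfamiliar with that suggestion, here is a rough, self-contained stand-in for the pattern. Roslyn's real types are IAsynchronousOperationListener/IAsyncToken; the WorkTracker below is purely illustrative:

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Purely illustrative stand-in for the IAsyncToken idea: register fire-and-forget work
// so that a unit test can deterministically wait for it to complete.
internal sealed class WorkTracker
{
    private readonly ConcurrentDictionary<Task, byte> _inFlight = new();

    public Task Track(Task work)
    {
        _inFlight.TryAdd(work, 0);
        work.ContinueWith(t => _inFlight.TryRemove(t, out _), TaskScheduler.Default);
        return work;
    }

    // Tests await this instead of sleeping or polling.
    public Task WaitForPendingWorkAsync() => Task.WhenAll(_inFlight.Keys.ToArray());
}
```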

Member:

Is the intent here that this is fire-and-forget (and will it still always yield the thread or not)? Because there's absolutely a chance in VS this might be on the UI thread, and we're calling those GetChangeRanges methods and such, which might (?) be expensive. Or maybe not.

This indeed might be worth documenting, since it's otherwise unclear to me why we're wrapping this in a JTF.RunAsync().

Contributor Author:

The intent is to fire-and-forget, but only yield the current thread if an async operation necessitates it.

Is GetChangeRanges expensive? It doesn't look to be to me, but maybe I'm missing an override? If it is expensive, the two conditions that use it could probably be removed, as they are very minor and specialized optimizations.

Will add some doc here
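
A small, self-contained sketch of the behavior described here (not the PR's code; it only demonstrates that an async delegate, like one passed to JoinableTaskFactory.RunAsync, runs synchronously up to its first incomplete await):

```csharp
using System;
using System.Threading.Tasks;

internal static class FireAndForgetDemo
{
    public static void NotifyChanged()
    {
        // Discarding the task makes this fire-and-forget; exceptions must be handled inside the body.
        _ = ProcessChangeAsync();
        Console.WriteLine("NotifyChanged returned");
    }

    private static async Task ProcessChangeAsync()
    {
        // This part runs inline on the caller's thread, so it must stay cheap
        // (the GetChangeRanges-style checks discussed above fall into this category).
        Console.WriteLine("synchronous prologue on the calling thread");

        // Only here is the calling thread actually released.
        await Task.Delay(10).ConfigureAwait(false);
        Console.WriteLine("continuation on a thread-pool thread");
    }

    public static void Main()
    {
        NotifyChanged();
        Task.Delay(100).Wait(); // give the continuation time to run in this tiny demo
    }
}
```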

using (Logger.LogBlock(FunctionId.Workspace_EventsImmediate, (s, p, d, k) => $"{s.Id} - {p} - {d} {kind.ToString()}", newSolution, projectId, documentId, kind, CancellationToken.None))
{
args = new WorkspaceChangeEventArgs(kind, oldSolution, newSolution, projectId, documentId);
ev.RaiseEvent(static (handler, arg) => handler(arg.self, arg.args), (self: this, args));
Member:

aside, are these exception safe if the attached handler throws an exception? i ask because the old system might have been ok if the Task went into a faulted state. i'm not sure about the new system.

Member:

Oops, asked this same question here: #76932 (comment). I think this might cause an issue now if this were to throw, so probably best to handle that better.

Contributor Author:

Any concern with me just changing RaiseEvent from using FatalError.ReportAndPropagate to instead use FatalError.ReportAndCatch (also, would it make sense to have the try/catch inside the foreach in that method?)

Contributor Author:

@jasonmalinowski -- any thoughts on changing RaiseEvent to instead use FatalError.ReportAndCatch (and to move the try/catch inside the loop)?

Contributor Author:

I went ahead and just made the change to switch to ReportAndCatch

src/Workspaces/Core/Portable/Workspace/Workspace_Events.cs (outdated review thread, resolved)
src/Workspaces/Core/Portable/Workspace/Workspace_Events.cs (outdated review thread, resolved)
@@ -639,4 +639,9 @@ internal enum FunctionId
VSCode_LanguageServer_Started = 860,
VSCode_Project_Load_Started = 861,
VSCode_Projects_Load_Completed = 862,

// 900-999 for items that don't fit into other categories.
Member:

Suggested change:
- // 900-999 for items that don't fit into other categories.
+ // 900-999 things related to workspace and OOP solution sync

We're not restricted to three digits, so no reason to make this the final catch-all group.

Contributor Author:

I wanted a catch-all group as I was tired of scanning through trying to find the next available slot with all the other groups. Is the request here to change the starting range / size of the catch-all group, or just not to have it?

using (Logger.LogBlock(FunctionId.Workspace_EventsImmediate, (s, p, d, k) => $"{s.Id} - {p} - {d} {kind.ToString()}", newSolution, projectId, documentId, kind, CancellationToken.None))
{
args = new WorkspaceChangeEventArgs(kind, oldSolution, newSolution, projectId, documentId);
ev.RaiseEvent(static (handler, arg) => handler(arg.self, arg.args), (self: this, args));
Member:

It appears RaiseEvent will report any exception being thrown but will still propagate it out. That might be dangerous here, because it now means any subscriber's mistake will now completely break the workspace and put the user in a really broken state. I admit I'm not sure why we aren't catching exceptions in RaiseEvent, so maybe that should be changed (or we have a variant that does catch).
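
A hedged sketch of the "report and keep going" variant being discussed. FatalError.ReportAndCatch is the existing Roslyn helper referenced above, but the RaiseEvent shape below is illustrative, not the actual internal signature; the point is the try/catch inside the loop so one faulting subscriber cannot stop the rest:

```csharp
using System;
using System.Collections.Immutable;

internal static class EventRaisingSketch
{
    public static void RaiseEventCatching<TArgs>(ImmutableArray<Action<TArgs>> handlers, TArgs args)
    {
        foreach (var handler in handlers)
        {
            try
            {
                handler(args);
            }
            catch (Exception ex)
            {
                // Stand-in for FatalError.ReportAndCatch: record the subscriber's failure,
                // but keep notifying the remaining handlers so workspace eventing survives.
                ReportNonFatal(ex);
            }
        }
    }

    private static void ReportNonFatal(Exception ex) =>
        Console.Error.WriteLine($"Non-fatal workspace event handler exception: {ex.Message}");
}
```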

use some local functions for cleanup
Update some comments
@CyrusNajmabadi (Member) left a comment:

Signing mostly off. I do want Jason to approve as well. And there look to be a few places we can clean up. I would like exception resilience thought about. Thanks!

public SolutionChecksumUpdater(
Workspace workspace,
IAsynchronousOperationListenerProvider listenerProvider,
IThreadingContext threadingContext,
CancellationToken shutdownToken)
Member:

I'm curious if the shutdown token is needed, instead of just using the disposal token on the threading context.

Contributor Author:

I think they are different. VisualStudioWorkspaceServiceHubConnector has StartListening/StopListening methods, which drive the shutdownToken. However, I think the threadingContext's DisposalToken is driven by MEF, and I don't think it is disposed until VS shutdown.

Contributor Author:

Just saw your other comment about VisualStudioWorkspaceServiceHubConnector. Let me think about that.

Contributor Author:

It looks like right now, VisualStudioWorkspaceServiceHubConnector.StopListening is called on Workspace dispose. If I just send over the threadingContext and use its cancellation token, are we ok with essentially just using a cancellation token from MEF dispose and not workspace dispose?

Member:

yeah. let's keep this as is. i've looked at the code and i'm not sure how to reconcile the different Disposes/CTs. So let's not change that now.
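
Purely for context (the PR keeps the existing token, per the comment above): when two disposal lifetimes do need to be honored together, the usual tool is a linked CancellationTokenSource. A minimal sketch:

```csharp
using System;
using System.Threading;

// Minimal sketch, not part of this PR: a token canceled when either the MEF/threading-context
// disposal token or the workspace-level shutdown token is canceled.
internal sealed class CombinedShutdown : IDisposable
{
    private readonly CancellationTokenSource _linked;

    public CombinedShutdown(CancellationToken mefDisposalToken, CancellationToken workspaceShutdownToken)
        => _linked = CancellationTokenSource.CreateLinkedTokenSource(mefDisposalToken, workspaceShutdownToken);

    public CancellationToken Token => _linked.Token;

    public void Dispose() => _linked.Dispose();
}
```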

src/EditorFeatures/Core/Remote/SolutionChecksumUpdater.cs (outdated review thread, resolved)
@@ -38,7 +40,7 @@ public void StartListening(Workspace workspace, object serviceOpt)
}

// only push solution snapshot from primary (VS) workspace:
- _checksumUpdater = new SolutionChecksumUpdater(workspace, _listenerProvider, _disposalCancellationSource.Token);
+ _checksumUpdater = new SolutionChecksumUpdater(workspace, _listenerProvider, _threadingContext, _disposalCancellationSource.Token);
Member:

If we have the threading context, we might be able to get rid of the disposal logic here and the need to pass that CT along.

src/Workspaces/Core/Portable/Workspace/Workspace_Events.cs (outdated review thread, resolved)

using (Logger.LogBlock(functionId, (s, p, d, k) => $"{s.Id} - {p} - {d} {args.Kind.ToString()}", args.NewSolution, args.ProjectId, args.DocumentId, args.Kind, CancellationToken.None))
{
handlers.RaiseEvent(static (handler, args) => handler(args.NewSolution.Workspace, args), args);
Member:

I def have concerns about exceptions. Can we try/catch with a fatalerror reporter?

Contributor Author:

From the other thread:

Any concern with me just changing RaiseEvent from using FatalError.ReportAndPropagate to instead use FatalError.ReportAndCatch (also, would it make sense to have the try/catch inside the foreach in that method?)

Member:

yes. please ReportAndCatch. We really do not want our eventing to ever get into a failed state (or corrupt the WS).

Update comments
Use string constants instead of allocating
@ToddGrun (Contributor Author):

@jasonmalinowski -- would love to get your feedback on the changes and any remaining concerns you may have that haven't been addressed as desired

@sharwell (Member):

For this specific performance improvement, I'm curious about the viability of an alternative approach:

  1. Attempt to produce the solution for an incoming call
  2. If any document is not already available, wait for queued synchronization events to be processed
  3. Produce the solution for the incoming call under the original approach

@CyrusNajmabadi (Member):

@sharwell def feel free to explore that idea. The approach here is very simple and effective and sits on a model we have good understanding of. All this is doing is allowing the current approach to operate without unnecessary delays that impact its efficacy.

var metricName = wasSynchronized ? "SucceededCount" : "FailedCount";
TelemetryLogging.LogAggregatedCounter(FunctionId.ChecksumUpdater_SynchronizeTextChangesStatus, KeyValueLogMessage.Create(m =>
{
m[TelemetryLogging.KeyName] = nameof(SolutionChecksumUpdater) + "." + metricName;
Member:

it feels like this will continually allocate. can we instead make these into constants?

Contributor Author:

Oops, I changed the other one to constants, but not this one. Will change

TelemetryLogging.LogAggregatedCounter(FunctionId.ChecksumUpdater_SynchronizeTextChangesStatus, KeyValueLogMessage.Create(m =>
{
m[TelemetryLogging.KeyName] = nameof(SolutionChecksumUpdater) + "." + metricName;
m[TelemetryLogging.KeyValue] = 1L;
Member:

is the idea that this will add 1 each time this happens?

Contributor Author:

Yes, the LogAggregatedCounter will add the specified amount to the counter
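
As a self-contained illustration of that aggregation behavior (this is the concept only, not the Roslyn TelemetryLogging implementation): each call adds its value to a named counter, and a single total per name is reported when the aggregate is flushed.

```csharp
using System;
using System.Collections.Concurrent;

// Conceptual stand-in for an aggregated counter: repeated Add calls accumulate,
// and one total per name is emitted at flush time.
internal sealed class AggregatedCounters
{
    private readonly ConcurrentDictionary<string, long> _totals = new();

    public void Add(string name, long value) =>
        _totals.AddOrUpdate(name, value, (_, existing) => existing + value);

    public void Flush()
    {
        foreach (var (name, total) in _totals)
            Console.WriteLine($"{name}: {total}");
        _totals.Clear();
    }
}

// Usage mirroring the snippet above: one increment per synchronization attempt, e.g.
//     counters.Add("SolutionChecksumUpdater.SucceededCount", 1L);
```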

{
m[TelemetryLogging.KeyName] = nameof(SolutionChecksumUpdater) + "." + metricName;
m[TelemetryLogging.KeyValue] = 1L;
m[TelemetryLogging.KeyMetricName] = metricName;
Member:

why do we need this here, if it is encoded in KeyName above? is one of these redundant?

{
var client = await RemoteHostClient.TryGetClientAsync(_workspace, _shutdownToken).ConfigureAwait(false);
if (client == null)
return false;
Member:

slightly odd that this would show up as 'failure'. this means a customer who has OOP off will show as massively failing this op. i think this should maybe count as 'success' or perhaps NA so we don't get the wrong idea about being in a bad state.

{
m[TelemetryLogging.KeyName] = keyName;
m[TelemetryLogging.KeyValue] = 1L;
m[TelemetryLogging.KeyMetricName] = metricName;
Member:

same comment: do we need both the keyname and metric name, esp. if the keyname contains the metric name?

@CyrusNajmabadi (Member):

with your latest changes, i feel fine with this going in (even without Jason signing off). he will eventually get around to looking at this.

Labels: Area-IDE, untriaged (Issues and PRs which have not yet been triaged by a lead), VSCode

4 participants