Cache semantic classifications and display from the cache while a solution is loading. #46955

Merged · 60 commits · Aug 28, 2020

Conversation

@CyrusNajmabadi (Member) commented Aug 19, 2020

Draft idea around making VS feel less encumbered during initial solution load. Part of the caching working group experiments.

Note: this has a substantial impact on perceived classification perf on load. With Roslyn.sln itself, I go from anywhere between 30 seconds and over a minute down to just a couple of seconds before you have semantic classifications.

Tagging @mikadumont. This also ties into our discussions with platform about having some sort of good affordance in VS to let users know that this is initial solution-load and that values may be cached/stale, but will be available soon.

@@ -51,23 +51,8 @@ private VisualStudioRemoteHostClientProvider(HostWorkspaceServices services)
_lazyClient = new AsyncLazy<RemoteHostClient>(CreateHostClientAsync, cacheResult: true);
}

private async Task<RemoteHostClient> CreateHostClientAsync(CancellationToken cancellationToken)
CyrusNajmabadi (Member, Author):

Not part of this PR. I'm branched off of Tomas's change so I can use VS without OOP crashing. Once #46929 goes in, this will be gone.

if (_uniqueItems.Add(item))
_nextBatch.Add(item);
_nextBatchMap.Remove(item);
_nextBatchMap.Add(item, _nextIndex++);
CyrusNajmabadi (Member, Author):

safe to do ++ here as we're under the lock.
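The locking discipline here can be sketched outside C#. A minimal Java illustration (WorkQueue and its members are hypothetical names, not the Roslyn types) of deduplicating into a batch and bumping an index where every mutation happens under one lock, so the unsynchronized-looking increment is in fact safe:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch of the batching pattern: all state is mutated
// only while holding `gate`, so nextIndex++ cannot race.
class WorkQueue<T> {
    private final Object gate = new Object();
    private final Set<T> uniqueItems = new HashSet<>();
    private final Queue<T> nextBatch = new ArrayDeque<>();
    private final Map<T, Integer> nextBatchMap = new HashMap<>();
    private int nextIndex;

    void add(T item) {
        synchronized (gate) {
            // Only enqueue an item the first time we see it...
            if (uniqueItems.add(item))
                nextBatch.add(item);
            // ...but always refresh its ordering index.
            nextBatchMap.remove(item);
            nextBatchMap.put(item, nextIndex++);
        }
    }

    int indexOf(T item) {
        synchronized (gate) {
            return nextBatchMap.getOrDefault(item, -1);
        }
    }
}
```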

@@ -29,6 +29,7 @@ private class Tagger : ForegroundThreadAffinitizedObject, IAccurateTagger<IClass
private readonly SemanticClassificationBufferTaggerProvider _owner;
private readonly ITextBuffer _subjectBuffer;
private readonly ITaggerEventSource _eventSource;
private readonly SemanticClassifier _classifier;
CyrusNajmabadi (Member, Author):

changed from static helpers to an instance type now that it contains more interesting logic around caching/shutdown.

@@ -20,9 +28,54 @@

namespace Microsoft.CodeAnalysis.Editor.Implementation.Classification
{
internal static class SemanticClassificationUtilities
internal partial class SemanticClassifier
CyrusNajmabadi (Member, Author):

I purposefully didn't rename the file, as that causes GitHub to lose the diff; this makes it easier to understand the change. I will rename in a follow-up PR.

Reviewer (Member):

At this point there's more code that's new than existed... 😄

{
// We're reading and interpreting arbitrary data from disk. This may be invalid for any reason.
Logger.Log(FunctionId.SemanticClassifier_ExceptionInCacheRead);
return false;
CyrusNajmabadi (Member, Author):

general pattern we use elsewhere when reading from cache.
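That defensive pattern, treating anything deserialized from disk as potentially corrupt, catching broadly, and reporting failure rather than throwing, might look like this Java sketch (CacheReader and the on-disk format are invented for illustration; the C# code logs and returns false instead of null):

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of a defensive cache read: the whole read is
// wrapped, and any failure means "no cached data", never an exception.
class CacheReader {
    static int[] tryReadSpans(InputStream stream) {
        try (DataInputStream reader = new DataInputStream(stream)) {
            int count = reader.readInt();
            if (count < 0)
                return null; // corrupt header
            int[] spans = new int[count];
            for (int i = 0; i < count; i++)
                spans[i] = reader.readInt();
            return spans;
        } catch (IOException | RuntimeException e) {
            // We're reading and interpreting arbitrary data from disk;
            // a bad cache file must degrade to recomputing, not crash.
            return null;
        }
    }
}
```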

var seenIds = new HashSet<DocumentId>();
foreach (var document in documentsToClassify)
Contract.ThrowIfFalse(seenIds.Add(document.Id));
#endif
CyrusNajmabadi (Member, Author):

debug check to ensure that we are only processing a document once per batch.
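A standalone sketch of that invariant, in Java (BatchInvariants is a hypothetical name; the real code guards this with #if DEBUG and Contract.ThrowIfFalse):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the debug-only invariant: each document id
// must appear at most once per classification batch.
class BatchInvariants {
    static void assertNoDuplicates(List<String> documentIds) {
        Set<String> seenIds = new HashSet<>();
        for (String id : documentIds) {
            // Mirrors Contract.ThrowIfFalse(seenIds.Add(document.Id)).
            if (!seenIds.add(id))
                throw new IllegalStateException("duplicate document in batch: " + id);
        }
    }
}
```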

@jasonmalinowski (Member) left a comment:

There seems to be a bug where the cache maintained in the OOP process uses only the DocumentKey as the cache key, but the contents being cached implicitly depend on the textSpan being queried. Blocking only for that bug; the rest of this was both solid and well-structured, so it was nice and easy to read.

// Then place the cached information for this doc at the end.
_cachedData.AddLast((documentKey.Id, checksum, classifiedSpans));

// And ensure we don't cache too many docs.
Reviewer (Member):

So once we do get everything loaded, is the intent that this cache then falls away, or does it continue to be used for opening new documents even once we have full semantics?

CyrusNajmabadi (Member, Author):

So once we do get everything loaded, is the intent that this cache then falls away,

The cache falls away. This happens in two ways:

  1. the calls to CacheSemanticClassifications will clear it explicitly (since we are fully loaded at that point)
  2. the classifier doesn't call into the cache once we are fully loaded.
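The bounded, recency-ordered cache with an explicit clear once the solution is loaded could be sketched in Java like this (BoundedClassificationCache is a hypothetical name; the PR's C# implementation keeps a linked list of (id, checksum, spans) tuples and moves a hit to the end, which LinkedHashMap in access order models directly):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the bounded cache discussed in this thread.
class BoundedClassificationCache<K, V> {
    private final LinkedHashMap<K, V> cachedData;

    BoundedClassificationCache(int maxSize) {
        // accessOrder=true: gets and puts move an entry to the end,
        // so the eldest entry is the least recently used.
        this.cachedData = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // And ensure we don't cache too many docs.
                return size() > maxSize;
            }
        };
    }

    synchronized void put(K key, V value) { cachedData.put(key, value); }
    synchronized V get(K key) { return cachedData.get(key); }
    synchronized int size() { return cachedData.size(); }

    // Models the explicit teardown once the solution is fully loaded;
    // after this the classifier never consults the cache again.
    synchronized void clear() { cachedData.clear(); }
}
```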

Comment on lines 280 to 281
if (textSpan.IntersectsWith(classifiedSpan))
tempResult.Add(new ClassifiedSpan(classification, classifiedSpan));
Reviewer (Member):

Were the spans originally written in order? Can we abandon the loop once we're past the end of textSpan?

CyrusNajmabadi (Member, Author):

good point!
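The suggested early exit, assuming the cached spans are serialized sorted by start position, might look like this Java sketch (Span is a hypothetical stand-in for TextSpan; end is treated as inclusive here):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of abandoning the intersection loop once we're past the
// end of the query range.
class SpanFilter {
    record Span(int start, int end) {
        boolean intersectsWith(Span other) {
            return start <= other.end && other.start <= end;
        }
    }

    static List<Span> intersecting(List<Span> sortedSpans, Span query) {
        List<Span> result = new ArrayList<>();
        for (Span s : sortedSpans) {
            // Spans are sorted by start, so once one begins past the
            // query's end, nothing later can intersect: stop early.
            if (s.start() > query.end())
                break;
            if (query.intersectsWith(s))
                result.add(s);
        }
        return result;
    }
}
```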

@jasonmalinowski (Member) left a comment:

Looks good other than the JTF.Run() that I think you need to put back.

CyrusNajmabadi and others added 2 commits August 27, 2020 15:21
…lysisService_SemanticClassificationCache.cs

Co-authored-by: Jason Malinowski <jason@jason-m.com>
@ghost commented Aug 27, 2020

Hello @CyrusNajmabadi!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost left a comment:

Auto-approval

return workspaceLoadedService.WaitUntilFullyLoadedAsync(CancellationToken.None);
});
var workspaceLoadedService = w.Services.GetRequiredService<IWorkspaceStatusService>();
await workspaceLoadedService.WaitUntilFullyLoadedAsync(CancellationToken.None).ConfigureAwait(false);
Reviewer (Member):

Did this need to be async? Task.Run() should have worked fine if the lambda is just directly returning the Task.

@ghost commented Aug 27, 2020

Apologies, while this PR appears ready to be merged, I've been configured to only merge when all checks have explicitly passed. The following integrations have not reported any progress on their checks and are blocking auto-merge:

  1. Azure Pipelines

These integrations are possibly never going to report a check, and unblocking auto-merge likely requires a human being to update my configuration to exempt these integrations from requiring a passing check.


@CyrusNajmabadi CyrusNajmabadi dismissed sharwell’s stale review August 28, 2020 07:55

addressed perf concerns

@CyrusNajmabadi CyrusNajmabadi merged commit 01474d0 into dotnet:master Aug 28, 2020
@ghost ghost added this to the Next milestone Aug 28, 2020
@CyrusNajmabadi CyrusNajmabadi deleted the merged branch August 28, 2020 07:55
@allisonchou allisonchou modified the milestones: Next, 16.8.P3 Aug 31, 2020