-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Further checksum syncing cleanup work. #70341
Conversation
@@ -136,7 +136,7 @@ public async Task TestProjectSynchronization() | |||
var assetSource = new SimpleAssetSource(workspace.Services.GetService<ISerializerService>(), map); | |||
|
|||
var service = new AssetProvider(sessionId, storage, assetSource, remoteWorkspace.Services.GetService<ISerializerService>()); | |||
await service.SynchronizeProjectAssetsAsync(new HashSet<Checksum> { await project.State.GetChecksumAsync(CancellationToken.None) }, CancellationToken.None); | |||
await service.SynchronizeProjectAssetsAsync(await project.State.GetStateChecksumsAsync(CancellationToken.None), CancellationToken.None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made this a more strongly tyepd overload. it now takes ProjectStateChecksums instead of a random checksum instance.
@@ -69,6 +69,7 @@ private async Task<ProjectStateChecksums> ComputeChecksumsAsync(CancellationToke | |||
var analyzerConfigDocumentChecksums = new ChecksumCollection(await analyzerConfigDocumentChecksumTasks.WhenAll().ConfigureAwait(false)); | |||
|
|||
return new ProjectStateChecksums( | |||
this.Id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added projectid to the projectstatechecksums. this avoids a lot of roundtrips to jsut go from the checksum object to get the info checksum to get the info to get the id.
using var _ = ArrayBuilder<ValueTuple<Checksum, T>>.GetInstance(checksums.Count, out var list); | ||
foreach (var checksum in checksums) | ||
list.Add(ValueTuple.Create(checksum, await GetAssetAsync<T>(checksum, cancellationToken).ConfigureAwait(false))); | ||
list.Add(ValueTuple.Create(checksum, GetRequiredAsset<T>(checksum))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a lot of these calls to "get the asset asycnhronously" became "get teh required asset" since they are called directly after the call to actually synchronize the asset. This allows us to make stronger assertions that the data must be present, and that it's the prior syncronization call taht is fetching the items, and not this potential GetASsetAsync call that is doing it.
@@ -100,7 +91,7 @@ public async ValueTask SynchronizeSolutionAssetsAsync(Checksum solutionChecksum, | |||
} | |||
} | |||
|
|||
public async ValueTask SynchronizeProjectAssetsAsync(HashSet<Checksum> projectChecksums, CancellationToken cancellationToken) | |||
public async ValueTask SynchronizeProjectAssetsAsync(ProjectStateChecksums projectChecksums, CancellationToken cancellationToken) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made strongly typed.
|
||
public async ValueTask SynchronizeAssetsAsync(HashSet<Checksum> checksums, CancellationToken cancellationToken) | ||
{ | ||
using (await s_gate.DisposableWaitAsync(cancellationToken).ConfigureAwait(false)) | ||
{ | ||
await SynchronizeAssets_NoLockAsync(checksums, cancellationToken).ConfigureAwait(false); | ||
// get children of solution checksum object at once | ||
await _assetProvider.SynchronizeAssetsAsync(checksums, cancellationToken).ConfigureAwait(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inlined method.
foreach (var project in solutionChecksumObject.Projects) | ||
{ | ||
var projectStateChecksums = _assetProvider.GetRequiredAsset<ProjectStateChecksums>(project); | ||
await SynchronizeProjectAssetsAsync(projectStateChecksums, cancellationToken).ConfigureAwait(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all this became much more strongly typed. the solution checksum points at the proejct checksums. from the project checksums we retrieve the project-state-checkum objects. which we can then use to synchronize that project.
i'm trying to make things not just take generic checksums, making it clearer what data needs to be passed to what.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore, instead of blocking the entire operation while doing a sync. We can now interleave syncing a project's worth of info at a time.
var oldMap = await GetProjectMapAsync(solution, oldChecksums, cancellationToken).ConfigureAwait(false); | ||
var newMap = await GetProjectMapAsync(_assetProvider, newChecksums, cancellationToken).ConfigureAwait(false); | ||
await PopulateOldProjectMapAsync().ConfigureAwait(false); | ||
await PopulateNewProjectMapAsync(this).ConfigureAwait(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inlined these helper methods to now be local functions in this main function. moved to pooled dictionaries as well to help with allocation overhead.
var projectStateChecksums = await @this._assetProvider.GetAssetsAsync<ProjectStateChecksums>(news.Object, cancellationToken).ConfigureAwait(false); | ||
|
||
foreach (var (_, projectStateChecksum) in projectStateChecksums) | ||
newMap.Add(projectStateChecksum.ProjectId, projectStateChecksum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code is much more efficient than what is is replacing. because the projectStateChecksums exposes a .ProjectId now, we avoid a pointless roundtrip between the server and the host.
} | ||
} | ||
|
||
public async ValueTask SynchronizeProjectAssetsAsync(HashSet<Checksum> projectChecksums, CancellationToken cancellationToken) | ||
public async ValueTask SynchronizeProjectAssetsAsync(ProjectStateChecksums projectChecksum, CancellationToken cancellationToken) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more strongly typed. much clearer about what to pass.
} | ||
} | ||
|
||
private async ValueTask SynchronizeProjectAssets_NoLockAsync(IReadOnlyCollection<Checksum> projectChecksums, CancellationToken cancellationToken) | ||
private async ValueTask SynchronizeProjectAssets_NoLockAsync(ProjectStateChecksums projectChecksum, CancellationToken cancellationToken) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unlike before, this syncs just one project. the caller of this is reponsible for handling multiple projects if that is relevant.
{ | ||
var projectChecksums = await projectState.GetStateChecksumsAsync(cancellationToken).ConfigureAwait(false); | ||
if (olds.Object.Contains(projectChecksums.Checksum)) | ||
oldMap.Add(projectId, projectChecksums); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same logic as before, just inlined.
@@ -252,11 +256,11 @@ private async Task<Solution> UpdateProjectAsync(Project project, ProjectStateChe | |||
project = await UpdateDocumentsAsync( | |||
project, | |||
newProjectChecksums, | |||
project.State.DocumentStates.States.Values, | |||
project.State.DocumentStates, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made this method strongly typed, which avoids unnecessary IEnumerable allocs.
(solution, documents) => solution.AddDocuments(documents), | ||
(solution, documentIds) => solution.RemoveDocuments(documentIds), | ||
static (solution, documents) => solution.AddDocuments(documents), | ||
static (solution, documentIds) => solution.RemoveDocuments(documentIds), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made explicit that these do not alloc lambdas.
var documentChecksums = await state.GetStateChecksumsAsync(cancellationToken).ConfigureAwait(false); | ||
if (olds.Object.Contains(documentChecksums.Checksum)) | ||
oldMap.Add(state.Id, documentChecksums); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the same logic as before. just inliend.
var documentStateChecksums = await @this._assetProvider.GetAssetsAsync<DocumentStateChecksums>(news.Object, cancellationToken).ConfigureAwait(false); | ||
|
||
foreach (var (_, documentStateChecksum) in documentStateChecksums) | ||
newMap.Add(documentStateChecksum.DocumentId, documentStateChecksum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is much more efficient than before. similar to projects, this avoids an unnecessary round trip to determine document ids.
foreach (var documentChecksum in documentChecksums) | ||
infoChecksums.Add(documentChecksum.Item2.Info); | ||
|
||
var infos = await assetProvider.GetAssetsAsync<DocumentInfo.DocumentAttributes>(infoChecksums, cancellationToken).ConfigureAwait(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was the round-trip just for the purpose of getting a doc id.
foreach (var projectChecksum in projectChecksums) | ||
infoChecksums.Add(projectChecksum.Item2.Info); | ||
|
||
var infos = await assetProvider.GetAssetsAsync<ProjectInfo.ProjectAttributes>(infoChecksums, cancellationToken).ConfigureAwait(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was a round trip just to get project ids.
checksums.AddIfNotNullChecksum(this.Checksum); | ||
checksums.AddIfNotNullChecksum(this.Info); | ||
checksums.AddIfNotNullChecksum(this.Text); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only used in tests. moved there. will do the same with ProjectStateChecksums in a later PR.
// third and last get direct children for all projects and documents in the solution | ||
await SynchronizeProjectAssets_NoLockAsync(solutionChecksumObject.Projects, cancellationToken).ConfigureAwait(false); | ||
// third and last get direct children for all projects and documents in the solution | ||
foreach (var project in solutionChecksumObject.Projects) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lifted the loop out of the SynchronizeProjectAssets_NoLockAsync helper. this makes it easy to sync just a project at a time. note: the next PR ensures that when we go to sync a particular project that the other side can do that work efficiently (without having to scan the entire solution for the data it needs).
|
||
async Task PopulateOldDocumentMapAsync() | ||
{ | ||
foreach (var (_, state) in existingTextDocumentStates.States) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by passing in TextDocumentStates<T>
this becomes a no-alloc enumeration.
@@ -497,84 +520,6 @@ private async Task<TextDocument> UpdateDocumentInfoAsync(TextDocument document, | |||
return document; | |||
} | |||
|
|||
private static async ValueTask<Dictionary<DocumentId, DocumentStateChecksums>> GetDocumentMapAsync(AssetProvider assetProvider, HashSet<Checksum> documents, CancellationToken cancellationToken) | |||
{ | |||
var map = new Dictionary<DocumentId, DocumentStateChecksums>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wasteful non-pooled dicitonary.
Made a first pass through (pre-coffee) and it seems to make sense. Will make another pass through post-coffee and see if it still makes sense. |
@@ -69,6 +69,7 @@ private async Task<ProjectStateChecksums> ComputeChecksumsAsync(CancellationToke | |||
var analyzerConfigDocumentChecksums = new ChecksumCollection(await analyzerConfigDocumentChecksumTasks.WhenAll().ConfigureAwait(false)); | |||
|
|||
return new ProjectStateChecksums( | |||
this.Id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question about something not changed in this PR:
I see SelectAsArray done above, which I understand gives back an ImmutableArray that allows struct based enumeration. However, that array did require a heap allocation. It seems to me that if you know the collection will only be enumerated once locally (as in this case), it's better to have a heap based enumerator allocated than an ImmutableArray of a potentially large size.
Of course, that assumption could be broken and then you end up doing multiple enumerator allocations, but that doesn't seem too likely and too costly if it does happen. Another option if repeated enumerations is a concern would be an ArrayBuilder based SelectAs method, as that could have the benefit of no-allocation and struct based enumeration.
Or maybe I'm just overthinking this.
private async ValueTask CollectProjectStateChecksumsAsync(HashSet<Checksum> set, IReadOnlyCollection<Checksum> checksums, CancellationToken cancellationToken) | ||
{ | ||
foreach (var checksum in checksums) | ||
void CollectChecksumChildren(ChecksumSynchronizer @this, ChecksumCollection collection) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ChecksumSynchronizer @this,
Curious why you passed this in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Primary changes here are about including more information in our rich state-checksum objects (whcih are not checksums themselves) to avoid unnecessary round-trips from server to host, and also to make certain syncing methods more strongly typed to prevent passing in random innapropriate checksums into them.
More changes to come. Just trying to keep things bite-sized.