Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further checksum syncing cleanup work. #70341

Merged
merged 23 commits into from
Oct 12, 2023

Conversation

CyrusNajmabadi
Copy link
Member

@CyrusNajmabadi CyrusNajmabadi commented Oct 11, 2023

Primary changes here are about including more information in our rich state-checksum objects (whcih are not checksums themselves) to avoid unnecessary round-trips from server to host, and also to make certain syncing methods more strongly typed to prevent passing in random innapropriate checksums into them.

More changes to come. Just trying to keep things bite-sized.

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead labels Oct 11, 2023
@@ -136,7 +136,7 @@ public async Task TestProjectSynchronization()
var assetSource = new SimpleAssetSource(workspace.Services.GetService<ISerializerService>(), map);

var service = new AssetProvider(sessionId, storage, assetSource, remoteWorkspace.Services.GetService<ISerializerService>());
await service.SynchronizeProjectAssetsAsync(new HashSet<Checksum> { await project.State.GetChecksumAsync(CancellationToken.None) }, CancellationToken.None);
await service.SynchronizeProjectAssetsAsync(await project.State.GetStateChecksumsAsync(CancellationToken.None), CancellationToken.None);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made this a more strongly tyepd overload. it now takes ProjectStateChecksums instead of a random checksum instance.

@@ -69,6 +69,7 @@ private async Task<ProjectStateChecksums> ComputeChecksumsAsync(CancellationToke
var analyzerConfigDocumentChecksums = new ChecksumCollection(await analyzerConfigDocumentChecksumTasks.WhenAll().ConfigureAwait(false));

return new ProjectStateChecksums(
this.Id,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added projectid to the projectstatechecksums. this avoids a lot of roundtrips to jsut go from the checksum object to get the info checksum to get the info to get the id.

using var _ = ArrayBuilder<ValueTuple<Checksum, T>>.GetInstance(checksums.Count, out var list);
foreach (var checksum in checksums)
list.Add(ValueTuple.Create(checksum, await GetAssetAsync<T>(checksum, cancellationToken).ConfigureAwait(false)));
list.Add(ValueTuple.Create(checksum, GetRequiredAsset<T>(checksum)));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a lot of these calls to "get the asset asycnhronously" became "get teh required asset" since they are called directly after the call to actually synchronize the asset. This allows us to make stronger assertions that the data must be present, and that it's the prior syncronization call taht is fetching the items, and not this potential GetASsetAsync call that is doing it.

@@ -100,7 +91,7 @@ public async ValueTask SynchronizeSolutionAssetsAsync(Checksum solutionChecksum,
}
}

public async ValueTask SynchronizeProjectAssetsAsync(HashSet<Checksum> projectChecksums, CancellationToken cancellationToken)
public async ValueTask SynchronizeProjectAssetsAsync(ProjectStateChecksums projectChecksums, CancellationToken cancellationToken)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made strongly typed.


public async ValueTask SynchronizeAssetsAsync(HashSet<Checksum> checksums, CancellationToken cancellationToken)
{
using (await s_gate.DisposableWaitAsync(cancellationToken).ConfigureAwait(false))
{
await SynchronizeAssets_NoLockAsync(checksums, cancellationToken).ConfigureAwait(false);
// get children of solution checksum object at once
await _assetProvider.SynchronizeAssetsAsync(checksums, cancellationToken).ConfigureAwait(false);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inlined method.

foreach (var project in solutionChecksumObject.Projects)
{
var projectStateChecksums = _assetProvider.GetRequiredAsset<ProjectStateChecksums>(project);
await SynchronizeProjectAssetsAsync(projectStateChecksums, cancellationToken).ConfigureAwait(false);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all this became much more strongly typed. the solution checksum points at the proejct checksums. from the project checksums we retrieve the project-state-checkum objects. which we can then use to synchronize that project.

i'm trying to make things not just take generic checksums, making it clearer what data needs to be passed to what.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, instead of blocking the entire operation while doing a sync. We can now interleave syncing a project's worth of info at a time.

var oldMap = await GetProjectMapAsync(solution, oldChecksums, cancellationToken).ConfigureAwait(false);
var newMap = await GetProjectMapAsync(_assetProvider, newChecksums, cancellationToken).ConfigureAwait(false);
await PopulateOldProjectMapAsync().ConfigureAwait(false);
await PopulateNewProjectMapAsync(this).ConfigureAwait(false);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inlined these helper methods to now be local functions in this main function. moved to pooled dictionaries as well to help with allocation overhead.

var projectStateChecksums = await @this._assetProvider.GetAssetsAsync<ProjectStateChecksums>(news.Object, cancellationToken).ConfigureAwait(false);

foreach (var (_, projectStateChecksum) in projectStateChecksums)
newMap.Add(projectStateChecksum.ProjectId, projectStateChecksum);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code is much more efficient than what is is replacing. because the projectStateChecksums exposes a .ProjectId now, we avoid a pointless roundtrip between the server and the host.

}
}

public async ValueTask SynchronizeProjectAssetsAsync(HashSet<Checksum> projectChecksums, CancellationToken cancellationToken)
public async ValueTask SynchronizeProjectAssetsAsync(ProjectStateChecksums projectChecksum, CancellationToken cancellationToken)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more strongly typed. much clearer about what to pass.

}
}

private async ValueTask SynchronizeProjectAssets_NoLockAsync(IReadOnlyCollection<Checksum> projectChecksums, CancellationToken cancellationToken)
private async ValueTask SynchronizeProjectAssets_NoLockAsync(ProjectStateChecksums projectChecksum, CancellationToken cancellationToken)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unlike before, this syncs just one project. the caller of this is reponsible for handling multiple projects if that is relevant.

{
var projectChecksums = await projectState.GetStateChecksumsAsync(cancellationToken).ConfigureAwait(false);
if (olds.Object.Contains(projectChecksums.Checksum))
oldMap.Add(projectId, projectChecksums);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same logic as before, just inlined.

@@ -252,11 +256,11 @@ private async Task<Solution> UpdateProjectAsync(Project project, ProjectStateChe
project = await UpdateDocumentsAsync(
project,
newProjectChecksums,
project.State.DocumentStates.States.Values,
project.State.DocumentStates,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made this method strongly typed, which avoids unnecessary IEnumerable allocs.

(solution, documents) => solution.AddDocuments(documents),
(solution, documentIds) => solution.RemoveDocuments(documentIds),
static (solution, documents) => solution.AddDocuments(documents),
static (solution, documentIds) => solution.RemoveDocuments(documentIds),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made explicit that these do not alloc lambdas.

var documentChecksums = await state.GetStateChecksumsAsync(cancellationToken).ConfigureAwait(false);
if (olds.Object.Contains(documentChecksums.Checksum))
oldMap.Add(state.Id, documentChecksums);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the same logic as before. just inliend.

var documentStateChecksums = await @this._assetProvider.GetAssetsAsync<DocumentStateChecksums>(news.Object, cancellationToken).ConfigureAwait(false);

foreach (var (_, documentStateChecksum) in documentStateChecksums)
newMap.Add(documentStateChecksum.DocumentId, documentStateChecksum);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is much more efficient than before. similar to projects, this avoids an unnecessary round trip to determine document ids.

foreach (var documentChecksum in documentChecksums)
infoChecksums.Add(documentChecksum.Item2.Info);

var infos = await assetProvider.GetAssetsAsync<DocumentInfo.DocumentAttributes>(infoChecksums, cancellationToken).ConfigureAwait(false);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was the round-trip just for the purpose of getting a doc id.

foreach (var projectChecksum in projectChecksums)
infoChecksums.Add(projectChecksum.Item2.Info);

var infos = await assetProvider.GetAssetsAsync<ProjectInfo.ProjectAttributes>(infoChecksums, cancellationToken).ConfigureAwait(false);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a round trip just to get project ids.

checksums.AddIfNotNullChecksum(this.Checksum);
checksums.AddIfNotNullChecksum(this.Info);
checksums.AddIfNotNullChecksum(this.Text);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only used in tests. moved there. will do the same with ProjectStateChecksums in a later PR.

@CyrusNajmabadi CyrusNajmabadi changed the title WIP: More checksum work. Further checksum syncing cleanup work. Oct 12, 2023
@CyrusNajmabadi CyrusNajmabadi marked this pull request as ready for review October 12, 2023 05:44
@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner October 12, 2023 05:44
// third and last get direct children for all projects and documents in the solution
await SynchronizeProjectAssets_NoLockAsync(solutionChecksumObject.Projects, cancellationToken).ConfigureAwait(false);
// third and last get direct children for all projects and documents in the solution
foreach (var project in solutionChecksumObject.Projects)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lifted the loop out of the SynchronizeProjectAssets_NoLockAsync helper. this makes it easy to sync just a project at a time. note: the next PR ensures that when we go to sync a particular project that the other side can do that work efficiently (without having to scan the entire solution for the data it needs).


async Task PopulateOldDocumentMapAsync()
{
foreach (var (_, state) in existingTextDocumentStates.States)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by passing in TextDocumentStates<T> this becomes a no-alloc enumeration.

@@ -497,84 +520,6 @@ private async Task<TextDocument> UpdateDocumentInfoAsync(TextDocument document,
return document;
}

private static async ValueTask<Dictionary<DocumentId, DocumentStateChecksums>> GetDocumentMapAsync(AssetProvider assetProvider, HashSet<Checksum> documents, CancellationToken cancellationToken)
{
var map = new Dictionary<DocumentId, DocumentStateChecksums>();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasteful non-pooled dicitonary.

@ToddGrun
Copy link
Contributor

Made a first pass through (pre-coffee) and it seems to make sense. Will make another pass through post-coffee and see if it still makes sense.

@@ -69,6 +69,7 @@ private async Task<ProjectStateChecksums> ComputeChecksumsAsync(CancellationToke
var analyzerConfigDocumentChecksums = new ChecksumCollection(await analyzerConfigDocumentChecksumTasks.WhenAll().ConfigureAwait(false));

return new ProjectStateChecksums(
this.Id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.Id,

General question about something not changed in this PR:

I see SelectAsArray done above, which I understand gives back an ImmutableArray that allows struct based enumeration. However, that array did require a heap allocation. It seems to me that if you know the collection will only be enumerated once locally (as in this case), it's better to have a heap based enumerator allocated than an ImmutableArray of a potentially large size.

Of course, that assumption could be broken and then you end up doing multiple enumerator allocations, but that doesn't seem too likely and too costly if it does happen. Another option if repeated enumerations is a concern would be an ArrayBuilder based SelectAs method, as that could have the benefit of no-allocation and struct based enumeration.

Or maybe I'm just overthinking this.

private async ValueTask CollectProjectStateChecksumsAsync(HashSet<Checksum> set, IReadOnlyCollection<Checksum> checksums, CancellationToken cancellationToken)
{
foreach (var checksum in checksums)
void CollectChecksumChildren(ChecksumSynchronizer @this, ChecksumCollection collection)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ChecksumSynchronizer @this,

Curious why you passed this in

Copy link
Contributor

@ToddGrun ToddGrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@CyrusNajmabadi CyrusNajmabadi merged commit 348a4bf into dotnet:main Oct 12, 2023
24 checks passed
@ghost ghost added this to the Next milestone Oct 12, 2023
@CyrusNajmabadi CyrusNajmabadi deleted the checksumCleanup5 branch October 12, 2023 17:56
@jjonescz jjonescz modified the milestones: Next, 17.9 P1 Oct 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants