Query: Fixes SplitHandling bug caused by caches not getting refreshed #2004

Merged 5 commits on Nov 17, 2020

@@ -130,9 +130,15 @@ public async ValueTask<bool> MoveNextAsync()
if (IsSplitException(exception))
{
// Handle split
+ await this.feedRangeProvider.MonadicRefreshProviderAsync(this.cancellationToken);
IEnumerable<FeedRangeInternal> childRanges = await this.feedRangeProvider.GetChildRangeAsync(
currentPaginator.Range,
cancellationToken: this.cancellationToken);
+ if (childRanges.Count() <= 1)
+ {
+ throw new InvalidOperationException("Expected more than 1 child");
+ }
+
foreach (FeedRangeInternal childRange in childRanges)
{
PartitionRangePageAsyncEnumerator<TPage, TState> childPaginator = this.createPartitionRangeEnumerator(
7 changes: 7 additions & 0 deletions Microsoft.Azure.Cosmos/src/Pagination/DocumentContainer.cs
@@ -52,6 +52,13 @@ public Task<List<FeedRangeEpk>> GetFeedRangesAsync(
cancellationToken),
cancellationToken);

+ public Task RefreshProviderAsync(CancellationToken cancellationToken) => TryCatch.UnsafeWaitAsync(
+ this.MonadicRefreshProviderAsync(cancellationToken),
+ cancellationToken);
+
+ public Task<TryCatch> MonadicRefreshProviderAsync(CancellationToken cancellationToken) => this.monadicDocumentContainer.MonadicRefreshProviderAsync(
+ cancellationToken);
+
public Task<TryCatch<Record>> MonadicCreateItemAsync(
CosmosObject payload,
CancellationToken cancellationToken) => this.monadicDocumentContainer.MonadicCreateItemAsync(
@@ -7,7 +7,6 @@ namespace Microsoft.Azure.Cosmos.Pagination
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
- using Microsoft.Azure.Documents;

internal interface IFeedRangeProvider : IMonadicFeedRangeProvider
{
@@ -17,5 +16,7 @@ Task<List<FeedRangeEpk>> GetChildRangeAsync(

Task<List<FeedRangeEpk>> GetFeedRangesAsync(
CancellationToken cancellationToken);
+
+ Task RefreshProviderAsync(CancellationToken cancellationToken);
}
}
@@ -17,5 +17,7 @@ Task<TryCatch<List<FeedRangeEpk>>> MonadicGetChildRangeAsync(

Task<TryCatch<List<FeedRangeEpk>>> MonadicGetFeedRangesAsync(
CancellationToken cancellationToken);
+
+ Task<TryCatch> MonadicRefreshProviderAsync(CancellationToken cancellationToken);
}
}
@@ -102,7 +102,8 @@ public async Task<TryCatch<List<FeedRangeEpk>>> MonadicGetChildRangeAsync(
this.container.LinkUri,
await this.container.GetRIDAsync(cancellationToken),
containerProperties.PartitionKey,
- feedRange);
+ feedRange,
+ forceRefresh: false);
return TryCatch<List<FeedRangeEpk>>.FromResult(
overlappingRanges.Select(range => new FeedRangeEpk(
new Documents.Routing.Range<string>(
@@ -117,6 +118,26 @@ await this.container.GetRIDAsync(cancellationToken),
}
}

+ public async Task<TryCatch> MonadicRefreshProviderAsync(CancellationToken cancellationToken)
+ {
+ cancellationToken.ThrowIfCancellationRequested();
+
+ try
+ {
+ // We can refresh the cache by just getting all the ranges for this container using the force refresh flag
+ _ = await this.cosmosQueryClient.TryGetOverlappingRangesAsync(
Member

Since this is in the CosmosQueryClient API, can we add a UT that tests this method is called with forceRefresh: true as a Mock.Verify?

Contributor Author

No amount of unit testing is going to solve this scenario. I did a manual integration test.

Member

I disagree. You are adding an API that has the expectation to call the cosmosQueryClient.TryGetOverlappingRangesAsync with the forceRefresh flag as true. You can cover this with a UT that asserts the behavior and makes sure that expectation doesn't regress. That is the point of a unit test. And there is nothing blocking you from adding this to secure the behavior, all involved types are mockable.

So if it can be done, and there is value in securing the behavior, why are we valuing engineering time over code quality/coverage?

Contributor Author

That's not what I said. The mindset that we can just keep adding unit tests to secure this code path is wrong. The fact that we are making this fix is evidence of that.

Member

I'd argue that if, when the code was first introduced, we had validated the assumption that cache refreshes should happen on a split, we probably would have found the bug/missing path earlier, because we would have seen that no cache refreshes were happening.
The goal of UTs is not to add them after the fact, but to set the expectations first and see if the scenario works.

This bug either means that we never set the expectations or we didn't want to cover them.

Hence my ask: can we now add a UT to validate the expectation and avoid a future regression?

rmandvikar, Nov 20, 2020

@bchong95 $2c. I see that this PR is merged in now. I definitely agree with @ealsur about a UT that asserts the behavior to prevent a regression in the future (especially since this caused a PRD issue on our side).

bchong95 (Contributor Author), Nov 21, 2020

@ealsur The problem with unit tests is that you have to know to add them during the testing phase. The problem with this code is that it has independently moving components with soft contracts. One will never know all the cases to cover. The standard approach is to first add an integration test to discover which soft contracts need to be tested, and then go back and add unit tests, since they run faster and are easier to debug.

@rmandvikar unit tests are not the way to go for this code path. Mocking out a bunch of assumptions for soft contracts is fragile and they are bound to break as the system evolves. Maintaining them is basically a lifestyle. If I had a mock / unit test I would have to update it for this PR:

#2010

Since the soft contract we are testing for is "only refresh when needed" instead of "refresh after every split". If an independent developer made the optimization and failed my random unit test, they would have to sit there and go through git history to figure out what the original soft contract was and whether it's okay for them to update the test to reflect the new soft contract.

Here is the PR for the proper way of stopping regressions:

#2026

It simulates the caching behavior that prod sees and asserts the soft contract without the need for manual unit tests that will break in the future.

Again, an integration test would catch all these situations plus other soft contract violations we have yet to come up with.

+ this.container.LinkUri,
+ FeedRangeEpk.FullRange.Range,
+ forceRefresh: true);
+
+ return TryCatch.FromResult();
+ }
+ catch (Exception ex)
+ {
+ return TryCatch.FromException(ex);
+ }
+ }
+
public async Task<TryCatch<ReadFeedPage>> MonadicReadFeedAsync(
ReadFeedState readFeedState,
FeedRangeInternal feedRange,
@@ -138,7 +159,8 @@ public async Task<TryCatch<ReadFeedPage>> MonadicReadFeedAsync(
this.container.LinkUri,
await this.container.GetRIDAsync(cancellationToken),
containerProperties.PartitionKey,
- feedRange);
+ feedRange,
+ forceRefresh: false);

if ((overlappingRanges == null) || (overlappingRanges.Count != 1))
{
@@ -249,7 +271,8 @@ public async Task<TryCatch<QueryPage>> MonadicQueryAsync(
this.container.LinkUri,
await this.container.GetRIDAsync(cancellationToken),
containerProperties.PartitionKey,
- feedRange);
+ feedRange,
+ forceRefresh: false);
}

queryRequestOptions.PartitionKey = feedRangePartitionKey.PartitionKey;
@@ -301,7 +324,8 @@ await this.container.GetRIDAsync(cancellationToken),
this.container.LinkUri,
await this.container.GetRIDAsync(cancellationToken),
containerProperties.PartitionKey,
- feedRange);
+ feedRange,
+ forceRefresh: false);

if ((overlappingRanges == null) || (overlappingRanges.Count != 1))
{
@@ -359,7 +383,8 @@ public async Task<TryCatch<ChangeFeedPage>> MonadicChangeFeedAsync(
this.container.LinkUri,
await this.container.GetRIDAsync(cancellationToken),
containerProperties.PartitionKey,
- feedRange);
+ feedRange,
+ forceRefresh: false);

if ((overlappingRanges == null) || (overlappingRanges.Count != 1))
{
@@ -182,7 +182,8 @@ private static async Task<TryCatch<IQueryPipelineStage>> TryCreateCoreContextAsy
List<Documents.PartitionKeyRange> targetRanges = await cosmosQueryContext.QueryClient.GetTargetPartitionKeyRangesByEpkStringAsync(
cosmosQueryContext.ResourceLink,
containerQueryProperties.ResourceId,
- inputParameters.PartitionKey.Value.InternalKey.GetEffectivePartitionKeyString(partitionKeyDefinition));
+ inputParameters.PartitionKey.Value.InternalKey.GetEffectivePartitionKeyString(partitionKeyDefinition),
+ forceRefresh: false);

return CosmosQueryExecutionContextFactory.TryCreatePassthroughQueryExecutionContext(
documentContainer,
@@ -427,29 +428,33 @@ private static TryCatch<IQueryPipelineStage> TryCreateSpecializedDocumentQueryEx
targetRanges = await queryClient.GetTargetPartitionKeyRangesByEpkStringAsync(
resourceLink,
containerQueryProperties.ResourceId,
- containerQueryProperties.EffectivePartitionKeyString);
+ containerQueryProperties.EffectivePartitionKeyString,
+ forceRefresh: false);
}
else if (TryGetEpkProperty(properties, out string effectivePartitionKeyString))
{
targetRanges = await queryClient.GetTargetPartitionKeyRangesByEpkStringAsync(
resourceLink,
containerQueryProperties.ResourceId,
- effectivePartitionKeyString);
+ effectivePartitionKeyString,
+ forceRefresh: false);
}
else if (feedRangeInternal != null)
{
targetRanges = await queryClient.GetTargetPartitionKeyRangeByFeedRangeAsync(
resourceLink,
containerQueryProperties.ResourceId,
containerQueryProperties.PartitionKeyDefinition,
- feedRangeInternal);
+ feedRangeInternal,
+ forceRefresh: false);
}
else
{
targetRanges = await queryClient.GetTargetPartitionKeyRangesAsync(
resourceLink,
containerQueryProperties.ResourceId,
- partitionedQueryExecutionInfo.QueryRanges);
+ partitionedQueryExecutionInfo.QueryRanges,
+ forceRefresh: false);
}

return targetRanges;
@@ -278,9 +278,15 @@ private async ValueTask<bool> MoveNextAsync_InitializeAsync_HandleSplitAsync(
{
this.cancellationToken.ThrowIfCancellationRequested();

+ await this.documentContainer.RefreshProviderAsync(this.cancellationToken);
IEnumerable<FeedRangeInternal> childRanges = await this.documentContainer.GetChildRangeAsync(
uninitializedEnumerator.Range,
cancellationToken: this.cancellationToken);
+ if (childRanges.Count() <= 1)
+ {
+ throw new InvalidOperationException("Expected more than 1 child");
+ }
+
foreach (FeedRangeInternal childRange in childRanges)
{
this.cancellationToken.ThrowIfCancellationRequested();
@@ -73,18 +73,21 @@ public abstract Task<PartitionedQueryExecutionInfo> ExecuteQueryPlanRequestAsync
public abstract Task<List<Documents.PartitionKeyRange>> GetTargetPartitionKeyRangesByEpkStringAsync(
string resourceLink,
string collectionResourceId,
- string effectivePartitionKeyString);
+ string effectivePartitionKeyString,
+ bool forceRefresh);

public abstract Task<List<Documents.PartitionKeyRange>> GetTargetPartitionKeyRangeByFeedRangeAsync(
string resourceLink,
string collectionResourceId,
Documents.PartitionKeyDefinition partitionKeyDefinition,
- FeedRangeInternal feedRangeInternal);
+ FeedRangeInternal feedRangeInternal,
+ bool forceRefresh);

public abstract Task<List<Documents.PartitionKeyRange>> GetTargetPartitionKeyRangesAsync(
string resourceLink,
string collectionResourceId,
- List<Documents.Routing.Range<string>> providedRanges);
+ List<Documents.Routing.Range<string>> providedRanges,
+ bool forceRefresh);

public abstract bool ByPassQueryParsing();

@@ -186,36 +186,41 @@ public override async Task<PartitionedQueryExecutionInfo> ExecuteQueryPlanReques
public override Task<List<PartitionKeyRange>> GetTargetPartitionKeyRangesByEpkStringAsync(
string resourceLink,
string collectionResourceId,
- string effectivePartitionKeyString)
+ string effectivePartitionKeyString,
+ bool forceRefresh)
{
return this.GetTargetPartitionKeyRangesAsync(
resourceLink,
collectionResourceId,
new List<Range<string>>
{
Range<string>.GetPointRange(effectivePartitionKeyString)
- });
+ },
+ forceRefresh);
}

public override async Task<List<PartitionKeyRange>> GetTargetPartitionKeyRangeByFeedRangeAsync(
string resourceLink,
string collectionResourceId,
PartitionKeyDefinition partitionKeyDefinition,
- FeedRangeInternal feedRangeInternal)
+ FeedRangeInternal feedRangeInternal,
+ bool forceRefresh)
{
IRoutingMapProvider routingMapProvider = await this.GetRoutingMapProviderAsync();
List<Range<string>> ranges = await feedRangeInternal.GetEffectiveRangesAsync(routingMapProvider, collectionResourceId, partitionKeyDefinition);

return await this.GetTargetPartitionKeyRangesAsync(
resourceLink,
collectionResourceId,
- ranges);
+ ranges,
+ forceRefresh);
}

public override async Task<List<PartitionKeyRange>> GetTargetPartitionKeyRangesAsync(
string resourceLink,
string collectionResourceId,
- List<Range<string>> providedRanges)
+ List<Range<string>> providedRanges,
+ bool forceRefresh)
{
if (string.IsNullOrEmpty(collectionResourceId))
{
@@ -231,7 +236,7 @@ public override async Task<List<PartitionKeyRange>> GetTargetPartitionKeyRangesA

IRoutingMapProvider routingMapProvider = await this.GetRoutingMapProviderAsync();

- List<PartitionKeyRange> ranges = await routingMapProvider.TryGetOverlappingRangesAsync(collectionResourceId, providedRanges);
+ List<PartitionKeyRange> ranges = await routingMapProvider.TryGetOverlappingRangesAsync(collectionResourceId, providedRanges, forceRefresh);
if (ranges == null && PathsHelper.IsNameBased(resourceLink))
{
// Refresh the cache and don't try to re-resolve collection as it is not clear what already
@@ -234,6 +234,9 @@ public Task<TryCatch<List<FeedRangeEpk>>> MonadicGetFeedRangesAsync(
CancellationToken cancellationToken) => this.documentContainer.MonadicGetFeedRangesAsync(
cancellationToken);

+ public Task<TryCatch> MonadicRefreshProviderAsync(
+ CancellationToken cancellationToken) => this.documentContainer.MonadicRefreshProviderAsync(cancellationToken);
+
public Task<TryCatch<string>> MonadicGetResourceIdentifierAsync(
CancellationToken cancellationToken) => this.documentContainer.MonadicGetResourceIdentifierAsync(cancellationToken);

@@ -174,6 +174,13 @@ FeedRangeEpk CreateRangeFromId(int id)
return TryCatch<List<FeedRangeEpk>>.FromResult(overlappingRanges);
}

+ public Task<TryCatch> MonadicRefreshProviderAsync(CancellationToken cancellationToken)
+ {
+ // The feedrangeprovider is always insync in memory
+ // so we can no op for this one
+ return Task.FromResult(TryCatch.FromResult());
+ }
+
public Task<TryCatch<Record>> MonadicCreateItemAsync(
CosmosObject payload,
CancellationToken cancellationToken)