Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preview - FeedToken on Queries #1291

Closed
wants to merge 133 commits into from
Closed

Conversation

ealsur
Copy link
Member

@ealsur ealsur commented Mar 20, 2020

FeedToken on Queries

Description

This PR adds support for FeedToken on Queries while at the same time doing a redesign on the FeedToken public APIs.

The intent is to separate FeedToken types for operations to avoid the possibility of mixing tokens used on a Change Feed operation and tokens coming from a Query operation.

The APIs defined at #1210 and #1230 are valid but have slight modifications:

Token types

There are two distinct types: ChangeFeedToken and QueryFeedToken that are used on their specific operations.

Obtaining FeedTokens

There are no two distinct methods to request tokens, depending on the operation:

public Task<IReadOnlyList<ChangeFeedToken>> GetChangeFeedTokensAsync(CancellationToken cancellationToken = default(CancellationToken));

public Task<IReadOnlyList<QueryFeedToken>> GetQueryFeedTokensAsync(
            QueryDefinition queryDefinition,
            QueryRequestOptions queryRequestOptions = null,
            CancellationToken cancellationToken = default(CancellationToken));

The second method, for Queries, can return 1 token or multiple, depending on the nature of the query (for example, queries with DISTINCT do not make sense to be parallelized across multiple machines).

Using FeedTokens

The types are now used as input on their respective operation APIs:

public abstract FeedIterator GetItemQueryStreamIterator(
            QueryFeedToken feedToken,
            QueryRequestOptions requestOptions = null);

public abstract FeedIterator<T> GetItemQueryIterator<T>(
            QueryFeedToken feedToken,
            QueryRequestOptions requestOptions = null);

public abstract ChangeFeedIterator GetChangeFeedStreamIterator(
            ChangeFeedToken feedToken,
            ChangeFeedRequestOptions changeFeedRequestOptions = null);

public abstract ChangeFeedIterator<T> GetChangeFeedIterator(
            ChangeFeedToken feedToken,
            ChangeFeedRequestOptions changeFeedRequestOptions = null);

Serialization

Both token types are serializable and there are checks to avoid accidental mixing of format (for example, serialize a ChangeFeedToken and try to deserialize as QueryFeedToken, and viceversa).

Mixing containers

This PR also addresses another request and that is, to validate that a FeedToken generated for a container cannot be used in another container.

File reorganization

I reorganized FeedToken related files (that's why some files look completely modified) into their own folder, and tests have their own sub-namespace.

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

Closing issues

Closes #1262
Closes #1228
Closes #1226

}

this.CompleteRange = new Documents.Routing.Range<string>(keyRanges[0].MinInclusive, keyRanges[keyRanges.Count - 1].MaxExclusive, true, false);
foreach (Documents.PartitionKeyRange keyRange in keyRanges)
this.CompleteRange = new Documents.Routing.Range<string>(ranges[0].Min, ranges[ranges.Count - 1].Max, true, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't know whether the ranges are sorted or not. When my epkrange PR goes through you can use the PartitionedSortedDistinct ranges class to uphold this invariant.

using Newtonsoft.Json;

[JsonConverter(typeof(FeedTokenInternalConverter))]
internal sealed class FeedTokenPartitionKey : FeedToken, IChangeFeedToken
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be PartitionKeyFeedToken instead? similar to MemoryStream vs Stream

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mainly using FeedToken as the prefix for all the related files/classes. Like FeedTokenEPKRange as in "a FeedToken for a EPK Range", or "a FeedToken for a PartitionKey".

/// </summary>
internal enum FeedTokenType
{
EPKRange = 0,
PartitionKeyValue = 1,
PartitionKeyValue = 1, // Used for PK-based Change Feed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this enum need to be explicit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always try to be explicit, but I don't have strong feelings

/// </summary>
[Serializable]
#if PREVIEW
public
#else
internal
#endif
abstract class FeedToken
abstract class QueryFeedToken
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to inherit from FeedToken?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. The class hierarchy is:

ChangeFeedToken and QueryFeedToken are the public abstract types
ChangeFeedTokenInternal and QueryFeedTokenInternal are the internal implementations

In order to reuse the actual FeedToken types (EPK, PKRangeId,...) I could had either duplicated the code or, as I did, use interfaces. So the EPK FeedToken can be wrapped/used through a ChangeFeedToken or through a QueryFeedToken leveraging the Internal interfaces.

[JsonConverter(typeof(QueryFeedTokenInternalConverter))]
internal sealed class QueryFeedTokenInternal : QueryFeedToken
{
public readonly IQueryFeedToken QueryFeedToken;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the difference between IQueryFeedToken and QueryFeedToken?

Copy link
Member Author

@ealsur ealsur Mar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QueryFeedToken is the public abstract contract.

IQueryFeedToken is the interface that lets me reuse a EPK FeedToken implementation in a QueryFeedToken.

EPK FeedToken implements 2 interfaces IChangeFeedToken and IQueryFeedToken.

QueryFeedTokenInternal accesses IQueryFeedToken, and ChangeFeedTokenInternal accesses IChangeFeedToken. That way, the same EPK FeedToken implementation can be reused without any code duplication.

@@ -70,5 +70,7 @@ public void Stop()
}

public abstract CosmosElement GetCosmosElementContinuationToken();

public abstract bool TryGetFeedToken(string containerResourceId, SqlQuerySpec sqlQuerySpec, out QueryFeedToken feedToken);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird that you are taking rid and queryspec as input for this method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for that is that I need to create a QueryFeedToken on the ExecutionContext after the results.

To create a QueryFeedToken I need the ContainerRid, and which is the query that executed. I couldn't find those pieces of information in the Execution Context or the Components.

@@ -66,6 +66,33 @@ protected override string ContinuationToken
}
}

/// <summary>
/// Gets the FeedToken for an order by query
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to return a FeedToken for ORDER BY queries? What if the user want's to scale the token? The results wouldn't be mergeable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is why we always return 1. It cannot scale.

@@ -28,7 +28,7 @@ internal static class QueryPlanRetriever
| QueryFeatures.Top
| QueryFeatures.NonValueAggregate;

private static readonly string SupportedQueryFeaturesString = SupportedQueryFeatures.ToString();
internal static readonly string SupportedQueryFeaturesString = SupportedQueryFeatures.ToString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this changing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I am leveraging it on GetQueryFeedTokens to understand what type of QueryDefinition the user wants. I use the QueryExecutionPlan to see if it has any Aggregate/Distinct/etc and instead of returning multiple QueryFeedTokens for each PKRangeId, I return a single one signaling that that type of query cannot be scaled.

Copy link
Contributor

@bchong95 bchong95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this change. Excited to see customers start to use this API.

@ealsur
Copy link
Member Author

ealsur commented Apr 3, 2020

Closing due to different approach being developed (feature still be delivered).

@ealsur ealsur closed this Apr 3, 2020
@ealsur ealsur deleted the users/ealsur/feedtokenquery branch February 26, 2021 02:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FeedToken - Interoperability FeedToken - validate Container FeedToken - GetPartitionKeyRanges
3 participants