Extend LeaseCollection allowed partition key paths to "\partitionKey" to support gremlin accounts #158

Merged
merged 18 commits into from
May 3, 2021

CPrashanth
Contributor

@CPrashanth CPrashanth commented Mar 18, 2021

Scenario:

  1. Create a Gremlin Cosmos DB account.
  2. Cosmos DB exposes both the Gremlin and SQL APIs for the account.
  3. Create a collection and read/write data using the SQL API (note: not the graph API).
  4. Set up the changefeedprocessor library for the SQL collection created in step 3.

Expected: The following already work for a SQL API collection on a Gremlin account, and none of them special-case Gremlin accounts. The expectation is that changefeedprocessor works the same way.

  1. Gets/Puts using the SQL API
  2. ChangeFeed Pull Api
  3. ChangeFeed Azure Functions

Actual: ChangeFeedProcessor requires the lease collection to be created with "id" as the partition key, which is not supported on Gremlin accounts. Creating the lease collection fails with the following error: "Microsoft.Azure.Documents.DocumentClientException: Partition key path /id is invalid for Gremlin API. The path cannot be '/id', '/label' or a nested path such as '/key/path'."

Fix:

  • Allow a lease collection whose partition key path is either partitionKey or id.
  • Add another property, partitionKey, to the DocumentServiceLease class.
  • By default, do not serialize or fill in the property value.
  • Fill in the property value only if the lease collection's partition key path is "partitionKey".
  • This is achieved by adding another RequestOptionsFactory class for partitionKey, which stamps the lease object with the property value.
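
A minimal sketch of the opt-in stamping described in the bullets above, not the code from this PR. `LeaseSketch` and `LeaseStamper` are hypothetical names, and `System.Text.Json` is used here as a stand-in for the Json.NET attributes the library itself uses, so the attribute names differ from the real `DocumentServiceLease`. The sketch assumes the partitionKey value is a clone of the lease id.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical stand-in for DocumentServiceLease; only the two relevant properties are shown.
public class LeaseSketch
{
    [JsonPropertyName("id")]
    public string Id { get; set; }

    // Left null by default, so it is omitted from the serialized lease document.
    [JsonPropertyName("partitionKey")]
    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
    public string PartitionKey { get; set; }
}

public static class LeaseStamper
{
    // Hypothetical helper: copies the lease id into partitionKey
    // only when the lease collection is partitioned by "/partitionKey".
    public static string Serialize(LeaseSketch lease, string partitionKeyPath)
    {
        if (partitionKeyPath == "/partitionKey")
        {
            lease.PartitionKey = lease.Id;
        }

        return JsonSerializer.Serialize(lease);
    }
}
```

With this shape, leases stored in a collection partitioned by /id keep their existing document layout, and only collections partitioned by /partitionKey carry the extra property.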

Tested:
Ran UT and IT locally against the Cosmos DB emulator in Gremlin mode.
Ran UT and IT against an Azure SQL account and an Azure graph account and verified that the cases passed.

Closes #156

@ghost

ghost commented Mar 18, 2021

CLA assistant check
All CLA requirements met.

@CPrashanth
Contributor Author

CPrashanth commented Mar 18, 2021

Fixes the issue below:
#156

/// Gets or sets property to be used as partition key path for lease collections created in GremlinAccounts
/// </summary>
[JsonProperty("gremlincompatid")]
public string GremlinCompatId
Member

What if the user has a property with this name? Wouldn't it be better for the user to specify the PK they want to use?

Contributor Author

Note that the problem you describe exists even today, before this PR. The default implementation assumes the partition key can only be "id".

This addresses the default scenario, where the end user relies on the default lease management from this library.
In this case, property names are controlled by this class, so there would be no conflict.
A custom lease management solution does not hit this path, as I understand it.

Member

I understand. What I'm saying is that this makes the same assumption that is currently made with /id. Giving the user the choice would be better.

Contributor Author

@CPrashanth CPrashanth Mar 19, 2021

Let me give an example, and you can tell me how it could work so that I can understand the proposal better.
Let's say I choose FooId as the partition key for the lease collection and create the collection. When I pass it down to this library, what should the library do to the DocumentServiceLease class so that it can stamp the objects with a FooId property?

I will look at your reply to the example above, but I feel that what you are proposing is not possible as long as the lease implementation and its class definition are controlled by this library,
unless you are talking about reflection and dynamically adding properties. I am not sure we want that kind of complexity; the only reason customers complain about id as the current partition key property is that it doesn't work on Gremlin.

Ideally, this library should create the lease collection, since it controls the definition of DocumentServiceLease and stores it in the lease collection. That way the customer doesn't have to identify the partition key path by looking at the internal implementation of this class.

@CPrashanth CPrashanth marked this pull request as ready for review March 18, 2021 21:08
Contributor

@mkolt mkolt left a comment

Can you please provide a better description: explain the problem and outline the design (and why it is going to work)?

@CPrashanth CPrashanth requested a review from mkolt March 18, 2021 22:20
@CPrashanth CPrashanth marked this pull request as draft March 19, 2021 00:44
@CPrashanth CPrashanth marked this pull request as ready for review March 19, 2021 05:14
@mkolt
Contributor

mkolt commented Mar 19, 2021

The description is still not clear. Can we have a description that an OSS user with minimal knowledge of CFP can reason about? Besides, the scenario is not clear either.

  • Why does the scenario have "create a sql collection and write data into it (Cosmosdb allows it)"? How is this related to the scenario?
  • There is no step 35.
  • "since changefeedprocessor library uses "id" as partition key" -- the CFP library doesn't use partition keys. PKs are provided by either the monitored collection or the lease collection.
  • CFP requires the lease collection, if partitioned, to have PK as /id. Is this the issue you are trying to address? Why is this a problem for reading the change feed when the monitored collection is Gremlin?
  • Create alias property GremlinCompatId for Id field. Allow gremlincompatid in addition to id field as partition key path. Where is that getting created and how does it help? What is the workflow when this is used?

Also, for CFP we should not special-case Gremlin. If we do something, it should be generic and extensible.

/// Gets or sets property to be used as partition key path for lease collections created in GremlinAccounts
/// </summary>
[JsonProperty(GremlinCompatIdPropertyName)]
public string GremlinCompatId
Contributor

We should not do that by default. The majority of lease collections do not need this overhead.

Contributor Author

Can you suggest what has to be done? I tried to interpret the comment below, but I am not sure whether I got it right.

I can rename gremlincompatId to a LeaseId property so that it's not specific to Gremlin, and hopefully LeaseId doesn't impact any other type of Cosmos DB account. If you have a way to change Id into LeaseId, that's ideal, considering that collections are already provisioned with "Id" as the partition key.

When you say it's overhead, what's the concern? Is it processing cost or storage cost? A clone of the Id field is not costly, right?
Are you saying we should create a derived class DocumentServiceLeaseV2 and add the LeaseId property there? That would be a major refactoring: either parallel classes or if/else branches within existing classes, e.g. DocumentServiceLeaseStoreManager.cs, to conditionally create DocumentServiceLease vs. DocumentServiceLeaseV2 based on user input. The overhead of an additional string is cheaper than that entire effort.

Contributor

@mkolt mkolt Mar 25, 2021

One way I think about it is using something like the (existing) IRequestOptionsFactory, which would use something like Lease.PartitionKey or Lease.AlternatePartitionKey; all operations would append the PK to the request options and update the lease doc for lease operations. That would only be used when the CFP Builder explicitly opts in to using an alternate PK for the lease collection. In fact, if we go this way, it should not be much more work than just supporting any property used as the PK when the lease collection was created.

Contributor Author

Is it something along these lines? I will fix the names later.
internal class PartitionedByAlternatePKCollectionRequestOptionsFactory : IRequestOptionsFactory
{
    public RequestOptions CreateRequestOptions(ILease lease) =>
        new RequestOptions { PartitionKey = new PartitionKey(lease.AlternatePK) };

    public FeedOptions CreateFeedOptions() => new FeedOptions { EnableCrossPartitionQuery = true };
}

Then instantiate that class in CFBuilder based on some flag.

Won't this mean the following:

  1. The ILease interface has to be updated with a property named "AlternatePK".
  2. If somebody implemented their own custom lease management, they need to add this property unnecessarily. Is that OK?
  3. The moment I touch ILease, I need to update DocumentServiceLease to have that property. So regardless of whether I use the new RequestOptionsFactory, the lease document will still contain "AlternatePK". So what does it help with?
  4. If somebody created a lease collection with "FooId" as the partition key, it is still not clear how I can make it work without touching the ILease interface to add "FooId" as a property name. Or is this not an ask at all from you and Matias?

Contributor

@mkolt mkolt Mar 26, 2021

ILease doesn't need to be touched. DocumentServiceLease is specific to the Cosmos DB service only; e.g. a Redis or in-memory lease implementation wouldn't have a partition key concept. I think with these changes we can add properties to the document service lease either by adding them to the class itself or by calling SetProperty via NewtonsoftJson as you did in your earlier changes. I would prefer the latter approach, with the user controlling what PK they want to use (in the upper stack we would call the API to get the PK definition from the lease collection at startup).

Contributor Author

Can you check this PR - https://github.com/Azure/azure-documentdb-changefeedprocessor-dotnet/pull/159/files
and see if it aligns with what you are proposing?
This is a rough change; ignore the test cases, as I haven't fixed them or authored new ones yet.

@CPrashanth CPrashanth requested review from ealsur and mkolt March 23, 2021 20:16
/// This property name is compatible to both GremlinAccounts and SqlAccounts
/// </summary>
[JsonProperty(LeasePartitionKeyPropertyName)]
public string LeasePartitionKey { get; set; }
Contributor

Can we name it just PartitionKey? This class is already a lease.

Contributor Author

sure

@@ -14,6 +14,9 @@ namespace Microsoft.Azure.Documents.ChangeFeedProcessor.LeaseManagement
[Serializable]
internal class DocumentServiceLease : ILease, ILeaseAcquireReasonProvider
{
internal const string IdPropertyName = "id";
internal const string LeasePartitionKeyPropertyName = "leasepk";
Contributor

It's already a lease. Can we have this as "pk" or "partitionKey"? I would prefer the latter, as it is clearer.

Contributor Author

sure

(collection.PartitionKey.Paths.Count != 1 || collection.PartitionKey.Paths[0] != "/id"))

if (isPartitioned &&
(collection.PartitionKey.Paths.Count != 1 || !(collection.PartitionKey.Paths[0].Equals($"/{DocumentServiceLease.IdPropertyName}", StringComparison.OrdinalIgnoreCase) ||
Contributor

Please break overly long lines. The limit is normally 120 chars.

Contributor Author

done

{
throw new ArgumentException("The lease collection, if partitioned, must have partition key equal to id.");
throw new ArgumentException($"The lease collection, if partitioned, must have partition key equal to {DocumentServiceLease.IdPropertyName} or {DocumentServiceLease.LeasePartitionKeyPropertyName}.");
Contributor

It is more correct to say /id or /partitionKey.

Contributor Author

done


if (isPartitioned &&
(collection.PartitionKey.Paths.Count != 1 || !(collection.PartitionKey.Paths[0].Equals($"/{DocumentServiceLease.IdPropertyName}", StringComparison.OrdinalIgnoreCase) ||
collection.PartitionKey.Paths[0].Equals($"/{DocumentServiceLease.LeasePartitionKeyPropertyName}", StringComparison.OrdinalIgnoreCase))))
Contributor

Can't use OrdinalIgnoreCase. JSON is case-sensitive. Just compare using "!=". If using .Equals, also check that Paths[0] is not null.
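
A minimal sketch of the case-sensitive check this comment asks for, written against a plain list of paths rather than the SDK's collection type. `LeasePartitionKeyCheck` is a hypothetical name, and the two path constants are assumptions based on the names settled on in this review.

```csharp
using System;
using System.Collections.Generic;

public static class LeasePartitionKeyCheck
{
    // Assumed path names, taken from the review discussion.
    private const string IdPath = "/id";
    private const string PartitionKeyPath = "/partitionKey";

    // Throws when a partitioned lease collection uses an unsupported partition key path.
    public static void Validate(bool isPartitioned, IReadOnlyList<string> paths)
    {
        if (!isPartitioned)
        {
            return;
        }

        // The != operator on strings is ordinal, case-sensitive, and safe when an operand is null.
        if (paths == null || paths.Count != 1 ||
            (paths[0] != IdPath && paths[0] != PartitionKeyPath))
        {
            throw new ArgumentException(
                "The lease collection, if partitioned, must have partition key equal to /id or /partitionKey.");
        }
    }
}
```

Because the comparison is ordinal, a path such as "/ID" or "/PartitionKey" is rejected rather than silently accepted.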

Contributor Author

done

// we allow only id or leasePk partitioning so check only flag
requestOptionsFactory = collection.PartitionKey.Paths[0].Equals($"/{DocumentServiceLease.IdPropertyName}", StringComparison.OrdinalIgnoreCase) ?
(IRequestOptionsFactory)new PartitionedByIdCollectionRequestOptionsFactory() :
(IRequestOptionsFactory)new PartitionedByLeasePkCollectionRequestOptionsFactory();
Contributor

I don't think cast to interface is needed. Is it?

Contributor Author

Removing the cast gives error CS8370. Target-typed conditional expressions are only available in C# 9.0.
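
A small sketch of why the cast matters, using hypothetical stand-in types (`IFactorySketch`, `ByIdFactory`, `ByPartitionKeyFactory`) in place of IRequestOptionsFactory and its implementations. Casting one branch to the interface gives the conditional expression a common type, so this form compiles both before and after C# 9.0.

```csharp
using System;

// Hypothetical stand-ins for IRequestOptionsFactory and its two implementations.
public interface IFactorySketch
{
    string Name { get; }
}

public class ByIdFactory : IFactorySketch
{
    public string Name => "id";
}

public class ByPartitionKeyFactory : IFactorySketch
{
    public string Name => "partitionKey";
}

public static class FactorySelector
{
    public static IFactorySketch Select(string partitionKeyPath)
    {
        // Before C# 9.0 the two branches need a common type, so one branch is cast to the interface.
        // With C# 9.0 or later the cast can be dropped because the expression is target-typed.
        return partitionKeyPath == "/partitionKey"
            ? (IFactorySketch)new ByPartitionKeyFactory()
            : new ByIdFactory();
    }
}
```

Only one of the two casts is strictly required, since the other branch converts implicitly to the interface type.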

var documentServiceLease = lease as DocumentServiceLease;
if (documentServiceLease == null)
throw new ArgumentException("lease is not of type DocumentServiceLease");
if (string.IsNullOrWhiteSpace(documentServiceLease.LeasePartitionKey))
Contributor

This check is technically OK to have, as we control setting the PK value in the lease document. But in general, a null/empty PK value is valid.

Contributor Author

Gets and deletes of leases were failing if the PK value was null. Searches were working fine.

using Microsoft.Azure.Documents.Client;

/// <summary>
/// Used to create request options for partitioned lease collections, when partition key is defined as /leasepk.
Contributor

/leasepk: please don't forget to adjust this to the latest version of what we decided.

Contributor Author

done

Prashanth Chandrasekaran added 2 commits April 26, 2021 22:24
if (isPartitioned)
{
// we allow only id or leasePk partitioning so check only flag
requestOptionsFactory = collection.PartitionKey.Paths[0] != PartitionKeyPkPathName ?
Contributor

nit: better to check for == rather than !=. This will avoid extra churn in future.

Contributor Author

done

/// This is clone of existing Id property to maintain backward compat.
/// This property name is compatible to both GremlinAccounts and SqlAccounts
/// </summary>
[JsonProperty(LeasePartitionKeyPropertyName, NullValueHandling= NullValueHandling.Ignore)]
Contributor

nit: spaces around '='

Contributor Author

done

public class LeasePkLeaseCollectionTests : StaticCollectionTests
{
public LeasePkLeaseCollectionTests() :
base(true,true)
Contributor

nit: space after comma

Contributor Author

done

@mkolt
Contributor

mkolt commented Apr 27, 2021

@ealsur can you please review the last iteration?

@ealsur
Member

ealsur commented Apr 28, 2021

/azp run

@azure-pipelines

No pipelines are associated with this pull request.

@ealsur
Member

ealsur commented Apr 28, 2021

/azp run

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@ealsur
Member

ealsur commented Apr 28, 2021

/azp run

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@CPrashanth CPrashanth closed this Apr 30, 2021
@CPrashanth CPrashanth reopened this Apr 30, 2021
@ealsur
Member

ealsur commented May 1, 2021

/azp run

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@CPrashanth CPrashanth changed the title support gremlin accounts Extend LeaseCollection allowed partition key paths to "\partitionKey" to support gremlin accounts May 3, 2021
@mkolt mkolt merged commit f43e6b3 into Azure:master May 3, 2021
Development

Successfully merging this pull request may close these issues.

Cannot create a lease container when the database API is Gremlin
4 participants