
DurableTask.AzureStorage API to enumerate instances #187

Merged · 11 commits · Jun 5, 2018

Conversation

TsuyoshiUshio (Collaborator)

Hi @cgillum and @gled4er,

I'm opening this pull request to start discussing the details.

I have two topics to discuss.

  1. How many attributes do we need?

OrchestrationState has a lot of members. Which attributes do we need for this purpose?

  2. Style of query

We'll do a full scan of the instance table. Is that OK, or is there a cleverer way to fetch the in-flight instances? Currently, I just return every record in the instance table.

This pull request corresponds to this issue:
Azure/azure-functions-durable-extension#316

cgillum (Member) left a comment:

Thank you for this PR! I have some initial feedback.

To answer your second question, I think a full scan is okay. Even if we try to filter to just active instances, the table storage performance will be the same since Azure Tables do not support secondary indexes. If you want to do filtering to save CPU on the customer's worker VM, I suggest adding another parameter which allows for flexible filtering in the table storage query.
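As a rough, hypothetical sketch of that suggestion (not code from this PR; the "RuntimeStatus" column name, the helper name, and the method shape are assumptions), the filter could be pushed into the table query itself so only matching rows are returned to the worker:

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

static async Task<IList<OrchestrationInstanceStatus>> QueryInstancesAsync(
    CloudTable instancesTable, string runtimeStatus = null)
{
    var query = new TableQuery<OrchestrationInstanceStatus>();
    if (!string.IsNullOrEmpty(runtimeStatus))
    {
        // The service still scans the table, but only matching rows come back,
        // so the worker VM does not spend CPU deserializing rows it will discard.
        query = query.Where(TableQuery.GenerateFilterCondition(
            "RuntimeStatus", QueryComparisons.Equal, runtimeStatus));
    }

    var rows = new List<OrchestrationInstanceStatus>();
    TableContinuationToken token = null;
    do
    {
        // Azure Tables return results in segments; follow the continuation token.
        TableQuerySegment<OrchestrationInstanceStatus> segment =
            await instancesTable.ExecuteQuerySegmentedAsync(query, token);
        token = segment.ContinuationToken;
        rows.AddRange(segment.Results);
    }
    while (token != null);

    return rows;
}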

foreach (DynamicTableEntity entity in instanceTableEntities)
{
    var instance = new OrchestrationInstance();
    instance.InstanceId = entity.Properties["PartitionKey"].ToString();
cgillum (Member):

Take a look at the previous GetStateAsync method above. Since it already has logic to convert table storage rows into OrchestrationState objects, I think it would be best to reuse that code as much as possible. That way if we need to change the logic for reading from table storage, we can do it in one place instead of two places.

TsuyoshiUshio (Collaborator, author):

Thanks! I’ll fix that!

gled4er (Contributor):

Hello @cgillum,

Would it make sense to add an additional partition to the Instances table with partition key "OnTheFly" (and the instance ID as the row key) that contains all the instances currently in flight? Each instance would also keep its own dedicated row with its instance ID as the partition key. We could have a setting that enables the duplicate update of the instance in both the "OnTheFly" partition and its own partition. When the instance completes, we would delete the row from the "OnTheFly" partition. That is one more call, but we would know both the partition key and the row key.

The query for retrieving the in-flight instances would then be straightforward. I am worried that the Instances table will grow over time and a query that does a full table scan each time will have a noticeable performance impact.

Thank you!
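A rough sketch of this idea (hypothetical - the "OnTheFly" partition name is illustrative, instancesTable and instanceId are assumed to be in scope, and the replies below decide against this design):

// Retrieving the in-flight instances becomes a single-partition query:
var onTheFlyQuery = new TableQuery<OrchestrationInstanceStatus>().Where(
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "OnTheFly"));

// On completion, the duplicate row is removed with a point delete, since both
// the partition key and the row key (the instance ID) are known:
var duplicateRow = new DynamicTableEntity("OnTheFly", instanceId) { ETag = "*" };
await instancesTable.ExecuteAsync(TableOperation.Delete(duplicateRow));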

TsuyoshiUshio (Collaborator, author):

Good point! Let's discuss that. I'm also worried about performance, since Table storage is only indexed by the partition and row keys and the instance table will keep growing. :)

TsuyoshiUshio (Collaborator, author):

Hi @gled4er, one question. OrchestrationInstanceStatus is the domain class for the instance table, right? Then which attribute is the instance ID? I guess PartitionKey. Is that right?

gled4er (Contributor) commented on May 23, 2018:

Hi, let's discuss today in person.

cgillum (Member) commented on May 23, 2018:

I'm a little concerned about creating a new partition for "on the fly" for several reasons. I'm not sure if I know how to communicate it clearly. The high-level issue is that it pollutes the data in the instances table. It also creates complexity in the data model (and I prefer to avoid complexity whenever possible). In general, data in the partition column should be as balanced as possible, and "on-the-fly" partitions will create a lot of imbalance and will eventually result in different kinds of performance issues. The design is also specific to a particular use case and may not add value to other use cases.

My thought is that we should keep this feature simple and not change the structure of the data. Yes, there will be performance issues with large numbers of rows; however, I think this is unavoidable when using Azure Table storage, so it's better if we just document it. For many users, the simple implementation will still be very valuable. If there is a need for huge numbers of instances, then we should recommend the Event Grid approach to put the data into a different, optimized data store based on the customer's specific needs (maybe they will use Cosmos DB, or something else).

gled4er (Contributor) commented on May 23, 2018:

Hello @cgillum,

Thank you very much for the explanation.

I agree.

Thank you!

TsuyoshiUshio (Collaborator, author):

@cgillum, @gled4er, I have one suggestion. How about adding some configuration to remove old instances, e.g. 1 day or 7 days or similar, in host.json? Then we could remove old records that are already done, and the number of records would stay bounded. You also wouldn't need to add a new column; we would just asynchronously remove the records that are no longer used.
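A rough sketch of what such a purge could look like (hypothetical - the "RuntimeStatus" and "CreatedTime" column names, the 7-day threshold, and instancesTable being in scope are all assumptions; the idea ends up tracked as a separate feature per the reply below):

// Find completed instances older than the configured threshold...
DateTimeOffset cutoff = DateTimeOffset.UtcNow.AddDays(-7);
string purgeFilter = TableQuery.CombineFilters(
    TableQuery.GenerateFilterCondition("RuntimeStatus", QueryComparisons.Equal, "Completed"),
    TableOperators.And,
    TableQuery.GenerateFilterConditionForDate("CreatedTime", QueryComparisons.LessThan, cutoff));
var purgeQuery = new TableQuery<OrchestrationInstanceStatus>().Where(purgeFilter);

// ...and delete them asynchronously, one segment at a time.
TableContinuationToken token = null;
do
{
    TableQuerySegment<OrchestrationInstanceStatus> segment =
        await instancesTable.ExecuteQuerySegmentedAsync(purgeQuery, token);
    token = segment.ContinuationToken;
    foreach (OrchestrationInstanceStatus row in segment)
    {
        row.ETag = "*"; // unconditional delete
        await instancesTable.ExecuteAsync(TableOperation.Delete(row));
    }
}
while (token != null);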

TsuyoshiUshio (Collaborator, author):

Hi @cgillum, one more thing. Could you enable CI for this pull request? I refactored the code to keep it DRY (I just did an extract-method refactoring); however, it fails a test. Looking at the test code, the failure seems to be a timeout, which looks like flakiness. I'd like to see what happens.

cgillum (Member) commented on May 24, 2018:

This repo unfortunately does not have CI enabled, so we have to rely on local testing. :( Hopefully this can be fixed in the near future - but I'll need to talk to the repo owners about it.

Regarding the suggestion for removing old instances, let's consider it as a separate feature. We have a GitHub issue tracking something like this: Azure/azure-functions-durable-extension#17

TsuyoshiUshio (Collaborator, author):

@cgillum Makes sense! Sure.

TsuyoshiUshio (Collaborator, author):

Hi @cgillum and @gled4er,

I have a question about testing. I added the query method to AzureTableTrackingStore.cs, AzureStorageOrchestrationService, TaskHubClient, and IOrchestrationServiceClient. Except for the first one, all of the others are just delegation.

I probably need to write a test in AzureStorageScenarioTests.cs using TestOrchestrationClient.
Since this method fetches all instances from the instance table, the test would look like this:

  1. Clear the instance table
  2. Insert data (or start an orchestrator)
  3. Fetch all the data and compare the expected count and contents

However, if the tests are executed in parallel, clearing the instance table might cause a problem. In that case, how do you test the method? Do you have a similar example in your tests?

cgillum (Member) commented on May 27, 2018:

I don't think the tests are executed in parallel because all of them clear the task hub in the very beginning. Is that not what you're observing?

cgillum (Member) left a comment:

This is looking good. I like how it is simple and easy to understand. One concern I have is about changing DurableTask.Core. I recommend we try not to change it yet and try to test the client-side by calling AzureStorageOrchestrationService.GetStateAsync directly. That way we don't need to convince the core DTFx team about adding new public APIs that need to be implemented by all providers.


// TODO do we need these?
this.stats.StorageRequests.Increment();
this.stats.TableEntitiesRead.Increment(instancestatuses.Count - previousCount);
cgillum (Member):

Regarding the TODO: Yes, we need these. We track these statistics to know how much load we're putting on the Azure Storage account.

TsuyoshiUshio (Collaborator, author):

Thanks!

var query = new TableQuery<OrchestrationInstanceStatus>();
TableContinuationToken token = null;

var instancestatuses = new List<OrchestrationInstanceStatus>(100);
cgillum (Member):

nit: capitalize the "S" - instanceStatuses.

TsuyoshiUshio (Collaborator, author):

Done. However, it was eventually removed. lol

/// Get states of the all orchestration instances
/// </summary>
/// <returns>List of <see cref="OrchestrationState"/></returns>
Task<IList<OrchestrationState>> GetStateAsync();
cgillum (Member):

We have to be very careful about making changes to DurableTask.Core. If we need any changes in this area, then we need the core DTFx team to approve them - and that may not be easy since this design might not work well for other implementations, like DurableTask.ServiceBus. Should we try to implement this first only in DurableTask.AzureStorage to avoid any potential conflict?

TsuyoshiUshio (Collaborator, author):

Thanks, I understand. I removed this commit from my branch. I added it because OrchestratorClient uses TaskHub. I'll start by creating a test. :)

var instancestatuses = new List<OrchestrationInstanceStatus>(100);
var stopwatch = new Stopwatch();
var requestCount = 0;
bool finishedEarly = false;
cgillum (Member):

It looks like stopwatch, requestCount, and finishedEarly aren't really used for anything. Can they be removed?

TsuyoshiUshio (Collaborator, author):

Done!

/// Get states of the all orchestration instances
/// </summary>
/// <returns>List of <see cref="OrchestrationState"/></returns>
public async Task<IList<OrchestrationState>> GetStateAsync()
cgillum (Member):

The other methods similar to this one are named GetOrchestrationStateAsync. I think we should name this one the same.

TsuyoshiUshio (Collaborator, author):

Done

{
    foreach (OrchestrationInstanceStatus entity in instancestatuses)
    {
        var state = await ConvertFromAsync(entity, entity.PartitionKey);
cgillum (Member):

Instead of calling ConvertFromAsync inside a second foreach loop, would it be more efficient to do it inside the previous foreach loop (so that we only loop through the list of instances once instead of twice)?

TsuyoshiUshio (Collaborator, author) commented on May 28, 2018:

Done!

One concern: I used LINQ for the refactoring, and I'm not sure whether you like LINQ here. I did it like this to make it a single loop:

var tasks = segment.AsEnumerable<OrchestrationInstanceStatus>().Select(async x => await ConvertFromAsync(x, x.PartitionKey));
OrchestrationState[] result = await Task.WhenAll(tasks);
orchestrationStates.AddRange(result);

TsuyoshiUshio (Collaborator, author):

Hi @cgillum

To enable testing, I added one method to TestOrchestrationClient.cs. You said we should be careful about adding things to Core; however, TestOrchestrationClient depends on TaskHub, so I needed to add this method. It's not an elegant approach, but I can't find any better solution.

internal AzureStorageOrchestrationService GetServiceClient()
{
    return (AzureStorageOrchestrationService)this.client.serviceClient;
}

ba1afae

Any feedback is welcome.
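For context, a minimal sketch of how a scenario test could use this helper (hypothetical - the orchestration name, timeout, and assertion are placeholders that assume the existing TestOrchestrationHost infrastructure in AzureStorageScenarioTests):

// Start a single orchestration and wait for it to finish.
TestOrchestrationClient client =
    await host.StartOrchestrationAsync(typeof(Orchestrations.SayHelloInline), "World");
await client.WaitForCompletionAsync(TimeSpan.FromSeconds(30));

// Reach into the underlying service client to enumerate all instance states.
AzureStorageOrchestrationService service = client.GetServiceClient();
IList<OrchestrationState> states = await service.GetOrchestrationStateAsync();

// The task hub is cleared at the start of the test, so only this test's instance exists.
Assert.AreEqual(1, states.Count);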

TsuyoshiUshio (Collaborator, author):

@cgillum What is the next step? I have two options.

  1. Implement OrchestratorClient on the DurableTaskFramework extension repo
  2. Implement TaskHub and other Core methods.

Option 1 should probably be fine. Then how can I test it? Create a NuGet package from this branch and put it in a local feed, then implement OrchestratorClient and test it using AzureStorageOrchestrationService directly, and then move on to option 2?

TsuyoshiUshio (Collaborator, author):

One concern for the next step is how we handle the branching. This branch is based on the azure-storage branch, and you might want to release this change with the azure-storage branch. After the branch is merged, we can reference it from the Durable Framework extension repo. Do you have a strategy for managing the two repos?

cgillum (Member) commented on May 28, 2018:

I still need to figure out the exact branching and release strategy, but I have some ideas. My goal is that DurableTask.AzureStorage can be updated and released independently as much as possible.

For now, I think the next step for you is to build your DurableTask.AzureStorage nuget package locally and reference it locally from Microsoft.Azure.WebJobs.Extensions.DurableTask. This is the normal workflow I follow, and it will unblock the next phase of development.

Once we are happy with the PR for Microsoft.Azure.WebJobs.Extensions.DurableTask, then we can merge your changes in DurableTask.AzureStorage into azure-storage. From there, I can publish to the myget feed, and that will unblock the azure-functions-durable-extension CI. Once the CI looks good, we can decide whether to go ahead and merge azure-storage into master and publish the updated DurableTask.AzureStorage to nuget.org.

Will that work for you? Let me know if you need help with instructions for deploying the DurableTask.AzureStorage nuget package locally. Kanio might know how to do this as well.

I think your suggestion to isolate the changes in DurableTask.AzureStorage is good and it's exactly what I would have done. I'll try to review it in more detail tomorrow. For now I think you should be unblocked though.
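For reference, one way to do the local packaging step described above (the commands and paths are illustrative, not from the thread):

# Pack the DurableTask.AzureStorage project into a local folder feed
dotnet pack src/DurableTask.AzureStorage/DurableTask.AzureStorage.csproj -c Release -o C:\local-packages

# Then add C:\local-packages as a NuGet package source for the
# Microsoft.Azure.WebJobs.Extensions.DurableTask solution and reference
# the locally built package version from its project file.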

TsuyoshiUshio (Collaborator, author):

Thank you, @cgillum.

I'll test it and start implementing the logic in the Durable Functions repo.
Then, can I fork a branch from the dev branch as usual?

cgillum changed the title from "On the fly Orchestrator status" to "DurableTask.AzureStorage API to enumerate instances" on Jun 3, 2018
cgillum (Member) commented on Jun 3, 2018:

When you are ready, please increment the package version in DurableTask.AzureStorage.csproj. I will then merge your change, create a nuget package, and publish it to the staging location. If the CI in azure-functions-durable-extension passes, then we can publish this package to nuget.org.

TsuyoshiUshio (Collaborator, author):

Hi @cgillum,

I tested locally and it works well. I updated the version of the AzureStorage package.

One thing to share: I encountered Azure/azure-webjobs-sdk#1492.
I solved it with the workaround below; that issue is not related to this pull request.

 return new HttpResponseMessage(System.Net.HttpStatusCode.OK) { Content = new StringContent(result) };

When you update/publish the nuget package, please let me know. :) I'm also starting to investigate the documentation changes for this new feature.

TsuyoshiUshio (Collaborator, author) commented on Jun 4, 2018:

Hi @cgillum, something to share.

When we use this feature from the HTTP API, the request might look like this (code= is the host key of the Azure Functions app):

"http://localhost:7071/runtime/webhooks/DurableTaskExtension/instances/?taskHub=DurableFunctionsHub&connection=Storage&code=es4Bb3qqtascBKLtKo5z41IP7Ne3QuooyWIoVXSpaqcZ2E47bFlSZg==",

I was wondering how customers get the TaskHub/Connection/Code values. The simplest solution is to use the CreateCheckStatusResponse API; however, that requires an "instanceId". It might not be a big deal, but I'd like to share it with you in advance.

cgillum (Member) commented on Jun 5, 2018:

TaskHub and Connection should be known, since they are configured statically. The code can be fetched from the Azure Functions host admin APIs. It's a good point that this may not be easy to acquire. We will need to think about how difficult this might be for users. Hopefully we can use tools to help make it easier. We could also consider creating a new API for getting your new instance query URL.

cgillum merged commit 3f10f0d into Azure:azure-storage on Jun 5, 2018
TsuyoshiUshio (Collaborator, author):

Yeah. That will be the next pull request. :)

simonporter pushed a commit that referenced this pull request Jun 19, 2018
* DurableTask.AzureStorage API to enumerate instances (#187)

* DurableTask.AzureStorage ETW trace improvements (#192)

* Adding 4MB limit check for Azure Storage Table batch insert (#191)

* DurableTask.AzureStorage: Alternate fix for the 4 MB max entity size which covers more scenarios. (#194)

* Updated Newtonsoft.Json to v11.0.2, WindowsAzure.Storage to v8.6.0. (#193)

* Fixed issues with the ETW event source and added test reliability improvements.