DurableTask.AzureStorage API to enumerate instances #187
Conversation
Thank you for this PR! I have some initial feedback.
To answer your second question, I think a full scan is okay. Even if we try to filter to just active instances, the table storage performance will be the same since Azure Tables do not support secondary indexes. If you want to do filtering to save CPU on the customer's worker VM, I suggest adding another parameter which allows for flexible filtering in the table storage query.
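As a rough illustration of that idea (the optional parameter, the "RuntimeStatus" column name, and the helper shape below are assumptions for this sketch, not the actual implementation), the filter could be pushed into the table query so the Table service drops non-matching rows before they reach the worker:
// Sketch only: build the instance-table query with an optional server-side filter.
// Requires: using Microsoft.WindowsAzure.Storage.Table;
static TableQuery<OrchestrationInstanceStatus> BuildInstanceQuery(string runtimeStatusFilter = null)
{
    var query = new TableQuery<OrchestrationInstanceStatus>();
    if (!string.IsNullOrEmpty(runtimeStatusFilter))
    {
        // The Table service still scans the table (no secondary index), but the
        // filter is evaluated server-side, so fewer rows cross the wire.
        query = query.Where(TableQuery.GenerateFilterCondition(
            "RuntimeStatus", QueryComparisons.Equal, runtimeStatusFilter));
    }

    return query;
}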
foreach (DynamicTableEntity entity in instanceTableEntities)
{
    var instance = new OrchestrationInstance();
    instance.InstanceId = entity.PartitionKey;
Take a look at the previous `GetStateAsync` method above. Since it already has logic to convert table storage rows into `OrchestrationState` objects, I think it would be best to reuse that code as much as possible. That way, if we need to change the logic for reading from table storage, we can do it in one place instead of two places.
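For illustration, the shared conversion might be factored into a single helper used by both the single-instance lookup and the new enumeration path. The property mapping below is only a sketch; the field names are assumptions based on the snippets elsewhere in this thread (the `ConvertFromAsync` name appears later in the discussion).
// Sketch: one row-to-state conversion shared by both code paths.
Task<OrchestrationState> ConvertFromAsync(OrchestrationInstanceStatus status, string instanceId)
{
    var state = new OrchestrationState
    {
        OrchestrationInstance = new OrchestrationInstance { InstanceId = instanceId },
        Name = status.Name,
        Version = status.Version,
        CreatedTime = status.CreatedTime,
        LastUpdatedTime = status.LastUpdatedTime,
        Input = status.Input,
        Output = status.Output,
    };

    if (Enum.TryParse(status.RuntimeStatus, out OrchestrationStatus runtimeStatus))
    {
        state.OrchestrationStatus = runtimeStatus;
    }

    // The real helper is async because large inputs/outputs may require extra reads.
    return Task.FromResult(state);
}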
Thanks! I’ll fix that!
Hello @cgillum,
Would it make sense to add an additional partition to the Instance table with partition key "OnTheFly" (and the instance id as row key) that contains all the instances currently in flight? Each instance would also have its own dedicated row with its instance id as partition key. We could have a setting that enables the duplicate update of the instance in both the "OnTheFly" partition and its own partition. When the instance completes, we would delete the row in the "OnTheFly" partition. This is one more call, but we would know both the partition key and the row key.
The query for retrieving the in-flight instances would then be straightforward. I am worried that the Instances table will grow over time, and a query that does a full table scan each time will have a bad performance impact.
Thank you!
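To make the proposal concrete, the duplicate write might look roughly like the sketch below (the entity shapes and the `instancesTable` variable are assumptions; note that the two rows live in different partitions, so they cannot share a single batch, hence the extra call mentioned above). As discussed further down, this design was ultimately not adopted.
// Sketch of the proposed dual write: one row in the instance's own partition,
// one row in a shared "OnTheFly" partition keyed by instance id.
var ownRow = new DynamicTableEntity(instanceId, string.Empty);
var onTheFlyRow = new DynamicTableEntity("OnTheFly", instanceId);

await instancesTable.ExecuteAsync(TableOperation.InsertOrMerge(ownRow));
await instancesTable.ExecuteAsync(TableOperation.InsertOrMerge(onTheFlyRow));

// On completion, delete only the "OnTheFly" row; both keys are known.
onTheFlyRow.ETag = "*";
await instancesTable.ExecuteAsync(TableOperation.Delete(onTheFlyRow));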
Good point! Let's discuss that. I also worry about the performance, since the storage table only has a key on the partition key, and the instance table might keep growing. :)
Hi @gled4er, one question: OrchestrationInstanceStatus would be the domain class for the instance table, right? Then which attribute is the instanceId? I guess it's PartitionKey. Is that right?
Hi, let's discuss today in person.
I'm a little concerned about creating a new partition for "on the fly" for several reasons. I'm not sure if I know how to communicate it clearly. The high-level issue is that it pollutes the data in the Instances table. It also creates complexity in the data model (and I prefer to avoid complexity whenever possible). In general, data in the partition column should be as balanced as possible, and "on-the-fly" partitions will create a lot of imbalance and will eventually result in different kinds of performance issues. The design is also specific to a particular use case and may not add value to other use cases.
My thought is that we should keep this feature simple and not change the structure of the data. Yes, there will be performance issues with large numbers of rows; however, I think this is unavoidable when using Azure Table storage, so it's better if we just document it. For many users, the simple implementation will still be very valuable. If there is a need for huge numbers of instances, then we should recommend the Event Grid approach to put the data into a different, optimized data store based on the customer's specific needs (maybe they will use Cosmos DB, or something else).
Hello @cgillum, Thank you very much for the explanation. I agree. Thank you!
@cgillum, @gled4er, I have one suggestion. How about adding some configuration to remove the old instances, e.g. 1 day or 7 days or something, on the
Hi @cgillum, one more thing: could you enable CI for this pull request? I refactored the code to keep it DRY; I just did an extract-method refactoring. However, it fails a test. When I look at the test code, the failure seems related to a timeout, which looks like flakiness. I'd like to see what happens.
This repo unfortunately does not have CI enabled, so we have to rely on local testing. :( Hopefully this can be fixed in the near future, but I'll need to talk to the repo owners about it. Regarding the suggestion for removing old instances, let's consider it as a separate feature. We have a GitHub issue tracking something like this: Azure/azure-functions-durable-extension#17
@cgillum Makes sense! Sure.
I have a question about testing. I added the query method to AzureTableTrackingStore.cs and AzureStorageOrchestrationService, as well as TaskHubClient and IOrchestrationServiceClient. Except for the first one, all of the others are just delegation. I'll probably need to write a test in AzureStorageScenarioTests.cs with TestOrchestrationClient.
However, the tests are executed in parallel, so if I clear the instance table, it might cause a problem. In this case, how do you test the method? Do you have a similar example in your tests?
I don't think the tests are executed in parallel because all of them clear the task hub in the very beginning. Is that not what you're observing?
This is looking good. I like how it is simple and easy to understand. One concern I have is about changing DurableTask.Core. I recommend we not change it yet and instead test the client side by calling `AzureStorageOrchestrationService.GetStateAsync` directly. That way we don't need to convince the core DTFx team about adding new public APIs that need to be implemented by all providers.
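A client-side test along those lines might look roughly like this sketch (the settings property names and the test plumbing are assumptions drawn from the public surface discussed in this thread):
// Sketch: exercise the new query against the Azure Storage provider directly,
// without adding anything to DurableTask.Core.
var settings = new AzureStorageOrchestrationServiceSettings
{
    StorageConnectionString = Environment.GetEnvironmentVariable("AzureWebJobsStorage"),
    TaskHubName = "TestHub",
};

var service = new AzureStorageOrchestrationService(settings);
await service.CreateIfNotExistsAsync();

// ... start and complete a few orchestrations via TaskHubClient / TestOrchestrationClient ...

IList<OrchestrationState> states = await service.GetStateAsync();
Assert.IsTrue(states.Count > 0);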
// TODO do we need these?
this.stats.StorageRequests.Increment();
this.stats.TableEntitiesRead.Increment(instancestatuses.Count - previousCount);
Regarding the `TODO`: yes, we need these. We track these statistics to know how much load we're putting on the Azure Storage account.
Thanks!
var query = new TableQuery<OrchestrationInstanceStatus>();
TableContinuationToken token = null;

var instancestatuses = new List<OrchestrationInstanceStatus>(100);
nit: capitalize the "S": `instanceStatuses`.
Done. However, it was eventually removed. lol
/// Get states of all the orchestration instances
/// </summary>
/// <returns>List of <see cref="OrchestrationState"/></returns>
Task<IList<OrchestrationState>> GetStateAsync();
We have to be very careful about making changes to DurableTask.Core. If we need any changes in this area, then we need the core DTFx team to approve them, and that may not be easy since this design might not work well for other implementations, like DurableTask.ServiceBus. Should we try to implement this first only in DurableTask.AzureStorage to avoid any potential conflict?
Thanks. I understand. I removed this commit from my branch. I had added it because OrchestratorClient uses TaskHub. I'll start by creating a test. :)
var instancestatuses = new List<OrchestrationInstanceStatus>(100);
var stopwatch = new Stopwatch();
var requestCount = 0;
bool finishedEarly = false;
It looks like `stopwatch`, `requestCount`, and `finishedEarly` aren't really used for anything. Can they be removed?
Done!
/// Get states of all the orchestration instances
/// </summary>
/// <returns>List of <see cref="OrchestrationState"/></returns>
public async Task<IList<OrchestrationState>> GetStateAsync()
The other methods similar to this one are named `GetOrchestrationStateAsync`. I think we should name this one the same.
Done
{
    foreach (OrchestrationInstanceStatus entity in instancestatuses)
    {
        var state = await ConvertFromAsync(entity, entity.PartitionKey);
Instead of calling `ConvertFromAsync` inside a second `foreach` loop, would it be more efficient to do it inside the previous `foreach` loop (so that we only loop through the list of instances once instead of twice)?
Done!
One concern: I used LINQ for the refactoring, and I'm not sure whether you prefer LINQ. I did it like this to keep it a single loop:
var tasks = segment.AsEnumerable<OrchestrationInstanceStatus>().Select(async x => await ConvertFromAsync(x, x.PartitionKey));
OrchestrationState[] result = await Task.WhenAll(tasks);
orchestrationStates.AddRange(result);
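For context, here is a sketch of how those pieces might fit together into the full enumeration loop. The `instancesTable` field name is an assumption, while the query, continuation token, statistics counters, and `ConvertFromAsync` follow the snippets quoted above (requires `using System.Linq;`).
var orchestrationStates = new List<OrchestrationState>();
var query = new TableQuery<OrchestrationInstanceStatus>();
TableContinuationToken token = null;

do
{
    // Each segment returns up to 1,000 rows plus a continuation token.
    TableQuerySegment<OrchestrationInstanceStatus> segment =
        await this.instancesTable.ExecuteQuerySegmentedAsync(query, token);
    token = segment.ContinuationToken;

    this.stats.StorageRequests.Increment();
    this.stats.TableEntitiesRead.Increment(segment.Results.Count);

    // Convert every row in the segment in a single pass.
    var tasks = segment.Select(x => ConvertFromAsync(x, x.PartitionKey));
    orchestrationStates.AddRange(await Task.WhenAll(tasks));
}
while (token != null);

return orchestrationStates;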
Hi @cgillum, to enable the testing, I added one method to TestOrchestrationClient.cs. You said we should be careful about adding to Core; however, TestOrchestrationClient depends on TaskHub and
Any feedback is welcome.
@cgillum Then what is the next step? I have two options.
Maybe option 1 should be fine. Then how can I test it? Create a NuGet package from this branch and reference it locally, then implement OrchestratorClient and test it using AzureStorageOrchestrationService directly, and then move on to option 2?
One concern for the next step is how we handle branching. This branch comes from the azure-storage branch, and you might want to release it together with the azure-storage branch. After the branch is merged, we can reference it from the Durable Framework extension repo. Do you have a strategy for managing the two repos?
I still need to figure out the exact branching and release strategy, but I have some ideas. My goal is that DurableTask.AzureStorage can be updated and released independently as much as possible. For now, I think the next step for you is to build your DurableTask.AzureStorage nuget package locally and reference it locally from Microsoft.Azure.WebJobs.Extensions.DurableTask. This is the normal workflow I follow, and it will unblock the next phase of development.
Once we are happy with the PR for Microsoft.Azure.WebJobs.Extensions.DurableTask, we can merge your changes in DurableTask.AzureStorage into azure-storage. From there, I can publish to the myget feed, and that will unblock the azure-functions-durable-extension CI. Once the CI looks good, we can decide whether to go ahead and merge azure-storage into master and publish the updated DurableTask.AzureStorage to nuget.org. Will that work for you? Let me know if you need help with instructions for deploying the DurableTask.AzureStorage nuget package locally. Kanio might know how to do this as well.
I think your suggestion to isolate the changes in DurableTask.AzureStorage is good and it's exactly what I would have done. I'll try to review it in more detail tomorrow. For now, I think you should be unblocked, though.
Thank you, @cgillum. I'll test it and start implementing the logic in the Durable Functions repo.
When you are ready, please increment the package version in
Hi @cgillum, I tested locally and it works well. I updated the version of DurableTask.AzureStorage. One thing to share: I encountered the issue Azure/azure-webjobs-sdk#1492
If you update / publish the NuGet package, please let me know. :) I've also started investigating the documentation changes for this new feature.
Hi @cgillum, something to share. When we use this feature from the HTTP API, it might look like this.
I was wondering how a customer gets TaskHub/Connection/Code. The simplest solution is to use the CreateCheckStatusResponse API; however, it requires an "instanceId". It might not be a big deal, but I'd like to share it with you in advance.
TaskHub and Connection should be known, since they are configured statically. The code can be fetched from the Azure Functions host admin APIs. It's a good point that this may not be easy to acquire. We will need to think about how difficult this might be for users. Hopefully we can use tools to help make it easier. We could also consider creating a new API for getting your new instance query URL.
Yeah. That will be the next pull request. :)
* DurableTask.AzureStorage API to enumerate instances (#187)
* DurableTask.AzureStorage ETW trace improvements (#192)
* Adding 4MB limit check for Azure Storage Table batch insert (#191)
* DurableTask.AzureStorage: Alternate fix for the 4 MB max entity size which covers more scenarios. (#194)
* Updated Newtonsoft.Json to v11.0.2, WindowsAzure.Storage to v8.6.0. (#193)
* Fixed issues with the ETW event source and added test reliability improvements.
Hi @cgillum and @gled4er,
I opened this pull request to start discussing the details.
I have two topics to discuss:
1. OrchestrationState has a lot of members. Which attributes do we need for this purpose?
2. We do a full table scan of the Instances table. Is that OK, or is there a cleverer way to fetch the in-flight instances? Currently, I just return every record of the Instances table.
This pull request corresponds to this issue:
Azure/azure-functions-durable-extension#316