History Retention #17
@cgillum What's the current status of this? And is there any cleanup logic in place today?
Right now we're leaning towards not implementing this due to the downsides involved. It should be safe for you to rely on the data always being there. Even if we do eventually implement retention policies, it would most likely be an opt-in feature that doesn't impact existing apps.
Hi @cgillum, I have some ideas for addressing the four drawbacks.
It's an interesting idea. Here are some thoughts:
@cgillum Then let's implement (3) with your suggestion. I don't clearly understand how to add the feature to the DurableOrchestrationClient. Could you explain? If it only takes a small change to the DurableOrchestrationClient, all we need to add beyond that is documentation and a sample. :)
Activity functions could return sensitive data as intermediate output (should I mention GDPR? ;) ). That return value remains in the History table even after the orchestration completes.
I need this feature in GA; otherwise my large messages will blow up my storage account.
Hello @cgillum, I wanted to confirm whether the current plan for this issue is still to add a new API to the … Thank you!
@cgillum How is retention applied to the data payloads passed between functions? My orchestrator fans out to multiple activity functions and aggregates their responses, which could grow rapidly. What is the default retention here, and what happens if the accumulated data grows significantly over time?
@gled4er Yes. At a minimum, I think we want the ability to delete all data associated with a single instance ID. For more convenience, we can also have an API that takes filter parameters, like the new instance query API that @TsuyoshiUshio recently added here.

@AbhishekTripathi All inter-function communication is associated with a particular orchestration instance ID. If a user deletes that instance, all the durable state related to it, including inter-function communication artifacts in storage, will be deleted. So if your orchestrator function fans out to 100K activity functions and you then use the API to delete that instance, all the parameter data for those 100K activity functions will also be deleted. Does that answer your question?
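To make the proposed semantics concrete, here is a minimal in-memory sketch, assuming a simplified model in which each instance has one row in an Instances table and many rows (including activity inputs and outputs) in a History table, all keyed by instance ID. `TaskHubStore`, `purge_instance`, and `purge_by_filter` are hypothetical names for illustration only, not the Durable Functions API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Set


@dataclass
class InstanceRow:
    instance_id: str
    runtime_status: str      # e.g. "Running", "Completed", "Failed", "Terminated"
    created_time: datetime


@dataclass
class HistoryRow:
    instance_id: str
    event_type: str
    payload: str             # activity inputs/outputs end up in rows like this


@dataclass
class TaskHubStore:
    """Hypothetical in-memory stand-in for the History and Instances tables."""
    instances: Dict[str, InstanceRow] = field(default_factory=dict)
    history: List[HistoryRow] = field(default_factory=list)

    def purge_instance(self, instance_id: str) -> int:
        """Delete every row for one instance from both tables; return rows removed."""
        before = len(self.history)
        self.history = [h for h in self.history if h.instance_id != instance_id]
        removed = before - len(self.history)
        if self.instances.pop(instance_id, None) is not None:
            removed += 1
        return removed

    def purge_by_filter(self, created_from: datetime, created_to: datetime,
                        statuses: Set[str]) -> List[str]:
        """Purge every instance created in [created_from, created_to) with a matching status."""
        matches = [i.instance_id for i in self.instances.values()
                   if created_from <= i.created_time < created_to
                   and i.runtime_status in statuses]
        for instance_id in matches:  # matches is a list copy, so deleting is safe
            self.purge_instance(instance_id)
        return matches
```

Because every history row carries the instance ID, deleting one instance removes all of its fan-out payloads in the same pass, which is the behavior described in the reply above.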
Are we talking about an existing API, or is it proposed? My question is about the current state of things.
@AbhishekTripathi I'm talking about the proposed API (since that's what this GitHub issue is tracking). We don't have any retention policy today. All data stays in the storage account permanently unless it is manually deleted. So yes, it is possible that this data could grow quite a bit over time depending on the load.
@cgillum Will it be available by v2 GA?
@AbhishekTripathi Ideally, yes. If not, it should come shortly afterwards.
The function I mentioned is built on the v1 runtime. It goes without saying that the proposed enhancement should be compatible with v1 functions too.
@AbhishekTripathi Yes, that's the plan.
Hello @cgillum, thank you for the clarification!
This functionality has been added in the above PRs. I'm happy to say that it will be made available in the next release (v1.7.0)!
I see the purgeInstanceHistory API is available in the durable client. Does it delete the instance's records from both Azure Storage tables (History and Instances)? Since the name only mentions history, it's a bit confusing. Can you clarify whether it deletes from both tables?
Update: I verified that it deletes entries from both tables. I still feel the name is a bit confusing.
@cgillum I think we still need this to be configurable. I understand that we only have an API, but what we want is to not have to remember and maintain this cleanup process ourselves.
@MhAllan While I'm inclined to agree, have you considered using a Recurrence-triggered Logic App or Timer-triggered Azure Function to do this for you so you don't have to remember?
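The scheduled workaround boils down to a periodic job that computes a cutoff and purges every finished instance older than it. A minimal sketch, assuming hypothetical names (`InstanceSummary`, `instances_to_purge`, `run_scheduled_cleanup`) and a caller-supplied `purge` callback standing in for whatever per-instance delete the client exposes; the timer or Recurrence trigger itself is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, Optional

# Only instances in a terminal state are eligible for cleanup.
TERMINAL_STATUSES = {"Completed", "Failed", "Terminated"}


@dataclass
class InstanceSummary:
    instance_id: str
    runtime_status: str
    completed_time: Optional[datetime]  # None while the instance is still running


def instances_to_purge(instances: Iterable[InstanceSummary], now: datetime,
                       retention: timedelta) -> List[str]:
    """Return IDs of finished instances that completed before now - retention."""
    cutoff = now - retention
    return [i.instance_id for i in instances
            if i.runtime_status in TERMINAL_STATUSES
            and i.completed_time is not None
            and i.completed_time < cutoff]


def run_scheduled_cleanup(instances: Iterable[InstanceSummary], now: datetime,
                          retention: timedelta,
                          purge: Callable[[str], None]) -> int:
    """Body of a hypothetical timer-triggered job: purge each expired instance."""
    ids = instances_to_purge(instances, now, retention)
    for instance_id in ids:
        purge(instance_id)
    return len(ids)
```

Running instances are skipped even if they are old, so a long-lived orchestration is never purged out from under itself.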
@brandonh-msft, thank you for your reply. Yes, those solutions would work. Unfortunately, though, they feel more like an on-premises hack than a cloud design, especially since the "small, encapsulated microservice" would then depend on "another custom tool to make it perform well". Another drawback is that anyone new to Azure Functions will wonder, "So when do these tables get truncated?" The answer is: they don't, you do it yourself! In my opinion, that is a major turn-off.
@brandonh-msft I have tried using a Recurrence-triggered Logic App to clean up both the History and Instances tables for a Durable orchestration. However, every record in the Instances table appears to have an empty RowKey, so the "Delete Entity" operation fails with an error stating that a RowKey value is required. Is it expected that RowKey is empty for the Instances table? If so, is it possible to use Logic Apps to clean that table?
I see the History table does have a RowKey. The Instances table, however, does not have a value in RowKey; you are right. Update: I'm not able to see a way to make this happen. I'll reach out to the Logic Apps team for some insight here.
@brandonh-msft Yes, that's consistent with what I see as well: the History table does have a RowKey, and Logic App cleanup works fine for that table. It's just the Instances table that doesn't have a RowKey, so I can't get Logic Apps to clean that one up. It would be great if you could suggest a workaround, or if DF could be updated to populate a RowKey so that Logic Apps can clean up that table. Thanks!
Summary
Orchestration history will be deleted some number of days (e.g. 30 days) after the orchestration completes, fails, or terminates. Once this data is deleted, it will no longer be possible to query the status of the purged instances. The number of days will be configurable at the task hub level and the cleanup will be done automatically by the runtime.
Motivation
Orchestrations save their execution history in table storage. This history grows indefinitely over time, incurring ever-increasing storage costs on the Azure subscription.
Design Notes
When an orchestrator function completes, fails, or is terminated, a control message will be queued with an invisibility timeout of the retention period. Once the cleanup message is received, the host will take care of deleting the table storage records for that orchestration instance.
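The delayed-cleanup design can be simulated with a queue whose messages stay invisible until their timeout elapses. A minimal sketch, assuming hypothetical names (`DelayedQueue`, `RetentionHost`, `on_orchestration_finished`, `poll`) and an in-memory dictionary in place of the real tables; it models the idea in the design notes, not the runtime's actual implementation.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass
class DelayedQueue:
    """Hypothetical queue in which a message stays invisible until visible_at."""
    _heap: List[Tuple[datetime, int, str]] = field(default_factory=list)
    _seq: int = 0

    def enqueue(self, instance_id: str, now: datetime, invisibility: timedelta) -> None:
        # The sequence number breaks ties so messages with equal timestamps stay FIFO.
        heapq.heappush(self._heap, (now + invisibility, self._seq, instance_id))
        self._seq += 1

    def receive_visible(self, now: datetime) -> List[str]:
        """Pop every message whose invisibility timeout has elapsed."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


class RetentionHost:
    def __init__(self, retention: timedelta):
        self.retention = retention               # assumed to be set per task hub
        self.queue = DelayedQueue()
        self.tables: Dict[str, List[str]] = {}   # instance ID -> stored rows

    def on_orchestration_finished(self, instance_id: str, now: datetime) -> None:
        # Completed, failed, or terminated: schedule cleanup one retention period out.
        self.queue.enqueue(instance_id, now, self.retention)

    def poll(self, now: datetime) -> List[str]:
        """Receive due cleanup messages and delete the matching instance data."""
        purged = self.queue.receive_visible(now)
        for instance_id in purged:
            self.tables.pop(instance_id, None)
        return purged
```

One practical consideration: Azure Queue storage caps a message's visibility timeout at 7 days, so a 30-day retention period would likely need the cleanup message to be re-enqueued or scheduled some other way. The sketch ignores that limit.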
Drawbacks