History Retention #17

Closed
cgillum opened this issue Jul 16, 2017 · 26 comments
Labels
dtfx Enhancement Feature requests.

Comments

@cgillum
Member

cgillum commented Jul 16, 2017

Summary

Orchestration history will be deleted some number of days (e.g. 30 days) after the orchestration completes, fails, or terminates. Once this data is deleted, it will no longer be possible to query the status of the purged instances. The number of days will be configurable at the task hub level and the cleanup will be done automatically by the runtime.

Motivation

Orchestrations save their execution history in table storage. This history will grow indefinitely over time and increasing storage costs will be incurred on the Azure subscription.

Design Notes

When an orchestrator function completes, fails, or is terminated, a control message will be queued with an invisibility timeout of the retention period. Once the cleanup message is received, the host will take care of deleting the table storage records for that orchestration instance.
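
The delayed-cleanup design above can be illustrated with a minimal in-memory sketch. The names `TaskHub`, `CleanupMessage`, `on_orchestration_finished`, and `process_visible_messages` are hypothetical and exist only for this illustration; they are not part of the runtime. The sketch assumes a queue whose messages stay hidden until their invisibility timeout elapses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class CleanupMessage:
    instance_id: str
    visible_at: datetime  # the message stays invisible until this time


@dataclass
class TaskHub:
    """Hypothetical in-memory stand-in for the control queue and table storage."""
    retention: timedelta
    history: dict = field(default_factory=dict)  # instance_id -> list of history rows
    queue: list = field(default_factory=list)

    def on_orchestration_finished(self, instance_id: str, now: datetime) -> None:
        # Schedule cleanup: the message is hidden for the whole retention period.
        self.queue.append(CleanupMessage(instance_id, now + self.retention))

    def process_visible_messages(self, now: datetime) -> list:
        # Only messages whose invisibility timeout has elapsed are received.
        due = [m for m in self.queue if m.visible_at <= now]
        self.queue = [m for m in self.queue if m.visible_at > now]
        purged = []
        for message in due:
            # Delete the history rows for the instance named in the message.
            if self.history.pop(message.instance_id, None) is not None:
                purged.append(message.instance_id)
        return purged
```

Because the visibility time is fixed when the message is enqueued, this sketch also shows why the retention period cannot be changed once cleanup has been scheduled.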

Drawbacks

  • There is no way to change the retention period of an orchestration instance after the cleanup has been scheduled.
  • There is currently no way to archive the execution history of orchestrations. If this is desired, a separate work item should track this.
  • Cleanup will incur I/O on the storage account. This may impact the performance of currently executing orchestrations.
@cgillum cgillum added the Enhancement Feature requests. label Jul 16, 2017
@cgillum cgillum added this to the Beta milestone Jul 16, 2017
@cgillum cgillum modified the milestone: Beta Sep 7, 2017
@cgillum cgillum added the dtfx label Nov 11, 2017
@ethanroday

@cgillum What's the current status of this? And is there any cleanup logic in place today?
Real question: if I want a particular instance with a specified id to only run once and, once completed, never be able to run again, is it okay to rely on GetStatusAsync indefinitely to check for its existence? Or should I expect that the record of its previous run will disappear eventually?

@cgillum
Member Author

cgillum commented Mar 22, 2018

Right now we're leaning towards not implementing this due to the downsides involved. It should be safe for you to rely on the data always being there. Even if we do eventually implement retention policies, it would most likely be an opt-in feature that doesn't impact existing apps.
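
As a rough illustration of the run-once check discussed here, the following sketch uses a hypothetical in-memory client. `InMemoryClient`, `get_status`, `start_new`, `complete`, and `start_once` are invented for this example and are not the real SDK surface. It assumes a status lookup returns nothing when no record exists for an instance ID.

```python
class InMemoryClient:
    """Hypothetical stand-in for an orchestration client; not the real SDK."""

    def __init__(self):
        self._instances = {}  # instance_id -> runtime status string

    def get_status(self, instance_id):
        # Returns None when no record exists for the ID.
        return self._instances.get(instance_id)

    def start_new(self, instance_id):
        self._instances[instance_id] = "Running"

    def complete(self, instance_id):
        self._instances[instance_id] = "Completed"


def start_once(client, instance_id):
    """Start the instance only if no record of it exists. Returns True if started."""
    if client.get_status(instance_id) is not None:
        return False
    client.start_new(instance_id)
    return True
```

The guard only holds while the status record exists. If the record were ever purged, `start_once` would start the instance again, which is why the retention question matters for this pattern. The check and the start are also two separate steps in this sketch, so it does not protect against two callers racing each other.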

@TsuyoshiUshio
Collaborator

Hi @cgillum
According to the drawbacks, this is about the History table. However, we could apply retention to the Instances table and remove instances that finished more than a certain period ago. For example, we could set the period in host.json to 1 day or 7 days, and Durable Functions would then remove the older instance records asynchronously. Or should I discuss this in a new issue?

I'd like to share some ideas for the four drawbacks.

  1. Impact on existing apps.
    -> For the Instances table, this doesn't apply.
  2. There is no way to change the retention policy until you deploy a new one.
    -> I don't have a solution for this; I think we accept it. For the Instances table, it doesn't seem like a big deal.
  3. We might need a separate work item for cleanup.
    -> How about having Durable Functions create a simple timer-triggered function when the setting is present in host.json? That would be almost the same as the customer writing a custom function. It could also address 4, and we wouldn't need a separate work item. (Just an idea.)
  4. Cleanup incurs I/O on the storage account.
    -> How about setting the start time for the cleanup?

@cgillum
Member Author

cgillum commented May 24, 2018

It's an interesting idea. Here are some thoughts:

  • If we remove data only from the Instances table, then we have an inconsistency between the two tables. That might make it harder to implement a full data retention feature later which cleans up both History and Instances tables if the data is inconsistent.
  • I think your comment about (3) is very interesting. Maybe we can create an API on DurableOrchestrationClient which allows cleaning up instance data. Then developers can use a timer trigger like you suggest to implement any cleanup policy they want (and we don't have to implement/design one for them). I like this idea because then the developer is in control and can make the best decision for their scenario.
  • Doing clean-up at start time might be tricky, especially since there could be many VMs running at the same time and maybe some VMs start up later than others. I also worry that this could negatively impact cold-start time, but maybe that's not a big issue since it only blocks background processing and doesn't have to necessarily impact HTTP triggers or webhooks.
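
To make the timer-trigger idea in the second bullet concrete, here is a minimal sketch of a developer-owned cleanup policy. Everything in it is an assumption made for illustration: the record shape (`instance_id`, `status`, `last_updated`), the `purge` callback, and the function names are hypothetical, and the real client API was still undecided at this point in the thread.

```python
from datetime import datetime, timedelta

# Only finished instances are candidates for cleanup.
TERMINAL_STATUSES = {"Completed", "Failed", "Terminated"}


def select_expired(instances, now, retention):
    """Return IDs of finished instances last updated before the retention cutoff."""
    cutoff = now - retention
    return [
        inst["instance_id"]
        for inst in instances
        if inst["status"] in TERMINAL_STATUSES and inst["last_updated"] < cutoff
    ]


def run_cleanup(instances, purge, now, retention=timedelta(days=7)):
    """Body of a hypothetical timer-triggered function: purge each expired instance."""
    expired = select_expired(instances, now, retention)
    for instance_id in expired:
        purge(instance_id)  # 'purge' stands in for whatever cleanup API is exposed
    return expired
```

Because the policy lives in the developer's own function, the retention window and the statuses to purge can be changed at any time without a runtime-level setting.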

@TsuyoshiUshio
Collaborator

TsuyoshiUshio commented May 24, 2018

@cgillum Then let's implement (3) as you suggest. I don't clearly understand how to add the feature to DurableOrchestrationClient. Could you share more on that? If it only takes a small change to DurableOrchestrationClient, all we need to do is add documentation and a sample for it. :)

@audipen

audipen commented Jul 10, 2018

The activity functions could return sensitive data (should I add GDPR ;) ) as intermediate output. This return value is stored in the History table even after the orchestration has completed.
Is there any way to clear this result?
A property on 'DurableOrchestrationClientBase' to enable cleanup would be cool, so that it can be set and the cleanup is done once the orchestration completes.

@cgillum cgillum added this to the Functions V2 GA milestone Aug 30, 2018
@andrew-vdb

I need this feature in GA; my large messages will blow up my storage account.

@gled4er
Collaborator

gled4er commented Sep 6, 2018

Hello @cgillum,

I wanted to confirm with you whether the current plan for this issue is still to add a new API to DurableOrchestrationClient for cleaning up instance data, so that developers can use a timer trigger to implement any cleanup policy they want.

Thank you!

@AbhishekTripathi

AbhishekTripathi commented Sep 6, 2018

@cgillum How is retention applied to the data payloads passed between functions? My orchestrator fans out to multiple activity functions and aggregates the responses, which have the potential to grow rapidly. What is the default retention here, and what happens if the accumulated data grows significantly over time?

@cgillum
Member Author

cgillum commented Sep 6, 2018

@gled4er yes. At a minimum, I think we want the ability to delete all data associated with just a single instance ID. For more convenience, we can also have an API which takes in some filter parameters, like we do for the new instance query API that @TsuyoshiUshio recently added here.

@AbhishekTripathi All inter-function communication is associated with a particular orchestration instance ID. If a user deletes that instance, all the durable state related to that instance, including inter-function communication artifacts in storage, will get deleted. So if your orchestrator function fans out to 100K activity functions, and you then use the API to delete that instance, all the parameter data for those 100K activity functions will also be deleted. Does that answer your question?
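
The per-instance deletion described above can be sketched with a small in-memory model. `DurableStore`, `record`, and `purge_instance` are hypothetical names used only for this illustration; the model assumes, as the comment states, that all stored state is keyed by orchestration instance ID.

```python
class DurableStore:
    """Hypothetical in-memory model of the History and Instances tables."""

    def __init__(self):
        self.instances = {}  # instance_id -> status row
        self.history = {}    # instance_id -> list of history rows (inputs and outputs)

    def record(self, instance_id, event, payload=None):
        # Every history row, including activity payloads, is stored under the instance ID.
        self.instances.setdefault(instance_id, {"status": "Running"})
        self.history.setdefault(instance_id, []).append(
            {"event": event, "payload": payload}
        )

    def purge_instance(self, instance_id):
        """Delete every row tied to one instance ID; returns the number of rows removed."""
        removed = len(self.history.pop(instance_id, []))
        if self.instances.pop(instance_id, None) is not None:
            removed += 1
        return removed
```

In this model, an orchestration that fanned out to many activities leaves many history rows behind, and a single purge call for its instance ID removes all of them along with the instance row, while other instances are untouched.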

@AbhishekTripathi

Are we talking about an existing API or a proposed one? My question is about the current state of things.

@cgillum
Member Author

cgillum commented Sep 6, 2018

@AbhishekTripathi I'm talking about the proposed API (since that's what this GitHub issue is tracking). We don't have any retention policy today. All data stays in the storage account permanently unless it is manually deleted. So yes, it is possible that this data could grow quite a bit over time depending on the load.

@AbhishekTripathi

@cgillum would it be available by v2 GA?

@cgillum
Member Author

cgillum commented Sep 6, 2018

@AbhishekTripathi Ideally yes. If not, it should come shortly afterwards.

@AbhishekTripathi

The function I talked about is built on the v1 runtime. It goes without saying that the proposed enhancement should be compatible with v1 functions too.

@cgillum
Member Author

cgillum commented Sep 6, 2018

@AbhishekTripathi yes, that's the plan.

@gled4er
Collaborator

gled4er commented Sep 7, 2018

Hello @cgillum,

Thank you for the clarification!

@cgillum
Member Author

cgillum commented Nov 7, 2018

This functionality has been added in the above PRs. I'm happy to say that it will be made available in the next release (v1.7.0)!

@cgillum cgillum closed this as completed Nov 7, 2018
@cgillum cgillum removed this from the Durable Functions v2.0 release milestone Nov 7, 2018
@huzefaqubbawala

I see the purgeInstanceHistory API is available in the durable client. Does it delete both the history and the instance records from Azure Table storage?

Since the name says "history", it's a bit confusing. Can you clarify whether it deletes from both tables?

@huzefaqubbawala

Update: I verified that it deletes entries from both tables. I still feel the name is a bit confusing.

@MhAllan

MhAllan commented Nov 12, 2020

@cgillum I think we still need this to be configurable. I understand that we only have an API, but what we want is to not have to remember and maintain this cleanup process.

@brandonh-msft
Member

> @cgillum I think we still need this to be configurable. I understand that we only have an API, but what we want is to not have to remember and maintain this cleanup process.

@MhAllan while I'm inclined to agree, have you considered using a Recurrence-triggered Logic App or Timer-triggered Azure Function to do this for you so you don't have to remember?

@MhAllan

MhAllan commented Nov 13, 2020

@brandonh-msft, thank you for your reply.

Yes, these solutions will work. Unfortunately, they are very anti-cloud solutions: they feel more like an on-premises hack than a cloud design, especially if we consider that diagnosing the "small, encapsulated microservice" depends on "another custom tool to make it perform well". Another negative aspect is that anyone new to Azure Functions will wonder, "Then when do these tables get truncated?" The answer is: they don't, you do it! That is a major turn-off in my opinion.

@carfarmer

@brandonh-msft I have tried using a Recurrence-triggered Logic App to clean up both the History and Instances table for a Durable orchestration. However, when attempting to clean the Instances table, that table appears to have an empty RowKey for all records, therefore the "Delete Entity" operation doesn't succeed, claiming a RowKey value is required for the delete operation. Is it expected that RowKey is null for the Instances table? If so, is it possible to use Logic Apps to clean the table?

@brandonh-msft
Member

brandonh-msft commented Nov 30, 2020

> @brandonh-msft I have tried using a Recurrence-triggered Logic App to clean up both the History and Instances table for a Durable orchestration. However, when attempting to clean the Instances table, that table appears to have an empty RowKey for all records, therefore the "Delete Entity" operation doesn't succeed, claiming a RowKey value is required for the delete operation. Is it expected that RowKey is null for the Instances table? If so, is it possible to use Logic Apps to clean the table?

I see the History table does have RowKey set... are you seeing something different?

You are right, however, that the Instances table does not have a value in RowKey. I will see if I can create a workaround for this in lieu of DF being updated to set the RowKey to an actual value (cc @cgillum).

Update: I'm not able to see a way to make this happen. I'll reach out to the Logic Apps team for some insight here.

@carfarmer

@brandonh-msft Yes, that's consistent with what I see as well. The History table does have a RowKey, and Logic App cleanup works fine for that table. It's just the Instances table that doesn't have a RowKey, so I can't get Logic Apps to work for cleaning that one. It would be great if you could suggest a workaround, or if DF could be updated to populate a RowKey so that Logic Apps can clean up that table. Thanks!
