
Durable Function storage usage when idle #391

Open
TheFairey opened this issue Jul 16, 2018 · 9 comments
Labels
azure-app-service (This is an Azure App Service platform issue), Enhancement (Feature requests)

Comments

@TheFairey

Hi

I have a queue-based orchestrator; the queue I'm polling is on another storage account, and I have a storage account solely for the function.

Looking at a 24-hour period when I submitted nothing to the function, it used approx. £0.16 in storage costs.

I turned on logging and can see that every 10 seconds there are multiple (10) hits on the queues. For each control and work-item queue I see 1 PeekMessage and 1 GetQueueMetadata. I'd have thought the metadata would say there are no messages, so the Peek wouldn't be necessary?

Also interesting: I have 8 partitions configured, so there are 8 control queues (0-7), but it's only looking at 0-3 every 10 seconds?

The blob storage is getting hit with multiple (3) blob requests for taskhub.json (2 x GetBlobProperties and 1x GetBlob).

I can supply more info if it helps but this means even an idle function will result in approx. £4.87 in storage costs each month (using the cheapest v1 storage).
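To put rough numbers on that (a back-of-envelope estimate, assuming the observed idle rate stays constant): roughly 10 queue transactions plus 3 blob requests every 10 seconds is about 13 × 8,640 ≈ 112,000 transactions per day, i.e. around 3.4 million per month, and £0.16/day × ~30.4 days ≈ £4.87/month.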

I understand it needs to poll the Qs for new work but just wanted to check this level of polling and blob reading is to be expected, correct and not excessive.

Cheers

Si

@cgillum
Member

cgillum commented Jul 19, 2018

I assume you are running this function app in the Consumption plan?

When using the consumption plan, behind the scenes our scale controller component will poll each queue on a 10 second interval. The poll consists of both a metadata query and a peek, though I would need to double check why it would do a peek if the metadata claims that the length is zero.
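To illustrate, the two per-queue operations look roughly like this (an illustrative sketch using the Azure.Storage.Queues SDK with placeholder names, not the actual scale controller code):

```csharp
// Illustrative sketch only: the two billable queue operations per poll.
using System;
using System.Threading.Tasks;
using Azure.Storage.Queues;

class PollSketch
{
    static async Task Main()
    {
        // Placeholder connection string and queue name for illustration.
        string connectionString = Environment.GetEnvironmentVariable("AzureWebJobsStorage");
        var queue = new QueueClient(connectionString, "myhub-control-00");

        // GetQueueMetadata: a cheap length check via ApproximateMessagesCount.
        var props = await queue.GetPropertiesAsync();
        Console.WriteLine($"Approximate message count: {props.Value.ApproximateMessagesCount}");

        // PeekMessage: inspects the head of the queue without dequeuing anything.
        // (The open question above is why this still happens when the count is zero.)
        var peeked = await queue.PeekMessagesAsync(maxMessages: 1);
        Console.WriteLine($"Peeked {peeked.Value.Length} message(s)");
    }
}
```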

If you have 8 partitions but only four queues are being scanned, then it's possible that you initially created the function app with four partitions (the default) and the scale controller has not been notified of your change to 8 partitions. If you click the "refresh" button in the Azure portal, does that cause you to see that all queues are being monitored?

There are some blob operations as well for managing leases. That's likely what you're seeing. I would expect them to be for other blobs and not for taskhub.json, though. I'll need to double-check that as well.

@cgillum cgillum added the Needs: Investigation 🔍 label Jul 19, 2018
@TheFairey
Author

Thanks for the response and yes, consumption plan.

Ah, the scale controller; I wondered what was doing the checking, and that makes sense. It would be good if you could clarify the metadata + peek element.

I restarted it and I can see it polling all 8 queues now.

More of a feature request, but given that we will be running a lot of different functions, the idle storage costs will soon grow. Could there be a way for, say, the Scale Controller to stop polling if there's no change for X minutes (maybe a setting in host.json), and then have some sort of WakeScaleController method in the orchestrator?

That would save the Scale Controller the time and resources spent polling something that might sit idle for days, and also reduce the costs on the storage account?

@cgillum
Member

cgillum commented Jul 30, 2018

To clarify further, there are two different sources of polling:

  • The functions host: This uses a backoff polling mechanism with a maximum of 10 seconds. After a period of inactivity, the functions host will get shut down and this polling will stop.
  • The scale controller: This does a constant 10 second poll indefinitely.

I'm working on a change that will increase the maximum wait time from 10 seconds to 30 seconds. That will primarily help people who are using App Service Plans and can be included in the next release, which should hopefully be in a week or two.
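For anyone curious, the functions host behaviour is essentially backoff polling with a cap; a minimal sketch of the idea (illustrative only, not the actual Durable Task extension code, with the cap being the value going from 10 to 30 seconds):

```csharp
// Minimal illustrative sketch of backoff polling with a capped delay.
using System;
using System.Threading.Tasks;

class BackoffPollSketch
{
    static async Task Main()
    {
        TimeSpan delay = TimeSpan.FromMilliseconds(100);   // start polling quickly
        TimeSpan maxDelay = TimeSpan.FromSeconds(10);      // the cap discussed above

        while (true)
        {
            bool foundWork = await CheckQueueAsync();      // peek/metadata call
            if (foundWork)
            {
                delay = TimeSpan.FromMilliseconds(100);    // reset backoff on activity
            }
            else
            {
                // Double the delay up to the cap; an idle app settles at maxDelay.
                delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
            }
            await Task.Delay(delay);
        }
    }

    // Placeholder for the real storage check; always reports "idle" in this sketch.
    static Task<bool> CheckQueueAsync() => Task.FromResult(false);
}
```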

Fixing the scale controller logic is a bit more complicated architecturally because it has very little information about what the application is actually doing. We could simply increase the max polling interval, but that could cause problems for people who depend on durable timers being triggered accurately. We could look into raising the max delay to 30 seconds, but I wouldn't feel comfortable with more than 60 seconds.

@gorillapower

Hi @cgillum,

We are also experiencing a similar if not the same issue. We have 170 mostly idle Consumption plan functions, each with their own hub name and thus each with their own queues. As you mention above, the scale controller will poll every 10 seconds. If I look at the metrics for the storage account in question, we are getting around 3 million PeekMessage and GetQueueMetadata transactions per day. This is resulting in a high cost for us. If I look at the cost breakdown for the storage account, the majority of the cost is coming from "Class 2 Operations - Queues v2", and the metrics below support that.

[Screenshot: storage account metrics showing PeekMessage and GetQueueMetadata transaction volume]

I confirmed that all 170 functions are using the 1.7.0 runtime, which fixes #508.

The polling and costs seem high, but if we take the polling time of every 10 seconds for each queue as you mention, this looks to be in the ballpark. Is there an option to set the polling frequency, perhaps? As you mention above, there was a possibility it could be increased to 30 seconds? Perhaps using V1 storage might be an option to further reduce these costs for us.
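Roughly checking that ballpark (assuming, on average, one PeekMessage plus one GetQueueMetadata per app per 10-second interval): 170 × 2 × 8,640 intervals/day ≈ 2.9 million transactions per day, which lines up with the ~3 million we're seeing.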

@cgillum
Member

cgillum commented Jan 17, 2019

@gorillapower Indeed, I could see how that might be a problem. V2 storage accounts are nice because of the faster performance, but the costs can be quite high from what I've observed. We don't have an option to set the scale controller polling frequency, but it's something we could look into adding as a way to help reduce storage costs.

@cgillum cgillum added the Enhancement and azure-app-service labels and removed the Needs: Investigation 🔍 label Jan 17, 2019
@TheFairey
Author

Hi,

Coming back to this after a while, and glad that things have improved. I've not been running the function in question for a while as I had to shelve that piece of work, but I'm coming back to it shortly.

For things we want to scale quickly/responsively, 10s is good, but I appreciate that if you're idle 80% of the day it's not so good.

What would be ideal is if it were something we could set programmatically within a reasonable range, say 10s to 60s, so that when the orchestration function starts it could change the polling interval to 10s and set it back when it's done, or something to that effect?

@gorillapower Incidentally, when I moved to V1 storage the costs dropped significantly and I didn't notice any tangible change in performance, but that is somewhat anecdotal!

Si

@cgillum cgillum added this to the Triage milestone Mar 16, 2019
@cgillum
Member

cgillum commented Mar 16, 2019

UPDATE: as of the v1.8.0 release, the max polling delay in the runtime is now configurable. Not yet in the scale controller though.
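For anyone landing here later, the runtime-side knob ends up in host.json, something along these lines (the exact nesting depends on your Functions runtime and extension version, so check the Durable Functions host.json documentation):

```json
{
  "extensions": {
    "durableTask": {
      "maxQueuePollingInterval": "00:00:30"
    }
  }
}
```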

@SergiySeletsky

It does not help; I still see BlobLease calls every 10 seconds.

@cgillum
Member

cgillum commented Sep 11, 2020

The fix was specific to queue polling. It doesn’t cover blob leases. For that you’ll want to consider deploying to a new app or task hub with a reduced partition count. That will reduce the number of blob leases (and queues), further decreasing background storage transactions.
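For reference, the partition count is also a host.json setting (again, the exact nesting varies by extension version); a lower value means fewer control queues and fewer blob leases to renew, for example:

```json
{
  "extensions": {
    "durableTask": {
      "partitionCount": 2
    }
  }
}
```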

@ConnorMcMahon ConnorMcMahon removed this from the Triage milestone Mar 8, 2021