
Durable Function storage usage when idle #391

Open
TheFairey opened this issue Jul 16, 2018 · 9 comments
Labels
azure-app-service (This is an Azure App Service platform issue), Enhancement (Feature requests)

Comments

@TheFairey

Hi

I have a queue-based orchestrator; the queue I'm polling is on another storage account, and I have a storage account solely for the function.

Looking at a 24-hour period when I submitted nothing to the function, it used approx. £0.16 in storage costs.

I turned on logging and can see that every 10 seconds there are multiple (10) hits on the queues. For each control and work-item queue I see 1 PeekMessage and 1 GetQueueMetadata. I'd have thought the metadata would say there are no messages, so the Peek wouldn't be necessary?

Also interesting: I have 8 partitions configured, so there are 8 control queues (0-7), but it's only looking at 0-3 every 10 seconds?

The blob storage is getting hit with multiple (3) blob requests for taskhub.json (2 x GetBlobProperties and 1x GetBlob).

I can supply more info if it helps but this means even an idle function will result in approx. £4.87 in storage costs each month (using the cheapest v1 storage).
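To put rough numbers on that (a back-of-envelope estimate, assuming the observed idle rate stays constant): roughly 10 queue transactions plus 3 blob requests every 10 seconds is about 13 × 8,640 ≈ 112,000 transactions per day, i.e. around 3.4 million per month, and £0.16/day × ~30.4 days ≈ £4.87/month.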

I understand it needs to poll the Qs for new work but just wanted to check this level of polling and blob reading is to be expected, correct and not excessive.

Cheers

Si

@cgillum
Member

cgillum commented Jul 19, 2018

I assume you are running this function app in the Consumption plan?

When using the consumption plan, behind the scenes our scale controller component will poll each queue on a 10 second interval. The poll consists of both a metadata query and a peek, though I would need to double check why it would do a peek if the metadata claims that the length is zero.
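To illustrate, the two per-queue operations look roughly like this (an illustrative sketch using the Azure.Storage.Queues SDK with placeholder names, not the actual scale controller code):

```csharp
// Illustrative sketch only: the two billable queue operations per poll.
using System;
using System.Threading.Tasks;
using Azure.Storage.Queues;

class PollSketch
{
    static async Task Main()
    {
        // Placeholder connection string and queue name for illustration.
        string connectionString = Environment.GetEnvironmentVariable("AzureWebJobsStorage");
        var queue = new QueueClient(connectionString, "myhub-control-00");

        // GetQueueMetadata: a cheap length check via ApproximateMessagesCount.
        var props = await queue.GetPropertiesAsync();
        Console.WriteLine($"Approximate message count: {props.Value.ApproximateMessagesCount}");

        // PeekMessage: inspects the head of the queue without dequeuing anything.
        // (The open question above is why this still happens when the count is zero.)
        var peeked = await queue.PeekMessagesAsync(maxMessages: 1);
        Console.WriteLine($"Peeked {peeked.Value.Length} message(s)");
    }
}
```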

If you have 8 partitions but only four queues are being scanned, then it's possible that you initially created the function app with four partitions (the default) and the scale controller has not been notified of your change to 8 partitions. If you click the "refresh" button in the Azure portal, does that cause you to see that all queues are being monitored?

There are some blob operations as well for managing leases. That's likely what you're seeing. I would expect them to be for other blobs and not for taskhub.json, though. I'll need to double-check that as well.

@cgillum cgillum added the Needs: Investigation 🔍 label Jul 19, 2018
@TheFairey
Author

Thanks for the response and yes, consumption plan.

Ah, the scale controller; I wondered what was doing the checking, and that makes sense. It would be good if you could clarify the metadata + peek element.

I restarted it and I can see it polling all 8 queues now.

More of a feature request, but given that we will be running a lot of different functions, the idle storage costs will soon grow. Could there be a way for, say, the Scale Controller to stop polling if there's no change for X minutes (maybe a setting in host.json), and then have some sort of WakeScaleController method in the orchestrator?

That would save the Scale Controller the time and resources spent polling something that might sit idle for days, and also reduce the costs on the storage account?

@cgillum
Member

cgillum commented Jul 30, 2018

To clarify further, there are two different sources of polling:

  • The functions host: This uses a backoff polling mechanism with a maximum of 10 seconds. After a period of inactivity, the functions host will get shut down and this polling will stop.
  • The scale controller: This does a constant 10 second poll indefinitely.

I'm working on a change that will increase the maximum wait time from 10 seconds to 30 seconds. That will primarily help people who are using App Service Plans and can be included in the next release, which should hopefully be in a week or two.
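For anyone curious, the functions host behaviour is essentially backoff polling with a cap; a minimal sketch of the idea (illustrative only, not the actual Durable Task extension code, with the cap being the value going from 10 to 30 seconds):

```csharp
// Minimal illustrative sketch of backoff polling with a capped delay.
using System;
using System.Threading.Tasks;

class BackoffPollSketch
{
    static async Task Main()
    {
        TimeSpan delay = TimeSpan.FromMilliseconds(100);   // start polling quickly
        TimeSpan maxDelay = TimeSpan.FromSeconds(10);      // the cap discussed above

        while (true)
        {
            bool foundWork = await CheckQueueAsync();      // peek/metadata call
            if (foundWork)
            {
                delay = TimeSpan.FromMilliseconds(100);    // reset backoff on activity
            }
            else
            {
                // Double the delay up to the cap; an idle app settles at maxDelay.
                delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
            }
            await Task.Delay(delay);
        }
    }

    // Placeholder for the real storage check; always reports "idle" in this sketch.
    static Task<bool> CheckQueueAsync() => Task.FromResult(false);
}
```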

Fixing the scale controller logic is a bit more complicated architecturally because it has very little information about what the application is actually doing. We could simply increase the max polling interval, but that could cause problems for people who depend on durable timers being triggered accurately. We could look into raising the max delay to 30 seconds, but I wouldn't feel comfortable with more than 60 seconds.

@gorillapower

Hi @cgillum,

We are also experiencing a similar if not the same issue. We have 170 mostly idle Consumption plan functions, each with their own hub name and thus each with their own queues. As you mention above, the scale controller will poll every 10 seconds. If I look at the metrics for the storage account in question, we are getting around 3 million PeekMessage and GetQueueMetadata transactions per day. This is resulting in a high cost for us. If I look at the cost breakdown for the storage account, the majority of the cost is coming from "Class 2 Operations - Queues v2", and the metrics below support that.

[Screenshot: storage account metrics showing PeekMessage and GetQueueMetadata transaction volume]

I confirmed that all 170 functions are using the 1.7.0 runtime, which fixes #508.

The polling and costs seem high, but if we take the polling time of every 10 seconds for each queue as you mention, this looks to be in the ballpark. Is there an option to set the polling frequency, perhaps? As you mention above, there was a possibility it could be increased to 30 seconds? Perhaps using V1 storage might be an option to further reduce these costs for us.
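Roughly checking that ballpark (assuming, on average, one PeekMessage plus one GetQueueMetadata per app per 10-second interval): 170 × 2 × 8,640 intervals/day ≈ 2.9 million transactions per day, which lines up with the ~3 million we're seeing.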

@cgillum
Member

cgillum commented Jan 17, 2019

@gorillapower Indeed, I could see how that might be a problem. V2 storage accounts are nice because of the faster performance, but the costs can be quite high from what I've observed. We don't have an option to set the scale controller polling frequency, but it's something we could look into adding as a way to help reduce storage costs.

@cgillum cgillum added the Enhancement and azure-app-service labels and removed the Needs: Investigation 🔍 label Jan 17, 2019
@TheFairey
Author

Hi,

Coming back to this after a while, and glad that things have improved. I've not been running the function in question for a while as I had to shelve that piece of work, but I'm coming back to it shortly.

For things we want to scale quickly/responsively, 10s is good, but I appreciate that if you're idle 80% of the day it's not so good.

What would be ideal is if it were something we could set programmatically within a reasonable range, say 10s to 60s, so that when the orchestration function starts it could change the polling interval to 10s and set it back when it's done, or something to that effect?

@gorillapower Incidentally, when I moved to V1 storage the costs dropped significantly and I didn't notice any tangible change in performance, but that is somewhat anecdotal!

Si

@cgillum cgillum added this to the Triage milestone Mar 16, 2019
@cgillum
Member

cgillum commented Mar 16, 2019

UPDATE: as of the v1.8.0 release, the max polling delay in the runtime is now configurable. Not yet in the scale controller though.
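For anyone landing here later, the runtime-side knob ends up in host.json, something along these lines (the exact nesting depends on your Functions runtime and extension version, so check the Durable Functions host.json documentation):

```json
{
  "extensions": {
    "durableTask": {
      "maxQueuePollingInterval": "00:00:30"
    }
  }
}
```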

@SergiySeletsky

It does not help; I still see BlobLease calls every 10 seconds.

@cgillum
Member

cgillum commented Sep 11, 2020

The fix was specific to queue polling. It doesn’t cover blob leases. For that you’ll want to consider deploying to a new app or task hub with a reduced partition count. That will reduce the number of blob leases (and queues), further decreasing background storage transactions.
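For reference, the partition count is also a host.json setting (again, the exact nesting varies by extension version); a lower value means fewer control queues and fewer blob leases to renew, for example:

```json
{
  "extensions": {
    "durableTask": {
      "partitionCount": 2
    }
  }
}
```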

@ConnorMcMahon ConnorMcMahon removed this from the Triage milestone Mar 8, 2021