
Scale to zero #9

Open · simonw opened this issue May 19, 2021 · 63 comments

@simonw

simonw commented May 19, 2021

I have a serious side-project habit - I often have dozens of side projects on the go at once.

As such, I really appreciate scale-to-zero services like Google Cloud Run and Vercel, where if my project isn't getting any traffic at all it costs me nothing (or just a few cents a month in storage costs) - then it spins up a server when a request comes in, with a cold-start delay of a few seconds before it starts serving traffic.

I would love it if App Runner could do this! It looks like at the moment you have to pay for a minimum of one running instance.

@danthegoodman1

Scale to zero is really important for small projects that don't need 24/7 compute, and especially for contractor work. Beyond that, it helps microservices that don't need to be running all the time, and side projects where someone wants a full container and is willing to deal with cold starts (like Lambda, but without being constrained to API Gateway or Lambda utilities in the container).

Scale to 0 is the only thing that prevents me from using it.

@danthegoodman1

This could also work very well for light batch jobs if it could scale to 0.

@timanderson

As far as I can tell, it scales down to just $0.007 per GB-hour when the application is idle, with no vCPU cost. Or there is a PauseService API call to eliminate that cost too. If you had a batch job, you could call ResumeService at the start and PauseService at the end?
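A minimal sketch of that resume/run/pause pattern with boto3 (the service ARN and `run_batch_job` are placeholders, and the polling interval is an arbitrary choice; PauseService and ResumeService are asynchronous, so the sketch polls DescribeService until the service reaches the desired status):

```python
import time

import boto3

apprunner = boto3.client("apprunner")
SERVICE_ARN = "arn:aws:apprunner:us-east-1:123456789012:service/my-batch-app/xyz"  # placeholder

def run_batch_job():
    # hypothetical placeholder for the actual batch work
    print("running batch job")

def wait_for_status(target, delay=15):
    # Pause/resume are asynchronous, so poll DescribeService until
    # the service reaches the desired status (e.g. RUNNING or PAUSED).
    while apprunner.describe_service(ServiceArn=SERVICE_ARN)["Service"]["Status"] != target:
        time.sleep(delay)

apprunner.resume_service(ServiceArn=SERVICE_ARN)
wait_for_status("RUNNING")
try:
    run_batch_job()
finally:
    # Pause again even if the job fails, so you stop paying for memory.
    apprunner.pause_service(ServiceArn=SERVICE_ARN)
```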

@danthegoodman1

Yeah, but that's still $5/month more than Cloud Run, and if I'm using such a heavily managed service I wouldn't want to automate pausing and resuming myself (for batch it would be fine, but for an API it would be very difficult).

@Munawwar

What should happen when a request arrives while instances are at zero? Currently (if the service is paused) the root URL returns an HTTP 404 status code. Do you want the endpoint to hold the request for a while, to give an instance a chance to spawn and respond?

@danthegoodman1

> What should happen when a request arrives while instances are at zero? Currently (if the service is paused) the root URL returns an HTTP 404 status code. Do you want the endpoint to hold the request for a while, to give an instance a chance to spawn and respond?

Yeah, basically a cold start similar to Lambda. That's how Cloud Run does it.

@danthegoodman1

I'll also add that it would be really good for many users not to be forced to scale to zero the way Cloud Run does. It would help to have a field to set "minimum containers", or something similar to Lambda provisioned concurrency, because there are some use cases where you never want a cold start.

@mwarkentin

@danthegoodman1 you can configure the minimum "provisioned" containers which stay active (paying for memory only, not CPU) - except you can only set that to >= 1.

It would be nice if you could leave that at 1 if you wanted to remove the cold start, or set it to 0 if you wanted to optimize for costs and were OK with some latency when the first request comes into the system after it has scaled down.

@nelsonjchen

@danthegoodman1 You aren't forced to scale to zero in Cloud Run and can set minimum instances that even charge/run at the "idle rate":

https://cloud.google.com/run/docs/configuring/min-instances

@danthegoodman1

> @danthegoodman1 You aren't forced to scale to zero in Cloud Run and can set minimum instances that even charge/run at the "idle rate":
> https://cloud.google.com/run/docs/configuring/min-instances

This used to be available only if you were using Cloud Run for Anthos; I didn't realize they updated it, thanks.

@danthegoodman1

> @danthegoodman1 you can configure the minimum "provisioned" containers which stay active (paying for memory only, not CPU) - except you can only set that to >= 1.
>
> It would be nice if you could leave that at 1 if you wanted to remove the cold start, or set it to 0 if you wanted to optimize for costs and were OK with some latency when the first request comes into the system after it has scaled down.

Yep, just wanted to make sure we still keep that feature, just in case!

@flibustenet

I would use scale to zero for dev stacks. Also, if I want to show the new version of my app to a customer, they can then look at the app whenever they want.

@486

486 commented May 19, 2021

Apart from smaller projects, scale to zero would be super useful for development workflows. Imagine many developers deploying code branches for testing. Right now, they would have to consciously deprovision the service when they are not working on it.

With scale to zero, there would be no costs when they aren’t working (= not sending requests to their personal deployment). And cold start latency isn’t relevant in this scenario.

@nelsonjchen

nelsonjchen commented May 20, 2021

I'm surprised this hasn't been mentioned before, but a turnoff of GCP is that they don't have an "Amazon Aurora Serverless" equivalent to go along with Cloud Run.

Scale to Zero App Runner + Amazon Aurora Serverless would be a dream.

@danthegoodman1

> I'm surprised this hasn't been mentioned before, but a turnoff of GCP is that they don't have an "Amazon Aurora Serverless" equivalent to go along with Cloud Run.
>
> Scale to Zero App Runner + Amazon Aurora Serverless would be a dream.

300 IQ right there, that's an awesome idea. DynamoDB would work too, but Aurora Serverless (v2 Postgres please) would be a real differentiator.

@tomaszdudek7

Seconding that 300 IQ statement. We need that! GCP Cloud Run is ahead. Don't make us use their product.

@danthegoodman1

This is also amazing for Slack bots btw. I'd move our team Slack bot here in an instant if we could scale to 0.

@stephanoparaskeva

@nelsonjchen @danthegoodman1 Why is it not yet possible to use App Runner + Aurora PostgreSQL Serverless?

I am trying to deploy an API on App Runner, and the API is supposed to connect to the Aurora PostgreSQL Serverless endpoint, but I can't get it to connect (locally or on App Runner). Does this mean it doesn't work?

@danthegoodman1

@stephanoparaskeva Make sure you're in the same VPC and have proper security groups and routing tables. In your case, since you're trying to test locally, it sounds like you need to enable public access (although please don't do that in production).

@pavelsource

pavelsource commented Jun 6, 2021

@stephanoparaskeva Aurora Serverless can only be accessed from within a VPC via a private IP. App Runner does not support VPC integration yet; however, it is on the roadmap: #1

@nelsonjchen

> @nelsonjchen @danthegoodman1 Why is it not yet possible to use App Runner + Aurora PostgreSQL Serverless?
>
> I am trying to deploy an API on App Runner, and the API is supposed to connect to the Aurora PostgreSQL Serverless endpoint, but I can't get it to connect (locally or on App Runner). Does this mean it doesn't work?

I don't think I said it wasn't possible. You likely have connectivity issues unrelated to this scale-to-zero issue, like @danthegoodman1 said.

@stephanoparaskeva

> @stephanoparaskeva Aurora Serverless can only be accessed from within a VPC via a private IP. App Runner does not support VPC integration yet; however, it is on the roadmap: #1

Ah OK, so I should use a public DB for the time being.

  • Once VPC support is released, if both App Runner and Aurora are in the same VPC, should it just connect via endpoint + user + password?

  • Also, how does one connect to Aurora from a locally running version of the API (is this possible)?

Thanks for the swift response!

@nelsonjchen

> @stephanoparaskeva Aurora Serverless can only be accessed from within a VPC via a private IP. App Runner does not support VPC integration yet; however, it is on the roadmap: #1
>
> Ah OK, so I should use a public DB for the time being.
>
> • Once VPC support is released, if both App Runner and Aurora are in the same VPC, should it just connect via endpoint + user + password?
>
> • Also, how does one connect to Aurora from a locally running version of the API (is this possible)?
>
> Thanks for the swift response!

Could you take this conversation elsewhere? This issue is about scale to zero.

@toricls

toricls commented Jun 6, 2021

Just for clarification: you can use Aurora Serverless v1 without a VPC by using its Data API :)

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html
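A minimal sketch of a Data API call with boto3 (the cluster ARN, secret ARN, and database name are placeholders); the query travels over HTTPS, so no VPC connectivity is needed:

```python
import boto3

# The Data API speaks HTTPS, so the caller needs no network path into the VPC.
rds_data = boto3.client("rds-data")

response = rds_data.execute_statement(
    resourceArn="arn:aws:rds:us-east-1:123456789012:cluster:my-cluster",           # placeholder
    secretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:my-db-creds",  # placeholder
    database="postgres",                                                           # placeholder
    sql="SELECT now()",
)
print(response["records"])
```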

@JonMarbach

I think there's some relevant discussion in aws/containers-roadmap#1017 on scale-to-zero for Fargate, which is probably applicable here too. I made a more detailed post re: Fargate, but summing up quickly: a few seconds (or maybe more) of cold-start latency would be OK for me, and I can send a ping to the service to mitigate that before hitting it full-force.
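A rough sketch of that warm-up ping, assuming a hypothetical service URL and an arbitrary retry budget:

```python
import time
import urllib.request

SERVICE_URL = "https://example.awsapprunner.com/"  # hypothetical endpoint

def warm_up(url, attempts=10, delay=3):
    # Keep pinging until the service answers with a success status,
    # absorbing the cold start before any real traffic is sent.
    for _ in range(attempts):
        try:
            urllib.request.urlopen(url, timeout=10)
            return True
        except OSError:
            time.sleep(delay)  # error or timeout: instance may still be spinning up
    return False

if warm_up(SERVICE_URL):
    print("warm - safe to send real traffic")
```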

@CarlosDomingues

CarlosDomingues commented Jan 14, 2022

Just adding my 2 cents: lots of back-office and data applications couldn't care less about latency. Being able to scale to zero would be amazing for many use cases. This could be a non-default, configurable feature, and the UI could state explicitly how scaling memory to zero affects latency.

@iBobik

iBobik commented Jan 31, 2022

I currently use Heroku for this use case - it scales to zero on free accounts.

I have not tried it yet, but because Heroku runs on AWS, it could have low latency to Aurora Serverless databases.

@siviae

siviae commented Feb 8, 2022

I think the "what about the cold start" argument is unrelated to the topic, because you should expect cold starts in any service providing autoscaling; that's just the way programs work (especially ones running in a virtual machine).

Our company's use case: we want our applications to be serverless, but we don't really want to switch our development model to Lambda's. We want to build traditional microservices that can handle more than one request at a time, but we also want our dev and staging environments to cost less when they are not used.

@phishy

phishy commented Mar 22, 2022

If App Runner doesn't scale to 0, why does mine show 0 instances when there is no traffic? Also, my minimum is set to 1.

@nelsonjchen

I'm unsubscribing from this issue since I don't really run anything on AWS professionally anymore. That said, I do want to leave a funny anecdote about one competitor.

Fly.io doesn't have scale-to-zero containers; they still have the same problem as App Runner. They do have scale-to-zero EC2-like machines though! You can have a scale-to-zero Minecraft server - via AWS's own Firecracker, too. Opposite world! How bizarre!

They're no big cloud, but it's just funny to me. Hopefully AWS will see to it that they catch up to the organizations using AWS's own products to build scale-to-zero products.

@algoflows

Surprised to see no progress and no response. I've had better responses from core GCP Cloud team members on YouTube threads.

@iomarcovalente

Subscribing; I would also be very interested in seeing this happen.

@jdrphillips

jdrphillips commented Feb 8, 2023

I would really like any scale-to-zero container service that isn't Lambda. I am willing to pay the from-cold delay, but not repeated Lambda delays.

@suzukieng

Would really like scale to zero. I have App Runner instances for multiple environments (dev/staging/prod). Dev and staging are usually only needed when I'm actively developing or testing the next release. It would be nice not to be charged for these "idle" instances.

@ebg1223

ebg1223 commented Jun 20, 2023

This would be a big deal coming from GCP Cloud Run!

@algoflows

And another year goes by and not a single reply from the AWS team... insane.

@atali

atali commented Jun 20, 2023

Hey AWS team, do you have any ETA? It seems this feature is needed by everyone.

@cade-coreschedule

2 years, 300+ reactions, and no response from AWS? Shocking. We may be forced to move to Google Cloud Run, as it clearly has more investment behind it.

@tornikeo

Ah $H!%. It's never a good sign to find an open issue with 300+ upvotes. Guess I gotta switch my approach 😆

@alexanderwink

alexanderwink commented Aug 17, 2023

Just to let everyone know: when the app doesn't receive any new requests for a while, the CPU throttles down to close to zero. At that point you pay only for the provisioned memory, not for CPU. This is equivalent to scaling down to zero with a warm standby. As soon as a new request comes in, the CPU throttles back up and you start paying for CPU as well as memory.


This is described in the documentation:

> When your application is deployed, you pay for the memory provisioned in each container instance. Keeping your container instance's memory provisioned when your application is idle ensures it can deliver consistently low millisecond latency.
>
> When your application is processing requests, you switch from provisioned container instances to active container instances that consume both memory and compute resources. You pay for the compute and any additional memory consumed in excess of the memory allocated by your provisioned container instances.

@benkehoe

@alexanderwink can you provide a link to that page in the docs?

@alexanderwink

Says so right on the pricing page https://aws.amazon.com/apprunner/pricing/

@gabrielboucher

> Just to let everyone know: when the app doesn't receive any new requests for a while, the CPU throttles down to close to zero. At that point you pay only for the provisioned memory, not for CPU.

This is still ~$5/month per instance with 1 GB of memory. Not that it's much, but it adds up quickly, especially for apps that are used once a month or less.

My real need here is a cold standby mode.

@matteocontrini

> This is still ~$5/month per instance with 1 GB of memory.

Plus, in eu-central-1 the price is higher, at ~$6.4/GB per month. If you want 1 vCPU you're forced to reserve 2 GB of RAM, which means ~$13/month just for the idle app.
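A quick sanity check of those monthly figures, assuming a 730-hour month and the $0.007/GB-hour idle rate quoted earlier in the thread (the eu-central-1 number below is taken from the ~$6.4/GB-month figure above, not from the price list):

```python
HOURS = 730  # assumed average hours per month

# us-east-1: 1 GB at the $0.007/GB-hour idle rate
print(1 * 0.007 * HOURS)  # -> ~5.1 $/month, matching the "~$5/month" above

# eu-central-1: 2 GB at ~$6.4/GB-month
print(2 * 6.4)            # -> ~12.8 $/month, matching the "~$13/month" above
```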

@iBobik

iBobik commented Aug 17, 2023

Would you use a service that can start your server like this? Or, instead of showing a button, it could start the server on every request and then stop it after X minutes of inactivity. It could start/stop an instance, an App Runner service, a container, ...

start.page.moqup.mov

If yes, write me an e-mail about your use case. I'm considering building it, but I don't want to work on it only for myself. :-)

@CarlosDomingues

CarlosDomingues commented Aug 21, 2023

Adding to what @gabrielboucher said:

Summing prod and staging environments, my company runs 250+ Streamlit dashboards, APIs, and data pipelines that are infrequently accessed (anything from a few times a day to once a month).

Assuming our average container has 2 vCPUs + 2 GB of memory (and my math is correct), we are talking about ~$2,500 monthly just for provisioned workloads. The actual bill would be much higher, as each running container would cost ~$0.14 per hour.

Nowadays we use Kubernetes + Knative + Karpenter with EC2 Spot Instances, and the equivalent part of our AWS bill is significantly cheaper.
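As a rough check of that estimate (assuming the $0.007/GB-hour us-east-1 idle rate quoted earlier and a 730-hour month):

```python
instances = 250
gb_per_instance = 2
idle_rate = 0.007  # $/GB-hour, the us-east-1 idle rate quoted earlier
hours = 730        # assumed average hours per month

print(instances * gb_per_instance * idle_rate * hours)  # -> 2555.0 $/month
```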

@mreferre

This isn't meant to be the solution for what's being discussed in the last 10-ish updates in this thread, but if your application usage is that sparse (e.g. one hit a month), have you considered using Lambda with the Web Adapter? This blog post walks you through the why and the how.

The TL;DR is that you would just need to add this entry to your Dockerfile to run your Python-based application unmodified in Lambda:

COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.7.0 /lambda-adapter /opt/extensions/lambda-adapter

I have used it here to demonstrate how to run the same unmodified container image in Fargate and Lambda.

If this works for you, you basically have a scale-to-zero service that, at that level of usage, is going to be free given the generous Lambda free tier (assuming you are not using Lambda already and you are OK with experiencing a cold start).

Again, I am not suggesting this is the solution for the requirement you are raising, but perhaps some of your use cases may be served by this approach.

@mreferre

mreferre commented Aug 21, 2023

> Adding to what @gabrielboucher said:
>
> Summing prod and staging environments, my company runs 250+ Streamlit dashboards, APIs, and data pipelines that are infrequently accessed (anything from a few times a day to once a month).
>
> Assuming our average container has 2 vCPUs + 2 GB of memory (and my math is correct), we are talking about ~$2,500 monthly just for provisioned workloads. The actual bill would be much higher, as each running container would cost ~$0.14 per hour.

For the record, my understanding is that Streamlit relies on WebSocket connections, and App Runner does not support them (yet).

See here and here.

@CarlosDomingues

@mreferre Thanks for the links, that's some really cool stuff!

We've tried Lambda in the past, but the timeout limits + developer experience were showstoppers for us. We might reconsider it for the subset of our APIs that don't require long-running jobs.

@anandhu-renie

I'm kinda confused by App Runner's pricing. Idle instances are when I'm not getting any traffic, but when exactly does an instance become idle?

@alexanderwink

> I'm kinda confused by App Runner's pricing. Idle instances are when I'm not getting any traffic, but when exactly does an instance become idle?

When there is no incoming traffic to the application, there is a ramp-down window of about 60 seconds. After that you will see the active instances metric go down to 0. At that point you only pay for provisioned memory.
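A small sketch for watching that ramp-down via CloudWatch with boto3. The AWS/AppRunner namespace and ActiveInstances metric match the metric mentioned above as I understand it, but the dimension names and values below are assumptions/placeholders, so check what your service actually publishes:

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

# Pull the ActiveInstances metric for the last hour at 1-minute resolution;
# it should drop to 0 roughly 60 seconds after traffic stops.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/AppRunner",
    MetricName="ActiveInstances",
    Dimensions=[
        {"Name": "ServiceName", "Value": "my-service"},  # assumed dimension, placeholder value
        {"Name": "ServiceID", "Value": "abc123"},        # assumed dimension, placeholder value
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=60,
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```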

@anandhu-renie

Got it, thanks!

@TreyWW

TreyWW commented Mar 20, 2024

400+ votes and this still isn't even being considered... wow. I'd love to hear more from AWS on whether this will come soon. I'm guessing it won't, because it wouldn't directly help AWS; they wouldn't really gain anything from adding this.

@wesmontgomery

Can we get an update on this please? Hoping this will be moved to "Researching" soon...

@backnol-aws @snnles @jsheld @lazarben @scuw19 @amitgupta85 @akshayram-wolverine

@larryjkl

I don't know anything, but just thinking about it, reading the comments, and watching the re:Invent videos on how they built and designed this, it might be that scaling to zero just isn't feasible in App Runner? They did go to the trouble of scaling down to zero CPU (but keeping the memory). The vaguely similar functionality they do provide (pause/unpause) took about 40s to become active again after being paused for my very simple app, which seems way too long for a cold start. It's frustrating when they don't give any feedback on things, but I'll be surprised if they are able to pivot to true scale to zero (although it would be great).

@masterbater

I like the people working on AWS Copilot, @efekarakus and @iamhopaul123. I hope these brilliant people can work on AWS App Runner roadmap items like HIPAA and scale to zero.
