Why is min_replicas 0 not possible? #1775

Closed
dakshvar22 opened this issue Jan 11, 2021 · 13 comments
Labels
question Further information is requested

Comments

@dakshvar22

We are trying to deploy a text generation API on AWS. We do not expect the API to receive much traffic initially, so we would like to save on costs. My idea was to set min_replicas to 0 so that no instance sits idle while the API is receiving no traffic; as soon as a new request comes in, Cortex would spin up a new instance and shut it down once traffic drops back to 0.

However, I noticed that setting min_replicas to 0 is invalid. Isn't the above a valid use case for it? Also, is this a recent change? I vaguely (very vaguely) remember that this was possible in version 0.20 (please correct me if I'm wrong), but it seems it is not in 0.26.
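For reference, this is roughly where I expected to be able to set it; a minimal sketch of an API configuration (the API name, predictor path, and limits below are illustrative, and the exact keys may differ by version):

```yaml
# cortex.yaml (sketch; name, predictor path, and limits are placeholders)
- name: text-generator
  kind: RealtimeAPI
  predictor:
    type: python
    path: predictor.py
  autoscaling:
    min_replicas: 1  # setting this to 0 is rejected, which is what this issue is about
    max_replicas: 5
```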

cc @deliahu. I opened a new thread here because 1) it's a different issue than the other thread, and 2) other users might benefit from the conversation here.

@dakshvar22 dakshvar22 added the question Further information is requested label Jan 11, 2021
@deliahu
Member

deliahu commented Jan 11, 2021

@dakshvar22 thanks for reaching out regarding this.

Yes, your use case is a valid one; we have #445 to track supporting it. We have not supported this in the past; perhaps you are thinking of instances: we do support setting min_instances to 0, which allows the instances to be terminated when there are no deployed APIs.
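For reference, here is roughly where that setting lives in the cluster configuration; a minimal sketch with illustrative values:

```yaml
# cluster.yaml (sketch; cluster name, region, and limits are illustrative)
cluster_name: cortex
region: us-east-1
instance_type: g4dn.xlarge
min_instances: 0  # with no APIs deployed, all worker instances can be terminated
max_instances: 5
```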

@dakshvar22
Author

Ahh, I see min_instances in cluster.yaml. What's the difference between a replica and an instance?

@RobertLucian
Member

@dakshvar22 an API can have multiple replicas - in technical terms, pods that run your API as it is specified in your Cortex project. The more replicas there are, the more traffic your API can handle. Together, the replicas make up the API.

A cluster can have multiple instances (of type t3.medium, g4dn.xlarge, etc.). These are the cluster's nodes, and it is on them that the API replicas run. As traffic increases, Cortex increases the number of API replicas, which in turn can increase the number of cluster nodes (instances). The opposite happens when traffic decreases.

All in all, the replica term is used in the context of APIs, and the instance term is used in the context of the cluster. Does this clarify the situation?

@dakshvar22
Author

dakshvar22 commented Jan 11, 2021 via email

@RobertLucian
Member

@dakshvar22 almost. Setting min_instances to 0 can bring the cluster's number of nodes down to zero, but only once the underlying API(s) are also deleted, and deleting an API requires the user's intervention. That's because the minimum number of replicas an API can have is 1. Does this make sense to you?

@dakshvar22
Author

dakshvar22 commented Jan 11, 2021 via email

@RobertLucian
Member

Yes, you are correct; we need to address that ticket. Nonetheless, with a Lambda, you could schedule the API to run at specific times: the Lambda would deploy/delete the API whenever it is programmed to do so.

@dakshvar22
Author

dakshvar22 commented Jan 12, 2021 via email

@deliahu
Member

deliahu commented Jan 12, 2021

We don't currently have a timeline for this (we generally plan 2-4 weeks ahead). Since we have been focusing our recent efforts on production use cases, features, and integrations, scale-to-zero has not bubbled up in priority.

@dakshvar22
Author

@RobertLucian Just picking your brain further on this:

"with a Lambda, you could schedule the API to run at specific times. The Lambda would deploy/delete the API when programmed to do so."

I am not sure that solves our use case, because there isn't a specific time when we want the API to be up or down; it should depend on the traffic the API is receiving, so I am not sure we can program the Lambda to do so. Am I missing something?

@dakshvar22
Author

@deliahu Thanks for the update. This issue is something of a blocker for us. I am happy to contribute to the framework to make this possible if I can be pointed to what needs to change. Let me know what you think. :)

@deliahu
Member

deliahu commented Jan 13, 2021

@dakshvar22 Yes, we would be open to that, and happy to point you in the right direction!

It would probably be best to start with a quick chat to understand your use case and design the feature (we have a few proposals for it, which would provide different user experiences). Please email me at david@cortex.dev and we can find a time.

@deliahu
Member

deliahu commented Jan 20, 2021

I'll go ahead and close this issue, since we have #445 to track supporting scale-to-zero.

@deliahu deliahu closed this as completed Jan 20, 2021