Why is min_replicas 0 not possible? #1775
Comments
@dakshvar22 thanks for reaching out regarding this. Yes, your use case is a valid one; we have #445 to track supporting it. We have not supported this in the past; perhaps you are thinking of instances: we support setting |
Ahh I see |
@dakshvar22 an API can have multiple replicas - in technical terms, pods that run your API as it is specified in your Cortex project. The more there are, the higher the performance of your API. A collection of API replicas is called an API. A cluster can have multiple instances (of type t3.medium, g4dn.xlarge, etc.). These are effectively the cluster's nodes, and it's on these that the API replicas reside. As traffic increases, Cortex decides to increase the number of API replicas, which in turn increases the number of cluster nodes (instances). The opposite happens when the traffic gets smaller. All in all, the replica term is used in the context of APIs, and the instance term is used in the context of a cluster. Does this clarify the situation? |
So, if I understand correctly, I should be able to achieve my use case by setting `min_instances` to 0 and `min_replicas` to 1? |
@dakshvar22 almost. Setting `min_instances` to 0 can reduce the cluster's number of nodes for its APIs down to zero as long as the underlying API(s) are also *deleted* - and to delete the API, the user's intervention is required. That's because the minimum number of replicas an API can have is 1. Does this make sense to you? |
Ohh so it wouldn't be possible in an automated way. Then we'll have to wait for the above linked PR to be implemented. |
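To make the replica/instance relationship discussed above concrete, here is a toy Python sketch. It is not Cortex's actual scheduler: `instances_needed` and the `replicas_per_instance` packing factor are hypothetical simplifications (real packing depends on each replica's resource requests). It only illustrates why `min_instances: 0` alone does not bring the node count to zero while an API is still deployed, given the one-replica floor described in the thread.

```python
import math

# Per the thread: an API cannot have fewer than 1 replica.
MIN_REPLICAS_FLOOR = 1


def instances_needed(num_apis, replicas_per_api, replicas_per_instance,
                     min_instances, max_instances):
    """Toy estimate of the cluster's node count (hypothetical model)."""
    if num_apis == 0:
        # No APIs deployed, so no replicas need a node.
        replicas = 0
    else:
        # Each deployed API keeps at least one replica alive.
        replicas = num_apis * max(replicas_per_api, MIN_REPLICAS_FLOOR)
    wanted = math.ceil(replicas / replicas_per_instance)
    # The cluster's min/max instance settings clamp the result.
    return max(min_instances, min(wanted, max_instances))
```

In this model, one deployed API with no traffic still needs one node even when `min_instances` is 0; only deleting the API (`num_apis == 0`) lets the count drop to zero.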
Yes, you are correct - we need to address that ticket. Nonetheless, with a lambda, you could schedule the API to run at specific times. The lambda would deploy/delete the API when programmed to do so. |
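A minimal sketch of the scheduled-Lambda idea, assuming the function is triggered periodically (for example by a scheduled rule). Everything here is illustrative: the 08:00-20:00 UTC window is made up, and `is_deployed`, `deploy`, and `delete` are hypothetical callables you would have to implement yourself against your Cortex environment; no Cortex API is assumed.

```python
from datetime import datetime, timezone

# Hypothetical schedule: keep the API up between 08:00 and 20:00 UTC.
UP_FROM_HOUR = 8
UP_UNTIL_HOUR = 20


def desired_action(now, api_is_deployed):
    """Decide what to do given the current time and deployment state."""
    should_be_up = UP_FROM_HOUR <= now.hour < UP_UNTIL_HOUR
    if should_be_up and not api_is_deployed:
        return "deploy"
    if not should_be_up and api_is_deployed:
        return "delete"
    return "noop"


def make_handler(is_deployed, deploy, delete,
                 clock=lambda: datetime.now(timezone.utc)):
    """Build a Lambda-style handler from user-supplied (hypothetical) callables."""
    def handler(event, context):
        action = desired_action(clock(), is_deployed())
        if action == "deploy":
            deploy()
        elif action == "delete":
            delete()
        return {"action": action}
    return handler
```

Note that this only covers fixed time windows; it does not react to traffic, so it is not a substitute for scale-to-zero.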
Thanks! Any timeline on when we can expect the above ticket to be tackled? |
We don't currently have a timeline for this (we generally plan 2-4 weeks ahead). Since we have been focusing our recent efforts on production use cases / features / integrations, scale-to-0 has not bubbled up in priority. |
@RobertLucian Just picking your brain further on this -
I am not sure that solves our use case, because there isn't a specific time when we want the API to be up or down; it should follow the traffic the API is receiving. So I'm not sure we can program a schedule for it. Am I missing something? |
@deliahu Thanks for giving that update. The issue is sort of a blocker for us. I am happy to contribute to the framework to make this possible if I could be pointed at what needs to change. Let me know what you think. :) |
@dakshvar22 Yes, we would be open to that, and happy to point you in the right direction! It would probably be best to start with a quick chat to understand your use case and design the feature (we have a few proposals for it, which would provide different user experiences). Please email me at david@cortex.dev and we can find a time. |
I'll go ahead and close this issue, since we have #445 to track supporting scale-to-zero |
We are trying to deploy a text generation API on AWS. We do not expect the API to receive a lot of traffic initially, so we would like to save some costs. My idea was that `min_replicas` could be set to 0, which would avoid keeping an instance idle when the API receives no traffic. As soon as a new request came in, Cortex would spawn a new instance and shut it down once the traffic goes back to 0.
However, I noticed that setting `min_replicas` to 0 is invalid. Isn't the above use case a valid one for this? Also, is this a recent change? I (very) vaguely remember that this was possible in version 0.20 (please correct me if I'm wrong), but it seems it is not in 0.26.
cc @deliahu I opened a new thread here because 1) it's a different issue than the other thread, and 2) other users might benefit from the conversation here.