Can we set a default CPU/Memory Limit/Request for queue_proxy container #5829

Closed
yuzliu opened this issue Oct 18, 2019 · 20 comments

Labels
kind/question Further information is requested

Comments

yuzliu commented Oct 18, 2019

In what area(s)?

/area API

Ask your question here:

Hi, is there a way for Knative Serving to set a default value for QueueSideCarResourcePercentageAnnotation?

QueueSideCarResourcePercentageAnnotation = "queue.sidecar." + GroupName + "/resourcePercentage"

We cannot apply a ResourceQuota to our k8s cluster because the queue_proxy container does not have CPU/memory limits configured. We have multiple components that depend on Knative Serving, and I think it makes sense for Knative Serving to set a default value that users can override if they want. Thoughts?
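
For context on why this bites: once a namespace has a ResourceQuota covering compute resources, Kubernetes rejects any pod whose containers omit the corresponding requests/limits, queue-proxy included. A minimal sketch of such a quota (the name, namespace, and values are illustrative, not from this issue):

# Hypothetical quota for illustration; with this in place, every container in
# the namespace (including queue-proxy) must declare CPU and memory
# requests/limits or the pod is rejected at admission time.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: my-knative-namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi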

@yuzliu yuzliu added the kind/question Further information is requested label Oct 18, 2019
@SugandhaAgrawal

There exists #4151

@eallred-google eallred-google added this to the Needs Triage milestone Oct 23, 2019
dgerd commented Oct 23, 2019

You should just be able to apply the following annotation to the metadata of your Revision Template within your Knative Service.

For example, in your Knative Service spec:

spec:
  template:
    metadata:
      annotations:
        queue.sidecar.serving.knative.dev/resourcePercentage: "10"
    spec:
      containers:
      - image: gcr.io/google-samples/microservices-demo/shippingservice:v0.1.2

yuzliu commented Oct 24, 2019

Thanks for the reply! I know I can use this annotation; I was just wondering if Knative could set a reasonable default value for users.

@mattmoor
Member

What's unreasonable about the current value?

@TheEvilCoder42

@mattmoor With 50 idle Pods running concurrently, that's already 25m x 50 = 1250m of wasted CPU requests.

Sadly, annotations like queue.sidecar.serving.knative.dev/requestCpu: "10m" don't seem to work (and as far as I can tell the minimum boundary is 25m, but setting the annotation higher doesn't work either).

@richard2006

I agree with @TheEvilCoder42. Can we support setting the queue-proxy CPU request smaller, or even to zero?

vagababov commented Jan 19, 2020

@TheEvilCoder42 50 idle running pods defeats the purpose of Knative, i.e. scaling.
The annotation we have does not specify an absolute value but rather a percentage, so you need "10" rather than "10m" (and yes, the annotation key is resourcePercentage, not requestCpu).
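
To make the distinction concrete, here is a hedged sketch (the image and resource values are illustrative): with resourcePercentage set to "10" and a user container requesting 500m CPU and 512Mi memory, the queue-proxy requests are derived as roughly 10% of those values, subject to Knative's internal clamping (such as the 25m CPU floor mentioned above).

spec:
  template:
    metadata:
      annotations:
        # A percentage of the user container's resources, not an absolute quantity.
        queue.sidecar.serving.knative.dev/resourcePercentage: "10"
    spec:
      containers:
      - image: gcr.io/google-samples/microservices-demo/shippingservice:v0.1.2
        resources:
          requests:
            cpu: 500m     # queue-proxy CPU request ~ 50m (10%), never below the clamp floor
            memory: 512Mi # queue-proxy memory request ~ 51Mi (10%), also subject to clamping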

@knative-housekeeping-robot

Issues go stale after 90 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle stale.
Stale issues rot after an additional 30 days of inactivity and eventually close.
If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle stale

@knative-prow-robot knative-prow-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 19, 2020
julz commented May 14, 2020

Bumping this up: it seems we're relying on users to set QueueSideCarResourcePercentageAnnotation; if it's not set, we set a default CPU request but no memory request/limit and no CPU limit (https://github.com/knative/serving/blob/master/pkg/reconciler/revision/resources/queue.go#L83).

This means the scheduler won't account for all the queue proxies when scheduling pods, which - unless I'm missing something - could lead to OOM errors, and it will break on any namespace with a ResourceQuota set up unless the namespace sets a default limit (which would work, but might well be much higher than the queue-proxy needs) or every user sets this annotation (which means, for a start, none of the samples will work out of the box in such an environment).

Does anyone have an objection to setting a default mem/cpu request/limit in defaults.yaml for this? If not I'll PR one.

@markusthoemmes
Contributor

FWIW, we're hitting issues here too when using LimitRanges. If we're adding a setting here, let's please make sure one can completely unset the default too.

In fact: Should the default already be... nothing? Feels weird that we're setting a default request on the queue-proxy but not on the user container 🤔

julz commented May 14, 2020

I think the problem is that there's currently no default limit, but ResourceQuota/LimitRange require the queue-proxy to have requests/limits set. So there's no way for any of the samples to work out of the box in a namespace with a ResourceQuota/LimitRange unless the user adds this annotation.
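
For reference, the namespace-level workaround mentioned above is a LimitRange with container defaults: any container that omits requests/limits (queue-proxy included) inherits them at admission time. A minimal sketch with purely illustrative names and values:

# Hypothetical LimitRange; containers without explicit requests/limits
# (e.g. queue-proxy) pick up these defaults, which may be far more than
# the queue-proxy actually needs.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: my-knative-namespace
spec:
  limits:
  - type: Container
    default:           # limits applied when a container specifies none
      cpu: 500m
      memory: 512Mi
    defaultRequest:    # requests applied when a container specifies none
      cpu: 250m
      memory: 256Mi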

@markusthoemmes
Contributor

Should this maybe be surfaced to the API WG? Work out a small proposal for what better integration with LimitRange etc. would look like?

julz commented May 14, 2020

Yeah, that might make sense. I can experiment a bit with the current behaviour and then bring it up at the next API WG meeting.

@mattmoor
Member

/remove-lifecycle stale

Yeah, I don't like what we have now. I think the current value is another thing that dates back to the early VPA work; the annotation (and clamping) is newer, but this whole area could use some focused investigation and effort.

I think @vagababov also suspected this as a source of 5xx under higher load (e.g. ~60k QPS)

I'd love to see someone deeply investigate the resource usage of the queue proxy and come up with a proposal for how we manage its resources. Any takers?

cc @tcnghia too

@knative-prow-robot knative-prow-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 14, 2020
julz commented May 20, 2020

OK, to get us started on this I spun up a basic proposal for operator-set default requests/limits and added it to the API WG meeting agenda.

yuzisun commented May 25, 2020

FYI, we are hitting a critical tail-latency issue even for a very simple use case with low QPS: #8057

@github-actions

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 23, 2020
julz commented Aug 27, 2020

Just for anyone searching here, operator-settable Queue Proxy requests/limits landed in #8195.
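
For anyone landing here later, those operator-level knobs are exposed as ConfigMap settings. The sketch below is an assumption based on the key names documented for current Knative Serving releases (in the config-deployment ConfigMap); they may differ from what #8195 originally shipped, so verify against your installed version:

# Illustrative sketch; the queue-sidecar-* keys are taken from current
# Knative Serving documentation and are an assumption here -- check the
# config-deployment ConfigMap shipped with your release.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-deployment
  namespace: knative-serving
data:
  queue-sidecar-cpu-request: "25m"
  queue-sidecar-cpu-limit: "1000m"
  queue-sidecar-memory-request: "50Mi"
  queue-sidecar-memory-limit: "200Mi"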

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 23, 2021
@evankanderson
Member

It looks like this is actually done?

/close

@knative-prow-robot
Contributor

@evankanderson: Closing this issue.

In response to this:

It looks like this is actually done?

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
