Many services in a single namespace lead to assorted problems. #8498
Given the sheer volume of Services we create, I have a sneaking suspicion that this is related to the service-link environment variables injected by k8s. @dprotaso has a change to allow us to disable these; I am going to try pulling that in and see whether it alleviates these problems. If it does, then I think we have our compelling reason to make this default configurable.
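For context, the "service links" in question are the Docker-link-style environment variables Kubernetes injects into every container for every Service in the namespace. A rough sketch of that per-Service block (an approximation for a single-port Service; the exact set and IP are illustrative):

```python
def service_link_env_vars(name: str, port: int, ip: str = "10.0.0.1") -> dict:
    """Approximate the env vars Kubernetes injects for one Service
    (the Docker-link-style variables behind enableServiceLinks)."""
    p = name.upper().replace("-", "_")
    return {
        f"{p}_SERVICE_HOST": ip,
        f"{p}_SERVICE_PORT": str(port),
        f"{p}_PORT": f"tcp://{ip}:{port}",
        f"{p}_PORT_{port}_TCP": f"tcp://{ip}:{port}",
        f"{p}_PORT_{port}_TCP_PROTO": "tcp",
        f"{p}_PORT_{port}_TCP_PORT": str(port),
        f"{p}_PORT_{port}_TCP_ADDR": ip,
    }

# Every container in the namespace receives this block for *every*
# Service, so the injected environment grows linearly with Service count:
per_service = len(service_link_env_vars("hello-00001", 80))
print(per_service)         # 7 variables for a single-port Service
print(1200 * per_service)  # 8400 variables injected at 1200 Services
```

Since Knative creates multiple Services per ksvc, the real per-container env block at these scales is even larger than this sketch suggests.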
Yeah Dave's change should help with this
@tshak Ack, when this was raised previously we didn't really have tangible evidence that this could cause correctness or performance problems at scale. I am running a test now that should confirm/deny that impact, and then we can revisit in the WG call. However, I think the sentiment from the previous WG call stands: we need to handle this like a change to "stable" behavior. Assuming we have a smoking gun... I'd suggest:
I don't see a super compelling reason to remove the configurable default (since it enables "K8s compat mode"), but I think these issues would be enough to sway my view on the default behavior. I'll report back in a few hundred more services 😉
Alright, I have confirmed the last piece of this. Above 1200 services, we fail to cold start with service links enabled, but without them we still see pretty exceptional cold starts...
Overnight I did a fresh run creating 1500 ksvc (with kn as above, but with service links disabled by default thanks to @vagababov), and the Revision creation latency is completely flat: ... bear in mind that the original graph was only 1000 ksvc. So I think we should start a discussion around how we want to change this default behavior. Thoughts?
Closing in favor of #8563
See knative#8498 for why we are doing this. Fixes knative#8563
@mattmoor do you happen to remember if this issue was strictly due to K8s Services env vars being injected, or whether it would have happened even if the user defined the same number of env vars manually?
@duglin I'd guess it'd happen with either
ok thanks
/area API
/area autoscale
What version of Knative?
HEAD
Description of the problem.
I wanted to push on our limits a bit, and so I wrote the very innovative (patent pending 🤣) script below. I plotted the latency between `creationTimestamp` and `status.conditions[Ready].lastTransitionTime` here. A few observations in this context:
Here's where it gets interesting... On a whim, I tried picking back up in a second namespace, and things work! Not only do they work, but cold start latency for the new services is back down!
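One plausible explanation for the per-namespace reset (a back-of-envelope illustration, not measured data): service links are namespace-scoped, so the env block injected into each new pod grows with the Services already present, and the total injection work across a namespace grows roughly quadratically, while a fresh namespace starts from zero. Assuming roughly 7 injected variables per single-port Service:

```python
def total_injected_vars(n_services: int, per_service: int = 7) -> int:
    """Total service-link env vars injected across a namespace where
    the i-th pod sees links for the i Services created before it.
    The sum 0 + 1*k + 2*k + ... + (n-1)*k is quadratic in n."""
    return per_service * n_services * (n_services - 1) // 2

print(total_injected_vars(10))    # 315
print(total_injected_vars(1000))  # 3496500
```

If this model is right, a second namespace resets `n_services` to zero, which would explain why cold starts there are fast again.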
Steps to Reproduce the Problem
I needed a GKE cluster with at least 10 nodes (post-master resize) to tolerate the number of services this creates. I was playing with this in the context of `mink`, but there's no reason that would affect what I'm seeing. I gathered the latencies as CSV with:
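The original snippet isn't preserved in this excerpt. As a sketch of the same computation (assuming the ksvc objects were fetched as JSON, e.g. via `kubectl get ksvc -o json`; the sample object below is made up, but the field names match the ones referenced above):

```python
from datetime import datetime

def ready_latency_seconds(ksvc: dict) -> float:
    """Seconds from creationTimestamp to the Ready condition's
    lastTransitionTime for one Knative Service object."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    created = datetime.strptime(ksvc["metadata"]["creationTimestamp"], fmt)
    ready = next(c for c in ksvc["status"]["conditions"] if c["type"] == "Ready")
    transitioned = datetime.strptime(ready["lastTransitionTime"], fmt)
    return (transitioned - created).total_seconds()

# Hypothetical ksvc, emitted as one CSV row of "name,latency":
sample = {
    "metadata": {"name": "hello-0001", "creationTimestamp": "2020-07-01T10:00:00Z"},
    "status": {"conditions": [{"type": "Ready", "status": "True",
                               "lastTransitionTime": "2020-07-01T10:00:42Z"}]},
}
print(f'{sample["metadata"]["name"]},{ready_latency_seconds(sample)}')  # hello-0001,42.0
```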