Very slow K8s cluster creation and deletion #711
Comments
We are trying to migrate from Google GKE to AKS and find AKS provisioning times appalling and unworkable for our use case. We mainly use Kubernetes for research, and we have automation to create ephemeral clusters for different scenarios, which worked without issue on GCP. Combined with the fact that AKS only supports a single node pool, such slow creation/teardown times make AKS not a viable alternative. (Of course, we are being forced to migrate by corporate policy and hence have to figure out a workaround, but this does not foster confidence in Azure as a platform.)
We don't know why AKS cluster creation and deletion are so slow. GCP is much faster.
Not sure about deletion, but based on conversations I had with some MS folks at Ignite, support for faster deployment times is coming. You can see some of the ongoing work for this in ACS engine: Azure/acs-engine#3721
@tomasr That's great! At least, it sounds promising. Thank you very much for the update and link.
Is enabling RBAC supposed to make spinning up new clusters slower? As others here have mentioned, spinning up even a 3-node cluster takes quite a bit of time. It's even worse now that I've enabled RBAC: my clusters take 30-40 min to spin up.
@btai24 I'm not exactly sure what impact RBAC has on cluster creation speed. Just for the record, all our clusters have been RBAC-enabled almost since that feature became GA on AKS. What is the size of your clusters that take 30-40 min. to start? Just curious ...
@ablekh, a 3-node cluster. I haven't tried deploying an RBAC-disabled cluster today, but over the last two weeks they were taking 9-15 min to spin up. Today I've only attempted to deploy RBAC-enabled clusters, and each one has taken over 30 minutes.
@btai24 I see. As I mentioned in the issue description, my average time was ~20 min. So, perhaps, the difference is due to the deployment region, availability zone and/or, more likely, the configuration (size etc.) of the underlying VMs.
With our most recent release, we are seeing average deployment times of about 6 min. for the default cluster. Feel free to reopen if you're seeing something different.
@seanmck Sounds good. Thank you for the update as well as your and your team's efforts.
I am attempting to set up AKS using Rancher, but every time, cluster creation takes almost 50 minutes. For trial purposes I am using my free account subscription. Is the slowness caused by that, or is it an issue with Azure itself? If anyone has used Azure with Rancher 2.0, could you share your experience? Thanks.
+1 to this issue. If we're to treat our AKS clusters as cattle and not pets, we should be able to provision them in a reasonable amount of time. ~15 minutes is not a reasonable provisioning time for a 3-node cluster with the default VM size.
+1
+1. Deleting a standard node with a standard VM size is taking over 30 minutes.
For the past several months I have been working on an Azure AKS-based project. While I really like Azure in general, unfortunately, I can't say the same about AKS at the present time. Over the course of my work on AKS, the platform has been far from perfect, to say the least. In this issue I want to mention one aspect that significantly degrades the user experience and the overall appeal of the AKS platform and, as a consequence, of Azure in general, versus the competition [1].
I have been experiencing slow (avg. ~20 min.) creation of even very basic (3-node) K8s clusters and even slower (avg. ~30 min.) deletion of such clusters; in the most recent case, a normally operating cluster took 50 (fifty!) min. to delete. I think that, if Google can make a managed K8s platform fast, then, despite Google engineers' obviously more intimate expertise with Borg/K8s, engineers at Microsoft should be able to achieve a similar level of performance as well. I looked at the currently open AKS issues and don't see many related to cluster creation/deletion performance, hence this issue. Please share your thoughts and experiences. Also feel free to point to open issues in K8s repositories that block relevant AKS fixes.
[1] https://kubedex.com/google-gke-vs-azure-aks-automation-and-reliability
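
The timings reported in this thread would be easier to compare if everyone measured them the same way. Below is a minimal sketch of one way to do that, not an official tool. It assumes the Azure CLI (`az`) is installed and logged in, and that `az aks create` and `az aks delete` block until the operation finishes (their default behaviour when `--no-wait` is not passed). The resource group and cluster names are hypothetical placeholders, and `NOOP_CMD` is only a harmless stand-in for trying the timer without touching Azure.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments), wait for it to exit, and return
    (exit_code, elapsed_seconds) measured with a monotonic clock."""
    start = time.monotonic()
    # Output is discarded; only the exit code and the wall-clock time are kept.
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode, time.monotonic() - start


def format_minutes(seconds):
    """Render a duration in seconds as minutes with one decimal place."""
    return f"{seconds / 60:.1f} min"


def report(label, cmd):
    """Time one command and return a one-line summary suitable for an issue comment."""
    code, elapsed = time_command(cmd)
    return f"{label}: {format_minutes(elapsed)} (exit code {code})"


# Hypothetical placeholder names (assumptions); adjust to your subscription.
RESOURCE_GROUP = "my-rg"
CLUSTER_NAME = "timing-test"

# A basic 3-node cluster, matching the size discussed in this thread.
CREATE_CMD = [
    "az", "aks", "create",
    "--resource-group", RESOURCE_GROUP,
    "--name", CLUSTER_NAME,
    "--node-count", "3",
    "--generate-ssh-keys",
]
DELETE_CMD = [
    "az", "aks", "delete",
    "--resource-group", RESOURCE_GROUP,
    "--name", CLUSTER_NAME,
    "--yes",
]

# Harmless stand-in command for checking the timer itself.
NOOP_CMD = [sys.executable, "-c", "pass"]
```

Calling `print(report("create", CREATE_CMD))` followed by `print(report("delete", DELETE_CMD))` would create and then delete a real cluster (and incur cost), so try `report("noop", NOOP_CMD)` first. When posting results, include the region, VM size and whether RBAC is enabled, since those are the variables suspected above.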