Very slow K8s cluster creation and deletion #711

Closed
ablekh opened this issue Nov 1, 2018 · 14 comments

Comments

@ablekh

ablekh commented Nov 1, 2018

For a number of months I have been working on an Azure AKS-based project. While I really like Azure in general, unfortunately I can't say the same about AKS at the present time. Over the course of my work on AKS, the platform has been far from perfect, to say the least. In this issue I want to raise one aspect that significantly degrades the user experience and the overall appeal of the AKS platform and, as a consequence, of Azure in general versus the competition [1].

I have been experiencing slow (avg. ~20 min.) creation of even very basic (3-node) K8s clusters and even slower (avg. ~30 min.) deletion of such clusters; in the most recent case, a normally operating cluster took 50 (fifty!) min. to delete. I think that if Google can make a managed K8s platform fast, then, despite Google engineers' obviously more intimate expertise with Borg/K8s, engineers at Microsoft should be able to achieve a similar level of performance as well. I looked at the currently open AKS issues and don't see many related to cluster creation/deletion performance, hence this issue. Please share your thoughts and experiences. Also feel free to point to open issues in K8s repositories that block relevant AKS fixes.

[1] https://kubedex.com/google-gke-vs-azure-aks-automation-and-reliability

@hejix

hejix commented Nov 2, 2018

We are trying to migrate from Google GKE to AKS and find AKS provisioning times appalling and unworkable for our use case. We mainly use Kubernetes for research, and we have automation that creates ephemeral clusters for different scenarios, which we could do without issue on GCP. Add to that the fact that AKS only supports a single node pool, and such slow creation/teardown times make AKS not a viable alternative. (Of course, we are being forced to migrate by corporate policy and hence have to figure out a workaround, but this does not foster confidence in Azure as a platform.)

@alexsandro-xpt

We don't know why AKS creation and deletion are so slow. GCP is much faster.

@tomasr

tomasr commented Nov 9, 2018

Not sure about deletion, but support for faster deployment times is coming, according to conversations I had with some MS folks at Ignite.

You can see some of the ongoing work for this in ACS engine: Azure/acs-engine#3721

@ablekh
Author

ablekh commented Nov 9, 2018

@tomasr That's great! At least, it sounds promising. Thank you very much for the update and link.

@btai24

btai24 commented Nov 13, 2018

Is enabling RBAC supposed to make spinning up new clusters slower? Like the others here have mentioned, spinning up just a 3 node cluster takes quite a bit of time. It's even worse now that I've enabled RBAC. My clusters take between 30~40min to spin up.

@ablekh
Author

ablekh commented Nov 13, 2018

@btai24 I'm not exactly sure what impact RBAC has on cluster creation speed. Just for the record, all our clusters are RBAC-enabled almost since that feature became GA on AKS. What is the size of your clusters that take 30-40 min. to start? Just curious ...

@btai24

btai24 commented Nov 13, 2018

@ablekh, 3 node cluster. I haven't tried deploying an RBAC-disabled cluster today but the last two weeks they were taking between 9~15min to spin up. Today I've only attempted to deploy RBAC-enabled clusters and each one has taken over 30 minutes.

@ablekh
Author

ablekh commented Nov 13, 2018

@btai24 I see. As I mentioned above, the average time for me was ~20 min. So perhaps the difference is due to the deployment region, availability zone and/or, more likely, the configuration (size etc.) of the underlying VMs.
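
To compare numbers like these across regions, VM sizes and RBAC settings, it helps to time the same commands the same way. Below is a minimal Python sketch, not anything posted in this thread: it assumes the Azure CLI (`az`) is installed and logged in, and the helper names (`build_create_cmd`, `build_delete_cmd`, `time_command`, `run_benchmark`) as well as the resource group and cluster names are hypothetical. The `az aks create` and `az aks delete` flags used here are standard CLI options, but check them against your installed CLI version.

```python
import subprocess
import time


def build_create_cmd(group, name, node_count=3, vm_size=None,
                     location=None, disable_rbac=False):
    """Assemble an `az aks create` command line (nothing is executed here)."""
    cmd = ["az", "aks", "create",
           "--resource-group", group,
           "--name", name,
           "--node-count", str(node_count),
           "--generate-ssh-keys"]
    if vm_size:
        cmd += ["--node-vm-size", vm_size]
    if location:
        cmd += ["--location", location]
    if disable_rbac:
        cmd.append("--disable-rbac")
    return cmd


def build_delete_cmd(group, name):
    """Assemble an `az aks delete` command line; --yes skips the prompt."""
    return ["az", "aks", "delete",
            "--resource-group", group,
            "--name", name,
            "--yes"]


def time_command(cmd):
    """Run cmd to completion and return (exit_code, elapsed_minutes)."""
    start = time.monotonic()
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode, (time.monotonic() - start) / 60.0


def run_benchmark(group, name, **create_options):
    """Time one create/delete cycle. This creates real, billable resources."""
    timings = {}
    for label, cmd in [("create", build_create_cmd(group, name, **create_options)),
                       ("delete", build_delete_cmd(group, name))]:
        code, minutes = time_command(cmd)
        timings[label] = (code, minutes)
        print(f"{label}: exit code {code}, {minutes:.1f} min")
    return timings
```

For example, `run_benchmark("my-rg", "timing-test", location="westeurope")` followed by the same call with `disable_rbac=True` would give a like-for-like comparison of the two configurations (both names are placeholders). The sketch relies on both `az aks` commands blocking until the operation finishes, which is their default behaviour unless `--no-wait` is passed.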

@seanmck
Collaborator

seanmck commented Dec 6, 2018

With our most recent release, we are seeing average deployment times for the default cluster at about 6 min. Feel free to reopen if you're seeing something different.

@seanmck seanmck closed this as completed Dec 6, 2018
@ablekh
Author

ablekh commented Dec 6, 2018

@seanmck Sounds good. Thank you for the update as well as your and your team's efforts.

@pkeshab

pkeshab commented Jan 15, 2019

I am attempting to set up AKS using Rancher, but every time, cluster creation takes almost 50 minutes. For trial purposes I am using my free account subscription. Is this slowness specific to my case, or is it an issue with Azure itself? Can you share ideas if you have set up Azure in Rancher 2.0?

Thanks

@syedhassaanahmed

+1 to this issue. If we're to treat our AKS clusters as cattle and not pets, we should be able to provision them in a reasonable amount of time. ~15 minutes is not a reasonable provisioning time for a 3-node cluster with the default VM size.

@katlimruiz

+1

@Joelgullander

Joelgullander commented Mar 30, 2020

+1. Deleting a standard node with a standard VM size is taking over 30 minutes.

@ghost ghost locked as resolved and limited conversation to collaborators Jul 29, 2020