This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Kubernetes: Use AKS curated OS images by default #3721

Merged: 6 commits, Aug 23, 2018

Conversation

jackfrancis
Member

@jackfrancis jackfrancis commented Aug 21, 2018

What this PR does / why we need it: Uses AKS-curated OS images by default.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

If applicable:

  • documentation
  • unit tests
  • tested backward compatibility (ie. deploy with previous version, upgrade with this branch)

Release note:

Kubernetes: Use AKS curated OS images by default

@ghost ghost assigned jackfrancis Aug 21, 2018
@ghost ghost added the in progress label Aug 21, 2018
@acs-bot acs-bot added the size/M label Aug 21, 2018
@codecov

codecov bot commented Aug 22, 2018

Codecov Report

Merging #3721 into master will increase coverage by 0.04%.
The diff coverage is 70%.

@@            Coverage Diff             @@
##           master    #3721      +/-   ##
==========================================
+ Coverage   55.49%   55.53%   +0.04%     
==========================================
  Files         108      108              
  Lines       16143    16139       -4     
==========================================
+ Hits         8958     8963       +5     
+ Misses       6420     6407      -13     
- Partials      765      769       +4

@seanknox
Contributor

@CecileRobertMichon
Contributor

/hold

will update image version to 0.10.0 once it's published

@ghost

ghost commented Aug 23, 2018

Is there any documentation around what this new Distribution contains? Is this a fork of Ubuntu or an entirely new Distribution? Will this apply to acs-engine nodes as well, or only AKS?

@CecileRobertMichon
Contributor

@neurot1cal not yet, we are still in the testing phase. The image is based on Ubuntu 16.04-LTS (same as the previous default) and contains pre-installed software needed for deployment that was previously fetched at provisioning time. The objective is to reduce deployment time (significantly) and eliminate transient errors due to general download flakiness. The plan is to enable this for both acs-engine Linux nodes (at first) and AKS nodes (after baking in acs-engine for a bit).

@@ -479,7 +479,7 @@ We consider `kubeletConfig`, `controllerManagerConfig`, `apiServerConfig`, and `
 | vnetCidr | no | Specifies the VNET cidr when using a custom VNET ([bring your own VNET examples](../examples/vnet)) |
 | imageReference.name | no | The name of the Linux OS image. Needs to be used in conjunction with resourceGroup, below |
 | imageReference.resourceGroup | no | Resource group that contains the Linux OS image. Needs to be used in conjunction with name, above |
-| distro | no | Select Master(s) Operating System (Linux only). Currently supported values are: `ubuntu` and `coreos` (CoreOS support is currently experimental). Defaults to `ubuntu` if undefined. Currently supported OS and orchestrator configurations -- `ubuntu`: DCOS, Docker Swarm, Kubernetes; `RHEL`: OpenShift; `coreos`: Kubernetes. [Example of CoreOS Master with CoreOS Agents](../examples/coreos/kubernetes-coreos.json) |
+| distro | no | Select Master(s) Operating System (Linux only). Currently supported values are: `ubuntu, `aks` and `coreos` (CoreOS support is currently experimental). Defaults to `aks` if undefined. `aks` is a custom image based on `ubuntu` that comes with pre-installed software necessary for Kubernetes deployments. Currently supported OS and orchestrator configurations -- `ubuntu` and `aks`: DCOS, Docker Swarm, Kubernetes; `RHEL`: OpenShift; `coreos`: Kubernetes. [Example of CoreOS Master with CoreOS Agents](../examples/coreos/kubernetes-coreos.json) |
Member

A backtick was lost here: should be

...values are: `ubuntu`, `aks`...

Contributor

ah thanks will fix :)

@@ -590,7 +590,7 @@ func setAgentNetworkDefaults(a *api.Properties, isUpgrade, isScale bool) {
if !a.OrchestratorProfile.IsOpenShift() {
// Set default Distro to Ubuntu
Contributor

I think we need to change this comment?

Contributor

ah yes indeed

Contributor

I mean line 486

Contributor

I just removed them, I don't think it was a very useful comment
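
The defaulting behavior under review in this thread can be sketched roughly as follows. This is a minimal illustration, not the actual acs-engine code: the `Distro` type, the `profile` struct, and `setDistroDefault` are hypothetical stand-ins, and the only behavior taken from the PR is that an undefined `distro` now resolves to `aks` for non-OpenShift clusters.

```go
package main

import "fmt"

// Distro names an OS image family. The values mirror those listed in the
// documentation diff; the type itself is a hypothetical stand-in.
type Distro string

const (
	Ubuntu Distro = "ubuntu"
	AKS    Distro = "aks"
	CoreOS Distro = "coreos"
)

// profile is a hypothetical, trimmed-down stand-in for a master or agent
// pool profile that carries only the field relevant here.
type profile struct {
	Distro Distro
}

// setDistroDefault fills in Distro only when the user left it undefined.
// OpenShift clusters are skipped, matching the guard in the reviewed hunk.
func setDistroDefault(p *profile, isOpenShift bool) {
	if isOpenShift {
		return
	}
	if p.Distro == "" {
		p.Distro = AKS
	}
}

func main() {
	unset := profile{}
	pinned := profile{Distro: Ubuntu}
	setDistroDefault(&unset, false)
	setDistroDefault(&pinned, false)
	fmt.Println(unset.Distro, pinned.Distro) // prints "aks ubuntu"
}
```

In this sketch an explicitly chosen distro such as `ubuntu` or `coreos` is left untouched, so users who set `distro` themselves keep their choice.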

@tariq1890
Contributor

/lgtm

@mboersma
Member

/lgtm

@acs-bot

acs-bot commented Aug 23, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, mboersma, tariq1890

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jackfrancis,mboersma]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@seanknox
Contributor

@CecileRobertMichon CecileRobertMichon merged commit 71a920d into Azure:master Aug 23, 2018
@ghost ghost removed the in progress label Aug 23, 2018
@CecileRobertMichon
Contributor

@neurot1cal this is in master if you want to give it a try :)

@ablekh

ablekh commented Nov 9, 2018

This is very promising. What is an approximate time frame for this fix to appear in AKS? Are there any other issues related to cluster creation/deletion speed? Please address an extremely slow deletion of AKS clusters as well (see my issue 711, referenced above). Thanks!

@jackfrancis jackfrancis deleted the aks-distro branch November 9, 2018 23:35
@CecileRobertMichon
Contributor

@ablekh this has been on in AKS worldwide since mid-October. More improvements are coming, as delivering fast, reliable deployments is a continual process. Unfortunately, AKS cluster deletion is not affected by this change and is not related to the acs-engine project, which only does cluster provisioning. AKS handles deletion on the service side (some improvements might be coming there too).

@ablekh

ablekh commented Nov 9, 2018

@CecileRobertMichon Thank you so much for your update. Hmm ... then, based on the time frame that you mentioned, unfortunately, it means that this change was definitely not nearly enough to get cluster creation speed down to feasible numbers (please see details in my issue referenced above). Having said that, I definitely appreciate the team's commitment and hard work. I hope that the goal of achieving fast and reliable AKS deployments will be reached sooner, rather than later. :-) And, for that, perhaps, some changes beyond acs-engine are needed. Re: cluster deletion - Understood. I hope that relevant team(s) at MS will address this issue as well.

@CecileRobertMichon
Contributor

@ablekh out of curiosity, what range of durations are you seeing in AKS deployments lately? Have you noticed any improvements recently? Agreed that more work is needed to get there; this effort was just one part of it. After this change, we've observed average cluster deployments in acs-engine drop to ~4 minutes for 3 nodes and ~7 minutes for 55 nodes. AKS create is built on top of acs-engine, but AKS provisioning involves more operations than acs-engine does, so it needs additional time.

@ablekh

ablekh commented Nov 10, 2018

@CecileRobertMichon Well, the numbers that I shared in my issue 711 referenced above, unfortunately, apply largely to end of October - beginning of November time frame. Hence, my original comment here. Almost always, I create a 3-node cluster on AKS (and then scale as needed), so my numbers refer to that size. Based on my recollection, an average time for creating a 3-node AKS cluster that I've experienced was ~20 min. IMO, 80% of the total time spent on other (non-acs-engine) operations seems like way too much to me ... Forgot to mention (not sure, if it's important): our deployments are currently covered by MS Azure Education credits, so, maybe the QoS for credits-based AKS is lower ... (I hope not).
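
For reference, the 80% figure follows from the two approximate numbers quoted in this thread: ~20 minutes for a 3-node AKS create and ~4 minutes for a 3-node acs-engine deployment. A small sketch of that arithmetic, where the function name `nonEngineShare` is made up for illustration:

```go
package main

import "fmt"

// nonEngineShare returns the fraction of the total create time that is not
// accounted for by the acs-engine deployment itself.
func nonEngineShare(totalMin, engineMin float64) float64 {
	return (totalMin - engineMin) / totalMin
}

func main() {
	// ~20 min observed AKS create, ~4 min acs-engine deployment (both for 3 nodes).
	fmt.Printf("%.0f%%\n", nonEngineShare(20, 4)*100) // prints "80%"
}
```

Both inputs are rough averages reported by different people, so the result is only an order-of-magnitude estimate.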

@CecileRobertMichon
Contributor

@ablekh thanks a lot for the feedback, I'll make sure it gets passed on to the AKS team

@ablekh

ablekh commented Nov 10, 2018

@CecileRobertMichon You're very welcome. And thank you very much, too. :-)

7 participants