-
Notifications
You must be signed in to change notification settings - Fork 522
feat: enable system-assigned identity by default #3856
feat: enable system-assigned identity by default #3856
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3856 +/- ##
=========================================
Coverage ? 72.99%
=========================================
Files ? 149
Lines ? 23201
Branches ? 0
=========================================
Hits ? 16935
Misses ? 5156
Partials ? 1110
Continue to review full report at Codecov.
|
!cs.Properties.IsCustomCloudProfile() && | ||
!cs.Properties.MasterProfile.IsVirtualMachineScaleSets() && | ||
o.KubernetesConfig.UseManagedIdentity == nil { | ||
o.KubernetesConfig.UseManagedIdentity = to.BoolPtr(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this also enable it during upgrade? since we weren't setting a default before, most of my generated apimodels have no set value for useManagedIdentity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, addressed!
@@ -177,6 +177,13 @@ func (cs *ContainerService) setOrchestratorDefaults(isUpgrade, isScale bool) { | |||
} | |||
} | |||
|
|||
if !cs.Properties.IsHostedMasterProfile() && | |||
!cs.Properties.IsCustomCloudProfile() && | |||
!cs.Properties.MasterProfile.IsVirtualMachineScaleSets() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why disable for VMSS masters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to vlabs/validate.go VMSS master VMs only support user-assigned identity (not system-assigned)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually ran some real world cluster buildouts to see what happens w/ VMSS control plane + system-assigned identity, and sure enough on the first test I am unable to create a Load Balancer:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 2m25s (x8 over 12m) service-controller Ensuring load balancer
Warning ListLoadBalancers 2m25s (x8 over 12m) azure-cloud-provider Retriable: false, RetryAfter: 0s, HTTPStatusCode: 400, RawError: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 400, RawError: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/aa3d3369-e814-4495-899d-d31e8d7d09ce/resourceGroups/kubernetes-westus2-95074/providers/Microsoft.Network/loadBalancers?api-version=2019-06-01: StatusCode=400 -- Original Error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Identity not found"}
Warning SyncLoadBalancerFailed 2m25s (x8 over 12m) service-controller Error syncing load balancer: failed to ensure load balancer: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 400, RawError: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 400, RawError: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/aa3d3369-e814-4495-899d-d31e8d7d09ce/resourceGroups/kubernetes-westus2-95074/providers/Microsoft.Network/loadBalancers?api-version=2019-06-01: StatusCode=400 -- Original Error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Identity not found"}
@ritazh @sozercan were you involved in implementing VMSS masters? Do you recall if this is one of the expected symptoms that dictates that system-assigned identity is not a working solution?
Hold while we address race conditions in the template spec
|
@@ -177,6 +177,14 @@ func (cs *ContainerService) setOrchestratorDefaults(isUpgrade, isScale bool) { | |||
} | |||
} | |||
|
|||
if !isUpgrade && !isScale && | |||
!cs.Properties.IsHostedMasterProfile() && | |||
!cs.Properties.IsCustomCloudProfile() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx for taking custom clouds into account.
c0b127a
to
f9f0ce6
Compare
@@ -146,6 +146,11 @@ func (authArgs *authArgs) validateAuthArgs() error { | |||
return errors.New("--auth-method is a required parameter") | |||
} | |||
|
|||
// Back-compat to accommodate existing client usage patterns that assume that "client-secret" is the default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jadarsie Do you have any concerns here for Azure Stack? Does the az
CLI auth model work for Azure Stack? If not, are you comfortable with this back-compat solution that "re-sets" the default auth method to client_secret
if the service principal ID and password are included in the command line arguments?
The goal for us is to default to CLI as the auth model because for most users it is easier (don't have to generate/maintain service principals, easier command statements). If there's a reason that a local az
context isn't possible, then this back-compat solution should work for pre-existing scripts that pass in the id and pass but haven't ever bothered to set the --auth-method
to client_secret
explicitly, because it's always been the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @jadarsie @haofan-ms
@@ -270,13 +270,13 @@ func autofillApimodel(dc *deployCmd) error { | |||
if dc.dnsPrefix == "" { | |||
return errors.New("apimodel: missing masterProfile.dnsPrefix and --dns-prefix was not specified") | |||
} | |||
log.Warnf("apimodel: missing masterProfile.dnsPrefix will use %q", dc.dnsPrefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be a warning, it's literally a CLI feature
@@ -74,108 +74,48 @@ Note, we have launched a browser for you to login. For old experience with devic | |||
You have logged in. Now let us find all the subscriptions to which you have access... | |||
``` | |||
|
|||
Next, we'll create a resource group. A resource group is a container that holds related resources for an Azure solution. In Azure, you logically group related resources such as storage accounts, virtual networks, and virtual machines (VMs) to deploy, manage, and maintain them as a single entity. In this case, we want to deploy, manage and maintain the whole Kubernetes cluster as a single entity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These steps are actually unnecessary, so are not in the spirit of "quick start" for first time users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I have wondered before why we weren't letting aks-engine deploy
create the RG for us...
@@ -41,10 +41,6 @@ This is the AAD Pod Identity add-on. Add this add-on to your json file as shown | |||
} | |||
] | |||
} | |||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaning up these placeholder objects as we're no longer emphasizing the service principal solution
@@ -998,44 +994,6 @@ func (a *Properties) validateServicePrincipalProfile() error { | |||
return nil | |||
} | |||
|
|||
func (a *Properties) validateManagedIdentity() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This entire validation logic is to ensure you're using k8s >= 1.12.0, which is not necessary.
a930693
to
8170a5f
Compare
/azp run pr-e2e |
Azure Pipelines successfully started running 1 pipeline(s). |
08c37d6
to
a92b91d
Compare
@@ -74,108 +74,48 @@ Note, we have launched a browser for you to login. For old experience with devic | |||
You have logged in. Now let us find all the subscriptions to which you have access... | |||
``` | |||
|
|||
Next, we'll create a resource group. A resource group is a container that holds related resources for an Azure solution. In Azure, you logically group related resources such as storage accounts, virtual networks, and virtual machines (VMs) to deploy, manage, and maintain them as a single entity. In this case, we want to deploy, manage and maintain the whole Kubernetes cluster as a single entity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I have wondered before why we weren't letting aks-engine deploy
create the RG for us...
spelling correction Co-authored-by: Matt Boersma <Matt.Boersma@microsoft.com>
…/jackfrancis/aks-engine into system-assigned-identity-default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jackfrancis, mboersma The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Reason for Change:
This PR changes the defaults configuration assignment implementation to set
useManagedIdentity
to true by default, effectively enabling system-assigned identity for all node VMs.Official Azure docs here:
https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/overview#managed-identity-types
In order to do so reliably, the API version for system-assigned identity has been updated to
2018-09-01-preview
so that theroleAssignment
resource doesn't reference a VM-derived service principal before it has replicated (that API version provides a solution for that race condition).Note: this default change was not applied for Azure Stack, which doesn't support managed identity at this time.
Also modified the default
--auth-method
for all CLI gestures tocli
, fromclient_secret
, to further deprioritize the suggestion to rely upon manually curated service principals as part of the AKS Engine solution set. Using that auth method by default w/ system-assigned identity by default yields a much simpler working, default CLI expression:$ aks-engine deploy --resource-group $RG --location $LOCATION --api-model $API_MODEL_JSON_PATH
Also updated doc examples to reflect this simpler default flow.
Issue Fixed:
Fixes #3885
Credit Where Due:
Does this change contain code from or inspired by another project?
If "Yes," did you notify that project's maintainers and provide attribution?
Requirements:
Notes: