AKS service principal creation not handling async #103

Closed
jjcollinge opened this issue Sep 13, 2018 · 8 comments
Labels: area/providers, kind/bug (Some behavior is incorrect or out of spec)

Comments

@jjcollinge

When running the AKS example for the first time I got the following error:

error: Plan apply failed: containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="ServicePrincipalNotFound" Message="Service principal clientID: ****-****-****-**** not found in Active Directory tenant ****-****-****-****, Please see https://aka.ms/acs-sp-help for more details."

When I re-ran pulumi up, it worked. I believe this is due to the asynchronous nature of AAD service principal creation and propagation rather than a bug in the dependency graph. AAD is a particularly problematic resource, and I believe it has an SLA of ~2 mins for entity availability. It'd be worth having some validate-and-retry logic, such as that in the Azure CLI (Azure/azure-cli#1499), before returning from the create method.
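
The validate-and-retry idea suggested above could look roughly like the following. This is only a sketch under stated assumptions, not the Azure CLI's or the provider's actual logic: `backoffDelays`, `retryWithBackoff`, and `isServicePrincipalNotFound` are hypothetical names, the delay schedule is illustrative, and matching on the `ServicePrincipalNotFound` code from the error message above is an assumption about how the failure surfaces.

```typescript
// Hypothetical helpers: not part of Pulumi, the Azure CLI, or any Azure SDK.

/** Delay before each retry: doubles from baseMs and is capped at maxMs. */
function backoffDelays(retries: number, baseMs: number, maxMs: number): number[] {
    const delays: number[] = [];
    for (let i = 0; i < retries; i++) {
        delays.push(Math.min(baseMs * 2 ** i, maxMs));
    }
    return delays;
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

/**
 * Runs `operation` and retries it once per entry in `delaysMs`, but only while
 * `isRetryable(err)` returns true. Non-retryable errors are rethrown immediately.
 */
async function retryWithBackoff<T>(
    operation: () => Promise<T>,
    isRetryable: (err: unknown) => boolean,
    delaysMs: number[],
): Promise<T> {
    for (const delayMs of delaysMs) {
        try {
            return await operation();
        } catch (err) {
            if (!isRetryable(err)) {
                throw err;
            }
            await sleep(delayMs);
        }
    }
    // Final attempt: any error now propagates to the caller.
    return operation();
}

// Assumption: the propagation failure can be recognised by the error code in the message.
const isServicePrincipalNotFound = (err: unknown): boolean =>
    String(err).includes("ServicePrincipalNotFound");
```

With `backoffDelays(5, 5000, 60000)` the retries would be spread over a little more than two minutes, which is in line with the ~2 min propagation window mentioned above. Retrying only on the specific error code keeps unrelated failures from being masked.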

@lukehoban lukehoban self-assigned this Sep 13, 2018
@lukehoban
Contributor

Looking into this. It sounds like it may be an issue in the underlying terraform-provider-azurerm with some wait logic missing from the resource creation path.

@lukehoban lukehoban added this to the 0.18 milestone Sep 13, 2018
@lukehoban
Contributor

@jjcollinge
Author

@lukehoban thanks, looks like you're right. I see the latest Pulumi azure resources are based on TF 1.13 so they should include the fix TF released in 1.12. Do you think their fix didn't actually resolve the underlying issue or am I missing something?

@lukehoban
Contributor

> Do you think their fix didn't actually resolve the underlying issue or am I missing something?

I don't believe there is a fix for terraform-providers/terraform-provider-azurerm#1635 yet. I believe the fix for hashicorp/terraform-provider-azurerm#1647 addressed a similar case but for a different API (role assignment, not AKS).

@lukehoban lukehoban modified the milestones: 0.18, 0.19 Sep 24, 2018
@joeduffy joeduffy added kind/bug Some behavior is incorrect or out of spec area/providers labels Sep 30, 2018
@lukehoban
Contributor

I'm going to close this out, since it is really tracking terraform-providers/terraform-provider-azurerm#1635 getting fixed and merged upstream and then being adopted into Pulumi. We expect to adopt the latest providers into Pulumi on a continuous basis in the near term, so we will just want to push for the upstream issue to get fixed.

@jjcollinge Feel free to reopen if you'd like to have an issue to track on the Pulumi side - else tracking this in upstream is probably sufficient.

@lukehoban
Contributor

Actually, it appears this was just fixed a couple of days ago and will be in the next release: hashicorp/terraform-provider-azurerm#2204

@d-nishi d-nishi reopened this May 8, 2019
@lukehoban
Contributor

lukehoban commented May 8, 2019

My comment above about this being fixed is incorrect - https://github.com/terraform-providers/terraform-provider-azurerm/issues/1635 is still open. That said, I believe there is nothing Pulumi-specific we can do here.

The "best current workaround" described there is to sleep for 30 seconds. That can be done in Pulumi with:

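import * as azure from "@pulumi/azure";
import * as azuread from "@pulumi/azuread";

// Create the AAD application, its service principal, and the service principal's password.
const adApp = new azuread.Application("aks");
const adSp = new azuread.ServicePrincipal("aksSp", { applicationId: adApp.applicationId });
const adSpPassword = new azuread.ServicePrincipalPassword("aksSpPassword", {
    servicePrincipalId: adSp.id,
    // ...
});

// Hold back the password output for 30s so that anything depending on it (the AKS
// cluster below) is only created after the service principal has had time to propagate.
const delayedAdSpPasswordValue = adSpPassword.value.apply(async (val) => {
    console.log("Waiting for 30s for AD Service Principal eventual consistency...");
    await new Promise(resolve => setTimeout(resolve, 30000));
    return val;
});

const aks = new azure.containerservice.KubernetesCluster("aksCluster", {
    // ...
    servicePrincipal: {
        // ...
        clientSecret: delayedAdSpPasswordValue,
    },
});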
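
If the same wait is needed in more than one place, the delay can be factored into a small helper. The following is a sketch only: `passThroughAfter` is a hypothetical name, not a Pulumi API, and the helper itself has no dependency on any provider package.

```typescript
/**
 * Hypothetical helper: returns an async function that waits `ms` milliseconds
 * and then passes its input through unchanged.
 */
function passThroughAfter<T>(ms: number): (value: T) => Promise<T> {
    return async (value: T) => {
        await new Promise<void>(resolve => setTimeout(resolve, ms));
        return value;
    };
}
```

Since the workaround above already passes an async callback to `apply`, the returned function could be used the same way, for example `adSpPassword.value.apply(passThroughAfter<string>(30000))`.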

@mastoj

mastoj commented Jul 11, 2022

@lukehoban, I guess this applies to ManagedCluster in azure-native as well? Or is there a fix for it now that I have missed?
