[Bug]: Pipeline issues - AKS cluster (and other Azure resources) not created reliably #809
Comments
I ran another test against |
Hi @drew0ps, thank you for raising the issue. I think this issue is not limited to the pipeline/uptest. I just tried in a local kind cluster, and the resource was still not created after more than 20 minutes, with no errors reported.
Hi @turkenf, thank you for the response and for checking. I suspect something is wrong with this specific CR, since I also checked locally with the monolith (make run) and succeeded there with a different CR (my k8s extensions story), although I run Rancher Desktop:
The differences are: In this run the AKS cluster creation succeeded: https://github.com/crossplane-contrib/provider-upjet-azure/actions/runs/10699008353/job/29659653711 |
I am experiencing the same issue in the examples you gave, and this may be caused by our subscriptions 🤔 |
Hm, that could be, but it does not explain why the AKS cluster creation succeeds in this run on subscription "2895a7df-ae9f-41b8-9e78-3ce4926df838":
This succeeds after about 5 minutes. The extension install still fails though, which ultimately blocks my PR. All of this works in my own Azure environment though.
Quick update here - the AKS cluster is not created because there is no free tier AKS anymore in West Europe. In North Europe it is still failing, most probably for a different reason. I could reproduce it with make e2e though, and there I get the following event in the Azure Activity Log, as I mentioned in my PR:
The same message comes up every time, so retrying doesn't help.
For some reason the AKS cluster creation enters an update loop trying to remove a field called maxSurge. However, I found that if the field is added to the manifests this doesn't happen. It still seems to be a bug, although there is a workaround:
Is there an existing issue for this?
Affected Resource(s)
containerservice.azure.upbound.io/v1beta1 - KubernetesCluster
Resource MRs required to reproduce the bug
No response
Steps to Reproduce
What happened?
This run is using the examples/containerservice/v1beta1/kubernetescluster.yaml - same as main.
https://github.com/crossplane-contrib/provider-upjet-azure/actions/runs/10682720219
Relevant Error Output Snippet
No response
Crossplane Version
main, whatever is used by the uptests pipeline
Provider Version
main
Kubernetes Version
No response
Kubernetes Distribution
No response
Additional Info
I have been trying to merge my PR, but the uptests keep failing, even though manual tests succeed with the newly added resource. Because the resource relies on an AKS cluster present in the cloud, I ran the pipeline with AKS defined in the CR and noticed instability: several unsuccessful runs where the cluster was stuck in the Creating state throughout the 30-minute run.