Running Radius on AKS Automatic causes Node failures #7676
Comments
👋 @loekd Thanks for filing this bug report. A project maintainer will review this report and get back to you soon. If you'd like immediate help troubleshooting, please visit our Discord server. For more information on our triage process, please visit our triage overview.
👍 We've reviewed this issue and have agreed to add it to our backlog. Please subscribe to this issue for notifications; we'll provide updates when we pick it up. We also welcome community contributions! If you would like to pick this item up sooner and submit a pull request, please visit our contribution guidelines and assign this to yourself by commenting "/assign" on this issue. For more information on our triage process, please visit our triage overview.
We've prioritized work on this issue. Please subscribe to this issue for notifications; we'll provide updates as we make progress. We also welcome community contributions! If you would like to pick this item up sooner and submit a pull request, please visit our contribution guidelines and assign this to yourself by commenting "/assign" on this issue. For more information on our triage process, please visit our triage overview.
The likely cause is that the UCP APIServer extension went down. We're using a Kubernetes extensibility point that's pretty heavyweight and can cause issues like this. The reasons we chose that approach no longer hold, and we should migrate away. A better approach would be for us to port-forward to the control plane instead of exposing it through the API Server. This would still require the user to have Kubernetes credentials (that's good), but would also be more flexible because users could expose the control plane in whatever manner they like.
/assign @brooke-hamilton
@brooke-hamilton - if you're interested in learning more about this approach, this is what the ArgoCD CLI does. Argo's control plane runs inside the cluster, and (by default) they port-forward so the CLI can talk to it: https://github.com/argoproj/argo-cd/blob/master/pkg/apiclient/apiclient.go#L206
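To make the port-forward idea concrete, here is a rough sketch of what the CLI side could look like: pick a free local port, forward it to the in-cluster control plane, and point the CLI at the local address. This is not taken from the Radius or ArgoCD code. The helper names (`freeLocalPort`, `portForwardArgs`, `localBaseURL`), the `radius-system` namespace, the `svc/ucp` target, the remote port 443, and the `http` scheme are all assumptions for illustration, and it shells out to `kubectl port-forward` rather than using a client library.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// freeLocalPort asks the OS for an unused TCP port on the loopback interface.
// The listener is closed before returning, so another process could grab the
// port in the meantime; a real implementation would need to handle that.
func freeLocalPort() (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()
	return ln.Addr().(*net.TCPAddr).Port, nil
}

// portForwardArgs builds the argument list for `kubectl port-forward`.
// namespace and target are hypothetical values supplied by the caller.
func portForwardArgs(namespace, target string, localPort, remotePort int) []string {
	return []string{
		"port-forward",
		"--namespace", namespace,
		target,
		strconv.Itoa(localPort) + ":" + strconv.Itoa(remotePort),
	}
}

// localBaseURL is the address the CLI would use once the forward is up.
// The http scheme is an assumption; the real endpoint may require TLS.
func localBaseURL(localPort int) string {
	return "http://" + net.JoinHostPort("127.0.0.1", strconv.Itoa(localPort))
}

func main() {
	port, err := freeLocalPort()
	if err != nil {
		fmt.Println("could not find a free port:", err)
		return
	}
	// Assumed namespace, service name, and remote port, for illustration only.
	args := portForwardArgs("radius-system", "svc/ucp", port, 443)
	fmt.Println("would run: kubectl", strings.Join(args, " "))
	fmt.Println("CLI would then talk to", localBaseURL(port))
}
```

The user still needs working Kubernetes credentials for `kubectl port-forward` to succeed, which matches the point above, and anyone who prefers an ingress or load balancer could skip the forward and hand the CLI a different base URL.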
I opened a related issue: Azure/AKS#4513.
I believe this behavior is not specific to Radius. I was able to reproduce node crashes by deploying a single (non-Radius) pod to an AKS Automatic cluster and scaling it up to 100 instances. @loekd please comment if there is any other context we should consider. Thank you for reporting the issue! 🚀
Steps to reproduce
Observed behavior
Desired behavior
Workaround
rad Version
Operating system
Ubuntu (default node)
Additional context
No response
Would you like to support us?
AB#12487