CI tests failing for flux integration #5827
Updating the CI config to be able to view the logs of the failed flux CI test (#5841) shows:
The NotFound error appears to be returned by the shared code in kubeapps/cmd/kubeapps-apis/plugins/pkg/resourcerefs/resourcerefs.go (lines 111 to 113 at fa761a0), which implies that the helm command that was run to get the helm release manifest for … failed because the release was not found.
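As a sanity check (a sketch only; the release name and namespace here are hypothetical), the same release can be queried directly with the helm CLI, and if helm itself can't find the release, the plugin's NotFound is accurate:

```bash
# Query the release that the resourcerefs code is trying to read.
# "my-podinfo" and "flux-test" are placeholder values for this sketch.
helm status my-podinfo -n flux-test

# If the release exists, this prints the rendered manifest that
# resourcerefs.go parses for resource references:
helm get manifest my-podinfo -n flux-test
```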
More debugging via logs shows:
So at this point I can only assume that the error is indeed correct: the helm release does not exist, so I need to check the flux controllers and their logs to see why. Trying to reproduce locally, I notice that six flux controllers are installed, when I think we only need two; scaling the extras down could save us some CPU usage:
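For reference, something like the following lists the installed controllers (assuming the default flux-system namespace; a full flux install typically ships six controllers, of which only source-controller and helm-controller are needed to reconcile HelmReleases):

```bash
# List the controller deployments of a default flux installation.
# Typically: source-controller, helm-controller, kustomize-controller,
# notification-controller, image-reflector-controller and
# image-automation-controller.
kubectl -n flux-system get deployments
```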
I can scale three of those down to zero and still deploy the flux helm release without issue:
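A sketch of that scaling step, assuming the three controllers not involved in HelmRelease reconciliation are the notification and image controllers:

```bash
# Scale down the controllers that HelmRelease reconciliation doesn't need.
kubectl -n flux-system scale deployment \
  notification-controller \
  image-reflector-controller \
  image-automation-controller \
  --replicas=0
```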
If I scale the helm-controller to zero, I reproduce the issue seen in CI both visually and in the logs, since the helm release is never created. EDIT: Actually, this is different from what CI sees visually, in that I see that the helm release is not created, but in my case it's because Kubeapps created the flux …
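The reproduction itself is a one-liner (again assuming the flux-system namespace):

```bash
# Stop the helm-controller so HelmRelease objects are created but never
# reconciled into actual helm releases.
kubectl -n flux-system scale deployment helm-controller --replicas=0

# The HelmRelease custom resource exists but stays unready:
kubectl get helmreleases --all-namespaces
```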
Woo - so printing the logs of flux's helm-controller reveals:
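For the record, retrieving those logs looks roughly like this (the actual log output isn't reproduced here):

```bash
# Print recent logs from flux's helm-controller deployment.
kubectl -n flux-system logs deploy/helm-controller --tail=100
```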
So in the end, it was an update of the podinfo chart, since we don't deploy a specific version. I didn't see this locally since we use 1.24 in the dev environment. To avoid the time required for investigation next time, I recommend we use a specific version of podinfo for the tests (so we control when it updates).
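A sketch of what pinning could look like if the test installs podinfo via a flux HelmRelease; the names and version below are illustrative, not the values the tests should necessarily use:

```bash
# Pin the chart version so an upstream podinfo release can't silently
# change what CI tests. Assumes a HelmRepository named "podinfo" exists.
kubectl apply -f - <<EOF
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: podinfo
  namespace: default
spec:
  interval: 1m
  chart:
    spec:
      chart: podinfo
      version: "6.1.8"  # pinned: bump deliberately, don't float
      sourceRef:
        kind: HelmRepository
        name: podinfo
EOF
```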
Signed-off-by: Michael Nelson <minelson@vmware.com>

### Description of the change

The main update is to switch to k8s 1.24.7 in our CI (and dev) environments, because CI was running with 1.22 and the podinfo chart that the flux tests use is no longer compatible with 1.22. As a result, we also needed to create specific service token secrets and update the way CI gets those. There's also some extra logging to help trace issues in the future.

### Benefits

CI passes again.

### Applicable issues

- fixes #5827
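For context on the service token secrets: from Kubernetes 1.24, a ServiceAccount no longer gets a long-lived token Secret created automatically, so CI has to create one explicitly. A minimal sketch with illustrative names:

```bash
# Create a long-lived token Secret for an existing ServiceAccount
# ("kubeapps-ci" is a placeholder); the token controller fills in the token.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: kubeapps-ci-token
  namespace: default
  annotations:
    kubernetes.io/service-account.name: kubeapps-ci
type: kubernetes.io/service-account-token
EOF

# Extract the generated token for use in CI:
kubectl get secret kubeapps-ci-token -o jsonpath='{.data.token}' | base64 -d
```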
Summary
A few commits ago, the CI tests for our main branch began failing. The last successful CI test of main was when #5798 landed, syncing the upstream chart changes for the 12.1.3 chart release.
At the time of writing, there are only three commits on main since, all of which failed once landed (though each PR passed before landing).
The flux test fails while waiting for the deployed package to show pods (for the pie chart to be populated):
but the screenshot shows that there's nothing available by the timeout:
(creating the issue to document what's known right now as I won't get to this before the break).