Integration tests fail intermittently in GH runners #174

Open · i-chvets opened this issue Sep 15, 2023 · 0 comments
Labels: bug (Something isn't working)

i-chvets (Contributor) commented Sep 15, 2023

Bug Description

When executing KServe integration tests with an additional server, the tests started to fail intermittently; in roughly 90% of cases a test run could not complete.
This PR exposed the issue.

Failed run:
https://github.com/canonical/kserve-operators/actions/runs/6187990896

Successful run:
https://github.com/canonical/kserve-operators/actions/runs/6190038075

From initial investigation, it looks like there are not enough resources in the GH runner to complete the tests:

 test_charm:test_charm.py:308 mlserver-sklearn-iris is not ready {'lastTransitionTime': '2023-09-14T19:56:19Z', 'message': 'Revision "mlserver-sklearn-iris-predictor-default-00001" failed with message: 0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..', 'reason': 'RevisionFailed', 'severity': 'Info', 'status': 'False', 'type': 'PredictorConfigurationReady'}
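
For context, a check of the kind that produced that message reads the InferenceService status conditions and treats any condition whose status is not "True" as not ready. A hypothetical sketch (not the actual test_charm.py code), using the condition from the failing run:

```python
def not_ready_conditions(conditions):
    """Return the status conditions that block readiness (status != "True")."""
    return [c for c in conditions if c.get("status") != "True"]

# The condition reported in the failing run above:
condition = {
    "lastTransitionTime": "2023-09-14T19:56:19Z",
    "message": (
        'Revision "mlserver-sklearn-iris-predictor-default-00001" failed with '
        "message: 0/1 nodes are available: 1 Insufficient cpu. preemption: "
        "0/1 nodes are available: 1 No preemption victims found for incoming pod.."
    ),
    "reason": "RevisionFailed",
    "severity": "Info",
    "status": "False",
    "type": "PredictorConfigurationReady",
}

assert not_ready_conditions([condition]) == [condition]
```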

Setting limits.cpu: 250m and deleting test resources (and asserting their deletion) solved the issue (see the above PR).
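
A minimal sketch of that workaround, assuming lightkube and tenacity are available (both are common in these charms' integration tests); the helper names and the model storageUri are illustrative placeholders, not the actual PR change:

```python
import tenacity
from lightkube import Client
from lightkube.core.exceptions import ApiError
from lightkube.generic_resource import create_namespaced_resource
from lightkube.models.meta_v1 import ObjectMeta

ISVC = create_namespaced_resource(
    "serving.kserve.io", "v1beta1", "InferenceService", "inferenceservices"
)
client = Client()


def create_capped_isvc(name: str, namespace: str) -> None:
    """Deploy a test InferenceService with a small CPU limit so it can be
    scheduled on a resource-starved GH runner node."""
    client.create(
        ISVC(
            metadata=ObjectMeta(name=name, namespace=namespace),
            spec={
                "predictor": {
                    "sklearn": {
                        # Placeholder model URI, for illustration only.
                        "storageUri": "gs://kfserving-examples/models/sklearn/1.0/model",
                        "resources": {
                            "requests": {"cpu": "100m"},
                            "limits": {"cpu": "250m"},  # the cap from the fix
                        },
                    }
                }
            },
        )
    )


@tenacity.retry(
    stop=tenacity.stop_after_delay(120),
    wait=tenacity.wait_fixed(5),
    reraise=True,
)
def assert_deleted(name: str, namespace: str) -> None:
    """Block until the API server reports the InferenceService is gone."""
    try:
        client.get(ISVC, name, namespace=namespace)
    except ApiError as err:
        if err.status.code == 404:
            return  # fully deleted; its CPU request has been released
        raise
    raise AssertionError(f"{name} still exists in {namespace}")


def cleanup(name: str, namespace: str) -> None:
    """Delete the test resource and assert the deletion completed."""
    client.delete(ISVC, name, namespace=namespace)
    assert_deleted(name, namespace)
```

Asserting the deletion matters here: on a single-node runner, the next test would otherwise start while the previous pods still hold their CPU requests.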

The resource limitation does not affect just KServe; it has been observed in other repositories as well.

To Reproduce

Trigger pull request workflow.

Environment

GH runners

Relevant Log Output

# K8S logs:

CONTROLLER_NAME: github-pr-29756-microk8s
NAMESPACE         LAST SEEN   TYPE      REASON                    OBJECT                              MESSAGE
default           5m14s       Warning   FreeDiskSpaceFailed       node/fv-az42-917                    failed to garbage collect required amount of images. Wanted to free 11716064051 bytes, but freed 317164 bytes
default           5m14s       Warning   ImageGCFailed             node/fv-az42-917                    failed to garbage collect required amount of images. Wanted to free 11716064051 bytes, but freed 317164 bytes
knative-serving   3m30s       Warning   FailedGetResourceMetric   horizontalpodautoscaler/activator   failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
knative-serving   3m27s       Warning   FailedGetResourceMetric   horizontalpodautoscaler/webhook     failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
test-charm-sa6s   2m9s        Warning   FailedGetResourceMetric   horizontalpodautoscaler/istiod      failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
default           2s          Warning   FreeDiskSpaceFailed       node/fv-az42-917                    failed to garbage collect required amount of images. Wanted to free 16135041843 bytes, but freed 293648095 bytes
default           1s          Warning   ImageGCFailed             node/fv-az42-917                    failed to garbage collect required amount of images. Wanted to free 16135041843 bytes, but freed 293648095 bytes

Additional Context

N/A
