Add ChatQnA megaservice E2E (frontend) metric based autoscaling support #866

Draft: wants to merge 1 commit into base: main

eero-t (Contributor) commented Mar 10, 2025

Description

  • Add HPA scaling support for the ChatQnA megaservice (frontend) service as well
  • Add a frontend-metric-based scaling option to all 5 HPA-scaled components

An additional ChatQnA values file can be used to apply frontend-metric-based scaling to all HPA-controlled components. It should be applied on top of the base HPA values file.

Custom metrics are provided for all components that have HPA enabled, even if they have been configured to use frontEndMetrics. That way, users can easily switch their scaling between frontend and backend metrics by re-installing the Helm chart. Because the Prometheus-adapter custom metrics configMap does not change, its manual install step can be skipped.
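
As a rough sketch of the layering described above, the two installs below differ only in whether the additional values file is passed after the base HPA one. The chart path, both values file names (`hpa-values.yaml`, `frontend-hpa-values.yaml`) and the `HELM` override variable are assumptions made for illustration, not names confirmed by this PR. Helm gives precedence to the last `-f` file, which is why the frontend file comes second.

```shell
#!/bin/sh
# Sketch only: chart path and values file names below are assumptions.
CHART=helm-charts/chatqna
BASE_HPA_VALUES="$CHART/hpa-values.yaml"            # assumed name of the base HPA values file
FRONTEND_VALUES="$CHART/frontend-hpa-values.yaml"   # hypothetical name of the additional file

# Scale each component on its own (backend) metrics: base HPA values only.
# $1 = Helm release name. HELM can be overridden (e.g. HELM=echo) for a dry run.
install_backend_scaling() {
  ${HELM:-helm} install "$1" "$CHART" -f "$BASE_HPA_VALUES"
}

# Scale on the frontend (E2E) metric: later -f files override earlier ones,
# so the additional file goes after the base HPA values file.
install_frontend_scaling() {
  ${HELM:-helm} install "$1" "$CHART" -f "$BASE_HPA_VALUES" -f "$FRONTEND_VALUES"
}
```

Setting `HELM=echo` before calling either function prints the command instead of running it, which is a cheap way to check the file order before touching a cluster.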

Issues

n/a.

Type of change

  • New feature (non-breaking change which adds new functionality)

Dependencies

Manual testing with this PR revealed an issue with the E2E metric used for scaling, which needs to be fixed first: opea-project/GenAIComps#1121

Tests

Manually tested that HPA scaling works based on the frontend metric.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
eero-t marked this pull request as draft March 10, 2025 10:20

eero-t (Contributor, Author) commented Mar 10, 2025

Marked as draft until the dependent "GenAIComps" metrics issue is fixed and I've updated the HPA doc.

eero-t (Contributor, Author) commented Mar 10, 2025

All chatqna tests and 2 vllm tests failed due to the same chatqna-ui image manifest issue in CI (unrelated to the changes in this PR):

+ for img in `helm template -n $NAMESPACE -f helm-charts//chatqna/${value_file} $RELEASE_NAME helm-charts//chatqna | grep 'image:' | grep 'opea/' | awk '{print $2}' | xargs`
+ .github/workflows/scripts/e2e/chart_test.sh check_local_opea_image 100.80.243.74:5000/opea/chatqna-ui:latest
Failed to get image manifest 100.80.243.74:5000/opea/chatqna-ui:latest
+ echo skip_validate=true
+ echo should_cleanup=false
+ exit 1
Error: Process completed with exit code 1.

eero-t (Contributor, Author) commented Mar 10, 2025

Several additional inference engine CI failures were due to namespace deletion timing out:

namespace "infra-tei-10102122" deleted
...
namespace "infra-tei-10102122" force deleted
error: timed out waiting for the condition on namespaces/infra-tei-10102122
Error: Process completed with exit code 1.

Namespaces can easily end up in a state where they cannot be deleted, either because deletion was done in the wrong order or for the wrong objects, and/or due to k8s object dependencies.

One example is removing a namespaced deployment (e.g. prometheus-adapter) that provides a non-namespaced k8s API endpoint (e.g. k8s custom metrics) without deleting the API endpoint. The namespace will go away only after the API endpoint is also removed.

Another option (listed e.g. on StackOverflow) is forcibly removing the namespace finalizer through k8s API server JSON calls, after which the namespace will be removed. However, that leaves the cluster in a subtly broken state (in my example case, the API endpoint then has no backend).
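
To make the API endpoint example concrete, here is a minimal sketch for spotting aggregated API registrations that have lost their backend. It assumes the endpoint is registered as an `APIService` object and that `kubectl get apiservices` prints availability in its third column; the `KUBECTL` override variable and the `v1beta1.custom.metrics.k8s.io` name mentioned afterwards are illustrative assumptions, so check the actual name in your cluster.

```shell
#!/bin/sh
# Sketch only: print the names of APIService objects reporting AVAILABLE=False.
# Assumed column layout of `kubectl get apiservices`: NAME SERVICE AVAILABLE AGE.
# KUBECTL can be overridden to point at a different binary or a test stub.
unavailable_apiservices() {
  ${KUBECTL:-kubectl} get apiservices --no-headers |
    awk '$3 ~ /^False/ { print $1 }'
}
```

If a listed entry is the custom metrics API left behind by a removed prometheus-adapter, deleting it with `kubectl delete apiservice <name>` (for example `v1beta1.custom.metrics.k8s.io`, assuming that is the registered name) removes the dangling endpoint without touching any finalizers.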
