Add ChatQnA megaservice E2E (frontend) metric based autoscaling support #866

eero-t · 2025-03-10T10:19:53Z

Description

- Add HPA scaling support also for ChatQnA megaservice/frontend service
- Add frontend metric based scaling option to all 5 HPA scaled components

An additional ChatQnA values file can be used to apply frontend metric based scaling to all HPA controlled components. It should be applied on top of the base HPA values file.
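
To illustrate what "on top of" means here, below is a minimal Python sketch of how layered values files combine: Helm merges maps recursively, and a later `-f` file overrides an earlier one for the same key. This is not code from the PR. The `autoscaling.enabled` and `maxReplicas` keys and the `tgi` component name are hypothetical placeholders, and `frontEndMetrics` is used only because it is mentioned in this description; the real chart keys may differ.

```python
from copy import deepcopy


def merge_values(base, overlay):
    """Return base with overlay merged on top, roughly like layered Helm -f files.

    Nested dicts are merged recursively; any other value in the overlay
    replaces the value from the base.
    """
    merged = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical content of the base HPA values file.
base_hpa = {
    "autoscaling": {"enabled": True},
    "tgi": {"autoscaling": {"enabled": True, "maxReplicas": 4}},
}

# Hypothetical content of the additional frontend-metrics values file.
frontend_overlay = {
    "tgi": {"autoscaling": {"frontEndMetrics": True}},
}

merged = merge_values(base_hpa, frontend_overlay)
print(merged["tgi"]["autoscaling"])
```

The overlay only needs to carry the keys it changes, which is why it depends on the base HPA values file being given first.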

Custom metrics are provided for all components that have HPA enabled, even if they have been configured to use frontEndMetrics. That way the user can easily switch their scaling between frontend and backend metrics by re-installing the Helm chart. Because the Prometheus-adapter custom metrics configMap does not change, its manual install step can be skipped.

Issues

n/a.

Type of change

New feature (non-breaking change which adds new functionality)

Dependencies

Manual testing of this revealed an issue with the E2E metric used for scaling, which needs to be fixed first: opea-project/GenAIComps#1121

Tests

Manual testing that HPA scaling works based on frontend metric.
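
For reference when checking scaling behavior by hand, here is a rough Python sketch of the replica calculation documented for the Kubernetes HPA: `ceil(currentReplicas * currentValue / targetValue)`, skipped when the ratio is within a tolerance (0.1 by default). The metric values and replica limits are made-up examples, not measurements from this PR, and the real controller additionally handles stabilization windows, missing metrics, and pods that are not ready.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     tolerance=0.1, min_replicas=1, max_replicas=10):
    """Approximate the documented HPA replica calculation for one metric."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    ratio = current_value / target_value
    # No scaling when the metric is close enough to its target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica range.
    return max(min_replicas, min(max_replicas, wanted))


# Illustrative numbers only: metric at twice its target with 2 replicas.
print(desired_replicas(2, 6.0, 3.0))
```

When several components scale on the same frontend metric, each HPA applies this calculation with its own target value and replica limits.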

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

eero-t · 2025-03-10T10:20:45Z

Marked as draft until the dependent "GenAIComps" metrics issue is fixed and I've updated the HPA doc.

eero-t · 2025-03-10T10:39:35Z

All chatqna tests and 2 vllm tests failed due to the same chatqna-ui image manifest issue in CI (unrelated to the changes in this PR):

+ for img in `helm template -n $NAMESPACE -f helm-charts//chatqna/${value_file} $RELEASE_NAME helm-charts//chatqna | grep 'image:' | grep 'opea/' | awk '{print $2}' | xargs`
+ .github/workflows/scripts/e2e/chart_test.sh check_local_opea_image 100.80.243.74:5000/opea/chatqna-ui:latest
Failed to get image manifest 100.80.243.74:5000/opea/chatqna-ui:latest
+ echo skip_validate=true
+ echo should_cleanup=false
+ exit 1
Error: Process completed with exit code 1.

eero-t · 2025-03-10T13:03:11Z

Several additional inferencing engine CI failures were due to namespace deletion timing out:

namespace "infra-tei-10102122" deleted
...
namespace "infra-tei-10102122" force deleted
error: timed out waiting for the condition on namespaces/infra-tei-10102122
Error: Process completed with exit code 1.

Namespaces can easily end up in a state where they cannot be deleted, either because deletion was done in the wrong order or for the wrong objects, and/or due to k8s object dependencies.

An example of that is removing a namespaced deployment (e.g. prometheus-adapter) that provides a non-namespaced k8s API endpoint (e.g. k8s custom metrics), without deleting the API endpoint. The namespace will go away only after the API endpoint is also removed.

Another option (listed e.g. on StackOverflow) is forcibly removing the namespace finalizer through k8s API server JSON calls, after which the namespace will be removed. However, that leaves the cluster in a subtly broken state (in my example case, the API endpoint then has no backend).
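
One way to spot the dangling API endpoint case described above is to look for aggregated APIs that are not reporting `Available`. The sketch below is a hedged example, not part of the CI scripts: it assumes its input has the shape of `kubectl get apiservices -o json`, and the sample data (including the `monitoring` namespace) is invented for illustration.

```python
import json


def unavailable_apiservices(apiservice_list):
    """Return (APIService name, backing service) pairs not reporting Available=True."""
    broken = []
    for item in apiservice_list.get("items", []):
        service = item.get("spec", {}).get("service")
        if not service:
            continue  # served by kube-apiserver itself, no backing service
        conditions = item.get("status", {}).get("conditions", [])
        available = any(
            c.get("type") == "Available" and c.get("status") == "True"
            for c in conditions
        )
        if not available:
            backend = f'{service.get("namespace")}/{service.get("name")}'
            broken.append((item["metadata"]["name"], backend))
    return broken


# Invented sample resembling `kubectl get apiservices -o json` output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "v1."},
   "spec": {},
   "status": {"conditions": [{"type": "Available", "status": "True"}]}},
  {"metadata": {"name": "v1beta1.custom.metrics.k8s.io"},
   "spec": {"service": {"namespace": "monitoring", "name": "prometheus-adapter"}},
   "status": {"conditions": [{"type": "Available", "status": "False"}]}}
]}
""")

for name, backend in unavailable_apiservices(sample):
    print(f"{name} has no working backend ({backend})")
```

An entry reported this way can be removed with `kubectl delete apiservice <name>`, which is the API endpoint cleanup mentioned above, as opposed to stripping the namespace finalizer.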

eero-t requested review from yongfengdu and lianhao as code owners March 10, 2025 10:19

eero-t marked this pull request as draft March 10, 2025 10:20
