
Add support for the "Value" metric type in all scalers #2309

Merged Mar 30, 2022 · 17 commits

Conversation

@amirschw (Contributor) commented Nov 21, 2021

Add support for ValueMetricType in all scalers except cpu/memory by introducing a new optional metricType field to the ScaleTrigger spec. The field will default to AverageValueMetricType to maintain backward compatibility.

Notes:

  • The Utilization metric type is not supported for external metrics, only Value or AverageValue.
  • The CPU/memory scalers will continue to support the Utilization and AverageValue metric types.
  • The fallback capability will continue to be supported for the AverageValue metric type only.
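
As a minimal sketch of the defaulting and validation rules described above (the helper name and wiring are illustrative assumptions, not necessarily the code in this PR): an empty metricType keeps the backward-compatible AverageValue behaviour, while anything other than Value/AverageValue is rejected for external metrics.

package sketch

import (
	"fmt"

	autoscalingv2beta2 "k8s.io/api/autoscaling/v2beta2"
)

// resolveMetricType is a hypothetical helper: it applies the backward-compatible
// default (AverageValue) and rejects metric types that external metrics cannot use.
func resolveMetricType(t autoscalingv2beta2.MetricTargetType) (autoscalingv2beta2.MetricTargetType, error) {
	switch t {
	case "":
		// No metricType set on the trigger: keep the pre-PR behaviour.
		return autoscalingv2beta2.AverageValueMetricType, nil
	case autoscalingv2beta2.ValueMetricType, autoscalingv2beta2.AverageValueMetricType:
		return t, nil
	default:
		// Utilization (and anything else) is only valid for cpu/memory, not for external metrics.
		return "", fmt.Errorf("metric type %q is not supported for external metrics", t)
	}
}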


Fixes #2030.


@JorTurFer (Member) commented Nov 24, 2021

/run-e2e
Update: You can check the progress here

@JorTurFer (Member) commented Nov 25, 2021

/run-e2e mongodb.test*
Update: You can check the progress here

@JorTurFer (Member) commented Nov 25, 2021

/run-e2e azure-queue-trigger-auth*
Update: You can check the progress here

@JorTurFer (Member) commented Nov 29, 2021

/run-e2e mongodb.test*
Update: You can check the progress here

@JorTurFer (Member) commented Dec 1, 2021

/run-e2e mongodb.test*
Update: You can check the progress here

@JorTurFer (Member):

Hi @amirschw,
In general, this looks good! Thanks for it ❤️
My only concern is that it's a big change, and we should test it thoroughly. Let's wait for other opinions.

@zroubalik (Member):

@amirschw thanks for this proposal. So you decided to set the metric type on the ScaledObject level; what if the type could be specified on the trigger level instead? (Each ScaledObject/ScaledJob could have multiple scalers.)

@amirschw (Contributor, author) commented Dec 2, 2021

@zroubalik I actually set the metric type on the trigger/scaler level as proposed in #2030, not on the ScaledObject level (or at least I think I did... 😄).

See here:

type ScaleTriggers struct {
	Type string `json:"type"`
	// +optional
	Name string `json:"name,omitempty"`
	Metadata map[string]string `json:"metadata"`
	// +optional
	AuthenticationRef *ScaledObjectAuthRef `json:"authenticationRef,omitempty"`
	// +optional
	FallbackReplicas *int32 `json:"fallback,omitempty"`
	// +optional
	MetricType autoscalingv2beta2.MetricTargetType `json:"metricType,omitempty"`
}

And here:

config := &scalers.ScalerConfig{
	Name:              withTriggers.Name,
	Namespace:         withTriggers.Namespace,
	TriggerMetadata:   trigger.Metadata,
	ResolvedEnv:       resolvedEnv,
	AuthParams:        make(map[string]string),
	GlobalHTTPTimeout: h.globalHTTPTimeout,
	ScalerIndex:       scalerIndex,
	MetricType:        trigger.MetricType,
}
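
For context, a rough sketch of how a scaler can then turn the configured MetricType into the external metric target it hands to the HPA (the helper below is an illustrative assumption, not code from this PR):

package sketch

import (
	autoscalingv2beta2 "k8s.io/api/autoscaling/v2beta2"
	"k8s.io/apimachinery/pkg/api/resource"
)

// buildMetricTarget is a hypothetical helper: it fills either Value or
// AverageValue on the HPA metric target depending on the configured type.
func buildMetricTarget(metricType autoscalingv2beta2.MetricTargetType, targetValue int64) autoscalingv2beta2.MetricTarget {
	target := autoscalingv2beta2.MetricTarget{Type: metricType}
	qty := resource.NewQuantity(targetValue, resource.DecimalSI)
	if metricType == autoscalingv2beta2.ValueMetricType {
		target.Value = qty
	} else {
		// Default, backward-compatible path.
		target.AverageValue = qty
	}
	return target
}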

@zroubalik (Member) commented Dec 2, 2021

Oh man 🤦 😄 I don't know why I thought that.

I have been quite busy recently; I will try to follow up on your PR ASAP. But I think that generally it is OK.

@amirschw (Contributor, author) commented Dec 2, 2021

No worries at all :) When you do get a chance to follow up, I will be happy to get your feedback on the fourth comment in the PR description (regarding how we make sure it works correctly for ScaledJobs).

@pavlospt commented Dec 6, 2021

@amirschw Hello, we stumbled upon a business case today and we thought about contributing this change, until we found that you had a draft PR! Since every trigger on a ScaledObject can be of different type, why should this be on the ScaledObject level and on the metadata level of the trigger? Have you thought of some specific use-case?

If the metricType is top level on ScaleTriggers for semantic reasons, shouldn't the targetMetricValue go there as well?

@amirschw (Contributor, author) commented Dec 7, 2021

Hi @pavlospt!

> Since every trigger on a ScaledObject can be of different type, why should this be on the ScaledObject level and on the metadata level of the trigger? Have you thought of some specific use-case?

Not sure I got your question. I added the new metricType field as part of each trigger in the ScaledObject (or ScaledJob), exactly because every trigger on a ScaledObject can be of a different type.

> If the metricType is top level on ScaleTriggers for semantic reasons, shouldn't the targetMetricValue go there as well?

Good question. I think it might be too big of a change to move the target value from the trigger metadata to reside directly under the trigger, especially since every trigger currently has a different name for the target value in its metadata, based on the scaler logic (e.g., queueLength for Azure Storage Queue, threshold for Prometheus, etc.). I will be happy to hear other opinions here.

@pavlospt commented Dec 7, 2021

> > Since every trigger on a ScaledObject can be of different type, why should this be on the ScaledObject level and on the metadata level of the trigger? Have you thought of some specific use-case?
>
> Not sure I got your question. I added the new metricType field as part of each trigger in the ScaledObject (or ScaledJob), exactly because every trigger on a ScaledObject can be of a different type.

Yeah, please ignore this part; my main concern is the following question!

> > If the metricType is top level on ScaleTriggers for semantic reasons, shouldn't the targetMetricValue go there as well?
>
> Good question. I think it might be too big of a change to move the target value from the trigger metadata to reside directly under the trigger, especially since every trigger currently has a different name for the target value in its metadata, based on the scaler logic (e.g., queueLength for Azure Storage Queue, threshold for Prometheus, etc.). I will be happy to hear other opinions here.

I understand the concerns; maybe that could be a future improvement. But just for semantic reasons, the fact that a type for the value is introduced should mean that the value now represents a "core" part of a trigger as well. It still remains a value independently of how any scaler treats it, and its type is now configurable based on autoscaling/v2beta2!

Just my 2 cents on that since we were discussing it with my colleagues today 😃

@amirschw (Contributor, author) commented Dec 7, 2021

@pavlospt I see your point and agree it's worth considering as a possible future improvement (KEDA v3? 😃).

@zroubalik (Member) left a review comment


Looking good! Sorry for the delay in a review.

Ad 3: Could you please describe the issue you have with Fallback?

Ad 4: You are right; we might end up not supporting this setting for ScaledJob 🤔 @TsuyoshiUshio wdyt?

@amirschw (Contributor, author) commented Dec 8, 2021

Thanks for the review @zroubalik!

Regarding fallback: the cool thing about AverageValue is that we don't need to know how many replicas are available in order to scale to the value of "fallback replicas". If, for example, the target average value = 100, and fallback replicas = 5, then the scaler will return 500 (100 * 5) for the metric value based on this formula. And then, since we're using AverageValue, the HPA controller will scale according to the algorithm to 5 replicas: ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )] = ceil[currentReplicas * ( (500 / currentReplicas) / 100 )] = ceil(500 / 100) = 5.

When Value is used, I don't think there's a way to scale to X fallback replicas without knowing how many replicas are currently available. This is probably why the isFallbackEnabled function checks that the metric type is AverageValueMetricType.
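
The arithmetic above can be sketched in a few lines (the names and numbers are just the example values from this comment, not KEDA's fallback code):

package main

import (
	"fmt"
	"math"
)

func main() {
	targetAverageValue := 100.0
	fallbackReplicas := 5.0
	currentReplicas := 3.0 // any non-zero value gives the same result

	// With AverageValue, the scaler reports target * fallbackReplicas ...
	reported := targetAverageValue * fallbackReplicas // 500

	// ... and the HPA's algorithm converges on fallbackReplicas no matter
	// how many replicas currently exist.
	desired := math.Ceil(currentReplicas * (reported / currentReplicas) / targetAverageValue)
	fmt.Println(desired) // 5
}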

@zroubalik (Member):

@amirschw gotcha. We can try to pull info from the ScaleTarget in this case to get the number of replicas. Will that help you with the modification of the fallback algorithm?

func (e *scaleExecutor) getScaleTargetScale(ctx context.Context, scaledObject *kedav1alpha1.ScaledObject) (*autoscalingv1.Scale, error) {
	return e.scaleClient.Scales(scaledObject.Namespace).Get(ctx, scaledObject.Status.ScaleTargetGVKR.GroupResource(), scaledObject.Spec.ScaleTargetRef.Name, metav1.GetOptions{})
}
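
Purely to illustrate the idea being discussed (this is not code from the PR, and the helper name is made up): once the current replica count is known from the Scale subresource, a Value-type fallback could report a metric that makes the HPA converge on the fallback replica count.

package sketch

// valueFallbackMetric is a hypothetical helper. For a Value-type target the HPA
// computes ceil(currentReplicas * reported / target), so reporting
// target * fallbackReplicas / currentReplicas would drive it to fallbackReplicas.
func valueFallbackMetric(targetValue, fallbackReplicas, currentReplicas int64) float64 {
	if currentReplicas <= 0 {
		// Nothing is running yet; report the full fallback load.
		return float64(targetValue * fallbackReplicas)
	}
	return float64(targetValue*fallbackReplicas) / float64(currentReplicas)
}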

@zroubalik (Member) commented Mar 29, 2022

@amirschw, the following tests:

scalers/rabbitmq-queue-http.test.ts
scalers/rabbitmq-queue-http-regex-vhost.test.ts
scalers/rabbitmq-queue-http-regex.test.ts

failed with:

 ***.648543265835394***e+09	ERROR	controller.scaledobject	Failed to create new HPA in cluster	***"reconciler group": "keda.sh", "reconciler kind": "ScaledObject", "name": "test-scaledobject", "namespace": "rabbitmq-queue-http-regex-vhost-test", "HPA.Namespace": "rabbitmq-queue-http-regex-vhost-test", "HPA.Name": "keda-hpa-test-scaledobject", "error": "HorizontalPodAutoscaler.autoscaling \"keda-hpa-test-scaledobject\" is invalid: [spec.metrics[0].external.target.type: Required value: must specify a metric target type, spec.metrics[0].external.target.type: Invalid value: \"\": must be either Utilization, Value, or AverageValue]"***
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).ensureHPAForScaledObjectExists
	/workspace/controllers/keda/scaledobject_controller.go:36***
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).reconcileScaledObject
	/workspace/controllers/keda/scaledobject_controller.go:229
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).Reconcile
	/workspace/controllers/keda/scaledobject_controller.go:***80
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.***.0/pkg/internal/controller/controller.go:***4
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.***.0/pkg/internal/controller/controller.go:3***
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.***.0/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.***.0/pkg/internal/controller/controller.go:227
***.648543265835483e+09	ERROR	controller.scaledobject	Failed to ensure HPA is correctly created for ScaledObject	***"reconciler group": "keda.sh", "reconciler kind": "ScaledObject", "name": "test-scaledobject", "namespace": "rabbitmq-queue-http-regex-vhost-test", "error": "HorizontalPodAutoscaler.autoscaling \"keda-hpa-test-scaledobject\" is invalid: [spec.metrics[0].external.target.type: Required value: must specify a metric target type, spec.metrics[0].external.target.type: Invalid value: \"\": must be either Utilization, Value, or AverageValue]"***

@zroubalik (Member) commented Mar 29, 2022

/run-e2e
Update: You can check the progress here

@JorTurFer (Member) left a review comment


LGTM!
Really, thanks for the huge effort you have put into this PR! ❤️ ❤️

@JorTurFer (Member) commented Mar 29, 2022

/run-e2e rabbit*
Update: You can check the progress here

@JorTurFer (Member) commented Mar 29, 2022

@amirschw, really, KUDOS to you. You have done great (and huge) work with this PR 🙇
I'd like to help you with the RabbitMQ problem. I have debugged it, and there is one change missing: the RabbitMQ scaler has two different places where the scaler is built, and you have changed only one of them. You have changed this one, but you need to set the metricType in this one too.

Thanks again for this awesome contribution.

Only one more thing: could you rebase on main to fix the changelog conflicts? That will also pull in the new AWS e2e tests, so we can re-trigger the e2e tests to cover those scalers as well 🙏

@zroubalik (Member) commented Mar 29, 2022

> The RabbitMQ scaler has two different places where the scaler is built, and you have changed only one of them. You have changed this one, but you need to set the metricType in this one too.

It would be great if you could refactor that part of the code so that we create just one rabbitMQScaler{} struct in a single place and return it just once. That way we prevent a similar mistake in the future. Thanks!
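
A rough shape of the suggested refactor (the types and fields below are placeholders, not the actual rabbitmq_scaler.go code): every protocol-specific setup path funnels through a single constructor, so a field such as metricType cannot be set in one path and forgotten in the other.

package sketch

import autoscalingv2beta2 "k8s.io/api/autoscaling/v2beta2"

// Placeholder types standing in for the real RabbitMQ scaler internals.
type rabbitMQMetadata struct {
	queueName string
}

type rabbitMQScaler struct {
	metadata   *rabbitMQMetadata
	metricType autoscalingv2beta2.MetricTargetType
}

// newRabbitMQScaler is the single place where the struct is built; both the
// AMQP and HTTP code paths would call it instead of constructing the struct inline.
func newRabbitMQScaler(meta *rabbitMQMetadata, metricType autoscalingv2beta2.MetricTargetType) *rabbitMQScaler {
	return &rabbitMQScaler{metadata: meta, metricType: metricType}
}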

@amirschw (Contributor, author):

@zroubalik, @JorTurFer thanks! I was just debugging the RabbitMQ failure myself and pushed a change without checking your comments. Will address your latest comments shortly 😄

@zroubalik (Member) commented Mar 30, 2022

/run-e2e
Update: You can check the progress here

@zroubalik (Member) left a review comment


LGTM,

/hold for @JorTurFer for the final review

thanks a lot!

@zroubalik (Member) commented Mar 30, 2022

We should open an issue to remove deprecated fields in the CPU/Memory scaler sometime in the future, once this gets merged.

@amirschw (Contributor, author):

> We should open an issue to remove deprecated fields in the CPU/Memory scaler sometime in the future, once this gets merged.

Good idea. I opened #2844 for this.

@JorTurFer (Member):

> LGTM,
>
> /hold for @JorTurFer for the final review
>
> thanks a lot!

LGTM too!
Thanks for the awesome improvement

Successfully merging this pull request may close the issue: Add support for the "Value" metric type in addition to "AverageValue".

8 participants