
Add support for scale subresource to Connect, S2I, MM1, MM2, Bridge and Connectors #3165

Merged
scholzj merged 16 commits into strimzi:master on Jun 11, 2020

Conversation

scholzj
Member

@scholzj scholzj commented Jun 6, 2020

Type of change

  • Enhancement / new feature

Description

This PR adds support for the scale subresource. Right now it is added to the following resources:

  • KafkaConnect, KafkaConnectS2I and KafkaMirrorMaker2
  • KafkaMirrorMaker
  • KafkaBridge
  • KafkaConnector

The scale subresource is useful because it makes it easier to scale the resources and, for example, to use different autoscaling tools on them.

To allow support for the scale subresource, some new status fields have been added. One is a field which mirrors the corresponding number of replicas. The other is a label selector, which is important for use with Horizontal Pod Autoscalers.

For all Deployment-based resources, this is done with a new field .status.replicas, which mirrors .spec.replicas, and .status.podSelector, which can be used to select the pods. KafkaConnector is an example of scaling a resource not represented by pods. In this case, the label selector is not present and .spec.tasksMax and .status.tasksMax represent the replicas field.
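For illustration, wiring the scale subresource into a CRD looks roughly like this (a sketch based on the Kubernetes apiextensions API; the resource name is just an example and the exact paths in the generated Strimzi CRDs may differ):

```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: kafkaconnects.kafka.strimzi.io
spec:
  # ...group, names, versions, validation omitted...
  subresources:
    status: {}
    scale:
      # paths in the custom resource that the Scale object reads/writes
      specReplicasPath: .spec.replicas
      statusReplicasPath: .status.replicas
      # label selector exposed for Horizontal Pod Autoscalers
      labelSelectorPath: .status.podSelector
```

With something like this in place, `kubectl scale kafkaconnect my-connect --replicas=3` goes through the Scale object instead of requiring an edit of the whole resource.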

The tests include a test for the CRD generator with a CRD including status and scale subresources, tests in the api module verifying that the scale subresource works, and Cluster Operator tests covering the new status fields. To make the api tests pass I had to update kubectl to 1.16 (1.15 has a bug, kubernetes/kubernetes#81342, due to which it doesn't work).

This PR also fixes the flaky test KafkaCrdIT.testKafkaWithTemplate which was causing problems while testing this.

Checklist

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Update CHANGELOG.md

@scholzj scholzj added this to the 0.19.0 milestone Jun 6, 2020
@scholzj scholzj requested a review from tombentley June 6, 2020 23:09
@scholzj
Member Author

scholzj commented Jun 6, 2020

@tombentley Before adding this to more resources and writing tests, I wondered what you think about the code and whether you have any early comments.

Member

@tombentley tombentley left a comment


Out of interest, what metrics do you think an autoscaler for connectors would use to decide how to scale, since Kube has no way to attribute CPU etc. to a particular connector within a multi-connector cluster?

@scholzj scholzj force-pushed the support-scale-subresource-in-our-crds branch from 9968ff7 to 6a5727d on June 8, 2020 21:47
Member

@ppatierno ppatierno left a comment


How much do you think the scale subresource is useful for the bridge? We know that scaling the bridge is actually not so simple, and we always suggest using different deployments.

Contributor

@samuel-hawker samuel-hawker left a comment


Aside from the other maintainers' concerns, this looks good to me!

@scholzj
Member Author

scholzj commented Jun 9, 2020

How much do you think the scale subresource is useful for the bridge? We know that scaling the bridge is actually not so simple, and we always suggest using different deployments.

It could actually be quite useful when using the bridge for producing messages, for example, no? That might even be something where you can use the usual autoscaling mechanisms such as CPU utilization etc.

This PR doesn't really do anything more than make it possible for users to use the scale subresource. That is something the user can already do by editing the resource anyway; this is just an easier way. As with any autoscaling, the user needs to plug it in manually and think about the autoscaling rules. So I think only the future will show how useful or useless this is, depending on whether users find some good way to use it, but I think there is little harm.
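As a sketch of what plugging in an autoscaler could look like once the scale subresource is in place (the resource name is hypothetical, and the HPA would rely on the label selector exposed in the status to find the pods for CPU metrics):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-bridge-hpa
spec:
  # point the HPA at the custom resource instead of the Deployment
  scaleTargetRef:
    apiVersion: kafka.strimzi.io/v1beta1
    kind: KafkaBridge
    name: my-bridge
  minReplicas: 1
  maxReplicas: 5
  # scale out when average CPU utilization of the selected pods exceeds 80%
  targetCPUUtilizationPercentage: 80
```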

@ppatierno
Member

Yes, good point. The bridge scaling problem is mostly related to consumers; using scaling for producers could make sense.
Other than CPU, if we provide HTTP metrics out of the bridge, they could be used for scaling as well (of course considering the PUT ones for sending messages on specific endpoints). I was wondering how interesting an integration with KEDA could be for this.
I will take a look, it's interesting stuff for me :-)

@scholzj
Member Author

scholzj commented Jun 9, 2020

Other than CPU, if we provide HTTP metrics out of the bridge, they could be used for scaling as well (of course considering the PUT ones for sending messages on specific endpoints). I was wondering how interesting an integration with KEDA could be for this.

Yeah ... we should have some metrics in the bridge as well. But that is missing right now. As for KEDA, I need to check with them, but from the docs my understanding is that today KEDA does not support scaling custom resources and supports only scaling of Deployments. So ideally, once we have this ready, we should convince them to add support for scaling any resource and not just Deployments.

@ppatierno
Member

@scholzj I had a chat with KEDA guys. They have the feature planned for KEDA v2 (in one month or so) kedacore/keda#703.
I will plan to work on adding metrics on the bridge for this.

@scholzj
Member Author

scholzj commented Jun 9, 2020

@ppatierno Right, this is exactly what would make these changes a bit more useful.

scholzj and others added 6 commits June 10, 2020 14:58
…d KafkaConnector

Signed-off-by: Jakub Scholz <www@scholzj.com>
Signed-off-by: Jakub Scholz <www@scholzj.com>

Co-authored-by: Tom Bentley <tombentley@users.noreply.github.com>
Signed-off-by: Jakub Scholz <www@scholzj.com>
Signed-off-by: Jakub Scholz <www@scholzj.com>
Signed-off-by: Jakub Scholz <www@scholzj.com>
Signed-off-by: Jakub Scholz <www@scholzj.com>
Signed-off-by: Jakub Scholz <www@scholzj.com>
@scholzj scholzj force-pushed the support-scale-subresource-in-our-crds branch from 745ddd6 to 5587b10 on June 10, 2020 14:04
…ces, try to fix the CRD tests on Minikube

Signed-off-by: Jakub Scholz <www@scholzj.com>
Signed-off-by: Jakub Scholz <www@scholzj.com>
Signed-off-by: Jakub Scholz <www@scholzj.com>
Signed-off-by: Jakub Scholz <www@scholzj.com>
Signed-off-by: Jakub Scholz <www@scholzj.com>
Signed-off-by: Jakub Scholz <www@scholzj.com>
…es bug #81342

Signed-off-by: Jakub Scholz <www@scholzj.com>
Signed-off-by: Jakub Scholz <www@scholzj.com>
@scholzj scholzj changed the title from "Add support for scale subresource" to "Add support for scale subresource to Connect, S2I, MM1, MM2, Bridge and Connectors" Jun 10, 2020
@scholzj scholzj marked this pull request as ready for review June 10, 2020 19:45
@scholzj
Member Author

scholzj commented Jun 10, 2020

@ppatierno @samuel-hawker @tombentley I added tests etc. and this should now be ready for final review.

@scholzj
Member Author

scholzj commented Jun 10, 2020

/azp run system-tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Jakub Scholz <www@scholzj.com>
@scholzj
Member Author

scholzj commented Jun 10, 2020

@strimzi-ci run tests

@strimzi-ci

❌ Test Summary ❌

TEST_PROFILE: acceptance
EXCLUDED_GROUPS: networkpolicies
TEST_CASE: *ST
TOTAL: 25
PASS: 9
FAIL: 16
SKIP: 0
BUILD_NUMBER: 1073
BUILD_ENV: oc cluster up

❗ Test Failures ❗

  • io.strimzi.systemtest.bridge.HttpBridgeTlsST in io.strimzi.systemtest.bridge.HttpBridgeTlsST
  • io.strimzi.systemtest.oauth.OauthTlsST in io.strimzi.systemtest.oauth.OauthTlsST
  • testKafkaAndZookeeperScaleUpScaleDown in io.strimzi.systemtest.rollingupdate.RollingUpdateST
  • testAutoRenewAllCaCertsTriggeredByAnno in io.strimzi.systemtest.security.SecurityST
  • testCustomSoloCertificatesForRoute in io.strimzi.systemtest.kafka.ListenersST
  • testNodePortTls in io.strimzi.systemtest.KafkaST
  • testSendMessagesPlainScramSha in io.strimzi.systemtest.KafkaST
  • testLoadBalancerTls in io.strimzi.systemtest.KafkaST
  • io.strimzi.systemtest.RecoveryST in io.strimzi.systemtest.RecoveryST
  • testMirrorMaker2TlsAndTlsClientAuth in io.strimzi.systemtest.MirrorMaker2ST
  • testKafkaConnectorWithConnectS2IAndConnectWithSameName in io.strimzi.systemtest.ConnectS2IST
  • testProducerConsumerStreamsService in io.strimzi.systemtest.tracing.TracingST
  • testMirrorMakerTlsAuthenticated in io.strimzi.systemtest.MirrorMakerST
  • testUpdateUser in io.strimzi.systemtest.UserST
  • testMultiNodeKafkaConnectWithConnectorCreation in io.strimzi.systemtest.ConnectST
  • testKafkaConnectAndConnectorFileSinkPlugin in io.strimzi.systemtest.ConnectST

Re-run command:
@strimzi-ci run tests false profile=acceptance testcase=io.strimzi.systemtest.bridge.HttpBridgeTlsST,io.strimzi.systemtest.oauth.OauthTlsST,io.strimzi.systemtest.rollingupdate.RollingUpdateST#testKafkaAndZookeeperScaleUpScaleDown,io.strimzi.systemtest.security.SecurityST#testAutoRenewAllCaCertsTriggeredByAnno,io.strimzi.systemtest.kafka.ListenersST#testCustomSoloCertificatesForRoute,io.strimzi.systemtest.KafkaST#testNodePortTls,io.strimzi.systemtest.KafkaST#testSendMessagesPlainScramSha,io.strimzi.systemtest.KafkaST#testLoadBalancerTls,io.strimzi.systemtest.RecoveryST,io.strimzi.systemtest.MirrorMaker2ST#testMirrorMaker2TlsAndTlsClientAuth,io.strimzi.systemtest.ConnectS2IST#testKafkaConnectorWithConnectS2IAndConnectWithSameName,io.strimzi.systemtest.tracing.TracingST#testProducerConsumerStreamsService,io.strimzi.systemtest.MirrorMakerST#testMirrorMakerTlsAuthenticated,io.strimzi.systemtest.UserST#testUpdateUser,io.strimzi.systemtest.ConnectST#testMultiNodeKafkaConnectWithConnectorCreation,io.strimzi.systemtest.ConnectST#testKafkaConnectAndConnectorFileSinkPlugin

@scholzj
Member Author

scholzj commented Jun 11, 2020

/azp run system-tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj scholzj merged commit 3a06fbd into strimzi:master Jun 11, 2020
@scholzj scholzj deleted the support-scale-subresource-in-our-crds branch June 11, 2020 20:42
@exherb

exherb commented Aug 2, 2020

{
  "kind": "Scale",
  "apiVersion": "autoscaling/v1",
  "metadata": {
    "name": "kafka-connect-xapi",
    "namespace": "xapi",
    "selfLink": "/apis/kafka.strimzi.io/v1beta1/namespaces/xapi/kafkaconnects/kafka-connect-xapi/scale",
    "uid": "f9055528-79af-4012-8026-2ac80d3b81ff",
    "resourceVersion": "204389134",
    "creationTimestamp": "2020-07-31T10:27:37Z"
  },
  "spec": {
    "replicas": 3
  },
  "status": {
    "replicas": 3
  }
}

status.selector has no value?

@scholzj
Member Author

scholzj commented Aug 2, 2020

@exherb Hmm, looks like in the actual resource it is named .status.podSelector ... but in the CRD .status.selector 🤦‍♂️ ... I opened #3432 to fix it.

As a workaround, you should be able to replace labelSelectorPath: .status.selector with labelSelectorPath: .status.podSelector and reapply the CRDs.
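In CRD terms, the workaround amounts to something like this in the subresources section of the affected CRDs (a sketch; the path names are taken from the comments above):

```yaml
subresources:
  status: {}
  scale:
    specReplicasPath: .spec.replicas
    statusReplicasPath: .status.replicas
    # workaround: point at the field the operator actually populates
    labelSelectorPath: .status.podSelector
```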

klalafaryan pushed a commit to klalafaryan/strimzi-kafka-operator that referenced this pull request Oct 21, 2020
…nd Connectors (strimzi#3165)

* Add support for scale subresource in KafkaConnect, its derivatives and KafkaConnector

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Apply suggestions from code review

Signed-off-by: Jakub Scholz <www@scholzj.com>

Co-authored-by: Tom Bentley <tombentley@users.noreply.github.com>

* Review comments

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Add scaling to Bridge and MM

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Make existing tests pass and fix some review comments

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Add CRD tests

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Regen of API reference after rebase

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Add more tests for setting the right statuses in the different resources, try to fix the CRD tests on Minikube

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Fix unused imports and improve CHANGELOG.md

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Travis seems to be too fast?

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Try to fix Travis race conditions

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Try to fix Travis race conditions II

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Debug Travis

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Bump kubectl to 1.16.0 since the issue seems to be caused by Kubernetes bug #81342

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Cleanup previous debug and fix attempts

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Use better name in waitFor method

Signed-off-by: Jakub Scholz <www@scholzj.com>

Co-authored-by: Tom Bentley <tombentley@users.noreply.github.com>