MGDSTRM-10325: Delete Strimzi cluster operator leadership lease on deployment deletion #851
Conversation
operator/src/test/java/org/bf2/operator/managers/StrimziManagerTest.java
Force-pushed from 118acbc to 4e5eab2
@@ -58,10 +66,11 @@ public class StrimziManagerTest {

    @BeforeEach
    public void beforeEach() {
        // before each test clean Kubernetes server (no Deployments from other runs)
        this.server.before();
        try (NamespacedKubernetesClient client = this.server.getClient().inAnyNamespace()) {
Without this change the test was failing because the StrimziManagerTest instance and the StrimziManager instance had clients pointing at different mock Kubernetes servers on different ports. The test created a lease in one mock server and the implementation tried to delete it from another, which was a no-op.
That sounds like we've got really fragile test code. @shawkins, are you aware of this problem? Is there a nicer way?
We don't want to call before(). Until we get to a later version of fabric8 you mostly need to clean the state of the mock server yourself in between tests. Due to a Quarkus / OkHttp issue we cannot scope the mock server to test classes; it is scoped to the whole module. In fabric8 6.x there is a KubernetesMockServer reset method that should make things easy regardless.
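A minimal sketch of that kind of manual cleanup, assuming the test holds a client for the module-scoped mock server and knows which namespace it used (the helper name and the resource types chosen here are illustrative):

```java
import io.fabric8.kubernetes.client.NamespacedKubernetesClient;

class MockServerCleanup {

    // Delete resources a previous test may have left behind on the shared mock server,
    // e.g. Deployments and the leader-election Leases exercised by these tests.
    static void cleanState(NamespacedKubernetesClient client, String namespace) {
        client.apps().deployments().inNamespace(namespace).delete();
        client.leases().inNamespace(namespace).delete();
    }
}
```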
    assertNotNull(actual);
    List<StrimziVersionStatus> strimziVersions = this.strimziManager.getStrimziVersions();
    assertTrue(checkStrimziVersion(strimziVersions, "strimzi-cluster-operator.v1", true));
    strimziManager.deleteLeadershipLeaseIfPresent(deployment);
I wasn't sure how to invoke the ResourceEventHandler which the test passes into a MockResourceInformerFactory; maybe with some re-working we could keep a reference somewhere in the mock. Maybe a future improvement to make them testable. For now we are calling the delete method directly.
I went and had a shot at this, keeping references to all handlers created per type so that the test can access and call them directly. Maybe it would be better if StrimziManager implemented ResourceEventHandler<Deployment> and we didn't use an anonymous inner class; then we could call the methods directly on StrimziManager.
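A rough sketch of that alternative, assuming the delete helper added in this PR already exists on the manager (the class body below is illustrative, not the actual kas-fleetshard code):

```java
import io.fabric8.kubernetes.api.model.apps.Deployment;
import io.fabric8.kubernetes.client.informers.ResourceEventHandler;

// Illustrative only: if the manager itself implements the informer callback interface,
// a test can call onDelete(...) on it directly instead of reaching into the mock factory.
public class StrimziManager implements ResourceEventHandler<Deployment> {

    @Override
    public void onAdd(Deployment deployment) {
        // existing handling of a new cluster operator Deployment
    }

    @Override
    public void onUpdate(Deployment oldDeployment, Deployment newDeployment) {
        // existing handling of a changed Deployment
    }

    @Override
    public void onDelete(Deployment deployment, boolean deletedFinalStateUnknown) {
        deleteLeadershipLeaseIfPresent(deployment);
    }

    void deleteLeadershipLeaseIfPresent(Deployment deployment) {
        // delete the ${deploymentName}-leadership-token lease, as added in this PR
    }
}
```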
Force-pushed from 798d011 to 81e41db
To get the test working I noticed that the StrimziManager and the StrimziManagerTest were communicating with different instances of the mock server on different ports. The StrimziManager is wired with a client before `@BeforeEach` is invoked, so the beforeEach call was triggering a new mock server to be created and used in the tests.
Force-pushed from 81e41db to 35c1271
LGTM
Only note is that it looks like the delete DSL in the fabric8 client has a different return type in the latest version, so when kas-fleetshard upgrades its fabric8 client the code will likely need a slight change, but for now it looks good to me.
@robobario the approach looks good to me.
Yes - the boolean return value isn't generally useful and is confusing for some of the multi-delete scenarios. Later versions return a list of the items marked for deletion and the ability to await all of those deletions to be processed.
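For illustration, a sketch of the older shape of such a call (not code from this PR; the helper name is made up):

```java
import io.fabric8.kubernetes.client.KubernetesClient;

class DeleteReturnTypeExample {

    // With the older fabric8 client line (5.x-era), delete() reports the outcome as a Boolean.
    static boolean deleteDeployment(KubernetesClient client, String namespace, String name) {
        return client.apps().deployments().inNamespace(namespace).withName(name).delete();
        // In fabric8 6.x the same call instead returns the resources marked for deletion
        // (a List<StatusDetails>), so callers like this one would need a small update.
    }
}
```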
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
The MockResourceInformerFactory keeps references to all handlers added per type so tests can retrieve them and call handler methods directly.
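One way such a factory can record handlers, sketched with hypothetical names (this is not the actual MockResourceInformerFactory API):

```java
import java.util.HashMap;
import java.util.Map;

import io.fabric8.kubernetes.api.model.HasMetadata;
import io.fabric8.kubernetes.client.informers.ResourceEventHandler;

// Hypothetical sketch: remember the handler registered for each resource type
// so a test can later fetch it and fire informer events on it directly.
class RecordingInformerFactory {

    private final Map<Class<?>, ResourceEventHandler<?>> handlers = new HashMap<>();

    <T extends HasMetadata> void record(Class<T> type, ResourceEventHandler<T> handler) {
        handlers.put(type, handler);
    }

    @SuppressWarnings("unchecked")
    <T extends HasMetadata> ResourceEventHandler<T> handlerFor(Class<T> type) {
        return (ResourceEventHandler<T>) handlers.get(type);
    }
}
```

A test could then fetch the handler for Deployment and call onDelete(deployment, false) on it, exercising the same path the real informer would.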
Kudos, SonarCloud Quality Gate passed!
Why:
As part of enabling StrimziPodSets in the cluster operator we are also increasing the availability of the operator by running 2 replicas and using a k8s lease to elect one of them as leader; the other will be on standby in case it needs to take over reconciliation. The lease cannot be automatically cleaned up when we tear down a cluster operator deployment, so we are making the Fleetshard Operator responsible for deleting the lease in reaction to a deployment deletion.
The upstream issue in the fabric8 client is fabric8io/kubernetes-client#4638; once it is resolved we can remove this work from Fleetshard and configure an owner reference in the Strimzi cluster operator.
Note that the lease is named ${deploymentName}-leadership-token by convention to enable fleetshard to clean it.
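A minimal sketch of that cleanup on the fleetshard side, assuming the fabric8 leases() DSL and the naming convention above (class and method names here are illustrative):

```java
import io.fabric8.kubernetes.api.model.apps.Deployment;
import io.fabric8.kubernetes.client.KubernetesClient;

class LeadershipLeaseCleanup {

    // When a Strimzi cluster operator Deployment is deleted, delete the lease the
    // operator used for leader election, derived from the deployment name.
    static void deleteLeadershipLeaseIfPresent(KubernetesClient client, Deployment deployment) {
        String namespace = deployment.getMetadata().getNamespace();
        String leaseName = deployment.getMetadata().getName() + "-leadership-token";
        client.leases()
                .inNamespace(namespace)
                .withName(leaseName)
                .delete();
    }
}
```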
https://issues.redhat.com/browse/MGDSTRM-10325