CustomResource Controllers stop receiving updates after watch reconnect #395

Closed
@secondsun

Description

Sometimes our controllers stop receiving updates about their custom resources. We've seen that a watch occasionally becomes disconnected, and the custom resource event source then tries to reconnect but fails. We think this reconnect failure is what causes the problem.

I've traced the reconnection and the exceptions to this line: https://github.com/java-operator-sdk/java-operator-sdk/blob/master/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/internal/CustomResourceEventSource.java#L157 . I believe the registerWatch method is throwing an exception that the SDK doesn't catch. Because the exception is not caught, the watch stays dead and the event source no longer sends events. See my logs, where I've isolated the failure I'm describing: https://gist.github.com/secondsun/8e31d2680ff689750c62ad6ce9f419c0
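
To make the suspected failure mode concrete, here is a minimal sketch of what catching and retrying the re-registration could look like. It is not the SDK's code: `WatchReconnect`, `reconnectWithBackoff`, and the retry parameters are hypothetical names for illustration, and the `Runnable` only stands in for the call to registerWatch. It assumes the failure surfaces as a `RuntimeException`.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper, not part of the SDK: retries a watch registration that may throw. */
public class WatchReconnect {

    /**
     * Runs registerWatch until it succeeds or maxAttempts is reached,
     * doubling the delay between attempts.
     * Returns the number of attempts made; rethrows the last failure if all attempts fail.
     */
    public static int reconnectWithBackoff(Runnable registerWatch, int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMs = initialDelayMs;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                registerWatch.run();   // stands in for the event source's registerWatch call
                return attempt;        // watch is registered again
            } catch (RuntimeException e) {
                last = e;              // catch instead of letting the failure escape the close handler
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    break;
                }
                delayMs *= 2;          // simple exponential backoff
            }
        }
        throw last;                    // every attempt failed; surface the last error
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated registration that fails twice before succeeding.
        int attempts = reconnectWithBackoff(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("simulated reconnect failure");
            }
        }, 5, 10);
        System.out.println("watch re-registered after " + attempts + " attempts");
    }
}
```

This version blocks the calling thread while it sleeps, which keeps the example short. A real fix would more likely schedule the retries on a separate executor so the watcher's callback thread is not held up.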

To work around this for now, I am trying to use reflection and CDI schedulers to get a reference to the CustomResourceEventSource and call "onClose" with a subclassed watchexception (secondsun/app-services-operator@a25c4bd#diff-7a83aac91ab02c7b354a2f40496c5d38ca1c88d3f72a391e83b8e265eb341454R69).

Clearly this is not an ideal solution, and I'm looking for workarounds or what the best fix at the SDK level could be.
