Description
Sometimes our controllers stop receiving updates about their custom resources. We've seen cases where a watch becomes disconnected and the custom resource event source tries to reconnect but fails. We think this reconnect failure is what causes the problem.
I've traced the reconnection and exceptions to here: https://github.com/java-operator-sdk/java-operator-sdk/blob/master/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/internal/CustomResourceEventSource.java#L157 . I believe that the registerWatch method is throwing an exception which isn't caught by the SDK. Because the exception is not caught, the watch stays dead and the event source no longer sends events. See my logs here, where I've isolated the failure I'm describing: https://gist.github.com/secondsun/8e31d2680ff689750c62ad6ce9f419c0
To work around this for now, I am trying to use reflection and CDI schedulers to get a reference to the CustomResourceEventSource and call "onClose" with a subclassed watch exception (secondsun/app-services-operator@a25c4bd#diff-7a83aac91ab02c7b354a2f40496c5d38ca1c88d3f72a391e83b8e265eb341454R69).
Clearly this is not an ideal solution, so I'm looking for a better workaround or for guidance on what the best fix at the SDK level could be.
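To make the failure mode and one possible direction for a fix more concrete, here is a minimal sketch of what I have in mind: catch the exception thrown while re-registering the watch and retry with a capped backoff, rather than letting it escape the close callback. This is not the SDK's code and it uses no fabric8 types. `WatchHandle`, `ResilientWatch`, the `Supplier` standing in for registerWatch, and the attempt and delay parameters are all hypothetical names I made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Small demo: the first two registration attempts fail, the third succeeds.
final class WatchReconnectSketch {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Supplier<WatchHandle> flakyRegisterWatch = () -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("API server unavailable");
            }
            return () -> { }; // a no-op handle is enough for the demo
        };
        ResilientWatch watch = new ResilientWatch(flakyRegisterWatch, 5, 10L);
        boolean recovered = watch.onClose(new RuntimeException("watch closed"));
        System.out.println("recovered=" + recovered + ", attempts=" + calls.get());
    }
}

// Hypothetical stand-in for the handle returned by a watch registration.
interface WatchHandle {
    void close();
}

// Hypothetical wrapper: re-registers a watch with capped exponential backoff
// so that a failed re-registration cannot leave the watch silently dead.
final class ResilientWatch {
    private static final long MAX_DELAY_MILLIS = 30_000L;

    private final Supplier<WatchHandle> registerWatch; // may throw RuntimeException
    private final int maxAttempts;
    private final long initialDelayMillis;
    private volatile WatchHandle current;

    ResilientWatch(Supplier<WatchHandle> registerWatch, int maxAttempts, long initialDelayMillis) {
        this.registerWatch = registerWatch;
        this.maxAttempts = maxAttempts;
        this.initialDelayMillis = initialDelayMillis;
    }

    // Called when the watch closes unexpectedly.
    // Returns true if a new watch was registered, false if every attempt failed.
    boolean onClose(Exception cause) {
        System.err.println("Watch closed: " + cause.getMessage());
        current = null;
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                current = registerWatch.get();
                return true;
            } catch (RuntimeException e) {
                // Log and retry instead of propagating out of the callback.
                System.err.println("Re-registering watch failed (attempt " + attempt
                        + " of " + maxAttempts + "): " + e.getMessage());
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
                delay = Math.min(delay * 2, MAX_DELAY_MILLIS);
            }
        }
        return false;
    }

    boolean isWatching() {
        return current != null;
    }
}
```

When `onClose` returns false the caller still has to decide what to do (for example, fail a health check so the operator pod is restarted). I'm not sure which of those options fits the SDK best, so that part is left open here.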