Handler starts but never finishes #810
Comments
Can you please show how the code looks in general (maybe with no details if they are secret)? Also, can you add debug logs into the handler to see whether it is entered and exited? I would assume the issue is that some API operations inside the handler are broken.
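For example, a minimal sketch of such entry/exit logging (the resource and handler body are hypothetical placeholders, not the reporter's actual code):

```python
import kopf

# Hypothetical resource, for illustration only; Kopf injects the
# per-object `logger` kwarg into every handler.
@kopf.on.update('example.com', 'v1', 'myresources')
def config_update_handler(spec, name, logger, **kwargs):
    logger.debug("Handler entered for %r", name)
    # ... the actual API operations of the handler go here ...
    logger.debug("Handler exited for %r", name)
```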
Yes, you are right. I believe we are hitting Azure/AKS#1052, so closing.
@euan-tilley Thanks for letting me know! As a side note, keep in mind that there is a very similar issue known to happen with Kopf on any Kubernetes deployment (not only Azure), with any version of Kopf & Kubernetes, and with any client library (there was the official client, then pykube-ng, now aiohttp): the watch-streams sometimes silently go into an idle mode after several minutes of inactivity, yield no events even when changes happen, but do not report the TCP disconnection to the client (Kopf in this case), so the client believes the connection is alive and that simply nothing is happening out there. The root cause is unknown. To work around this issue, the `settings.watching.server_timeout` and `settings.watching.client_timeout` options can be set so that the watch-stream is forcibly reconnected at regular intervals instead of idling forever.

This does not affect the connection pools inside the handlers as in this reported issue: Kopf uses its own client session for the watch-streams. However, if your environment is prone to this kind of connection issues, it is worth setting these timeouts too.
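For reference, a minimal sketch of what such a startup configuration could look like (the timeout values are arbitrary examples, not recommendations):

```python
import kopf

@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    # Ask the API server to end the watch after N seconds (server-side),
    # and abort the request client-side shortly after, so that a silently
    # dead TCP connection cannot stall the watch-stream forever.
    settings.watching.server_timeout = 60
    settings.watching.client_timeout = 90
```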
Long story short
I have recently deployed our operator to an AKS cluster (it has been running on EKS and on-prem clusters without any issue) and started noticing that it wasn't handling changes to CRDs; restarting the container would trigger the handler successfully. At first I thought that we were missing events, so I explicitly added some API timeouts at startup.
However, this didn't make a difference and events still seemed to be missed. On further investigation (running with the --debug startup param), the event did seem to get picked up and the handler triggered, but it just seems to hang without ever finishing.
There would be no
Handler 'config_update_handler' succeeded.
log line (I left it for over 1 hour). It seems it would stay in this state forever. When restarting the container there are some log lines that talk about
Unprocessed streams
but I haven't been able to figure out why. Not sure whether this is related to #718 as it does seem similar. One thing to note is that the cluster is not running at scale: there are fewer than 5 resources being watched.
Kopf version
1.29.0
Kubernetes version
1.19.11
Python version
3.8
Code
No response
Logs
No response
Additional information
No response