ZeroMQ IPC fails after a while #607
Comments
This seems to be an issue with the API, not zeromq. I can still send requests to zeromq internally, but the API fails. I remember it occasionally failing after a while even before I created the website; with the large number of additional requests, it seems to happen much faster. I am not sure why yet. I will continue investigating.
A few days ago I changed hypercorn to use 8 workers instead of 1, and this seems to have helped. The API has been running without issue for multiple days now.
Sadly, this issue is not resolved. It is definitely a hypercorn issue. Increasing the number of workers only delays the point at which the API starts timing out. I am looking into solutions.
This may now be resolved. While rewriting this API in Rust, I believe I found the root cause of this issue with the help of @y21. The root cause was that zeromq, for some reason, in its default behaviour, prevents pointers from being dropped at the end of a function. So when my [...] It turns out this is default zmq behaviour, but thankfully there is a method to change it. A simple one-line change fixes this: socket.set_linger(0). That's it. That is what I have been trying to find for 8 months. Hopefully this actually fixes it. I will keep this issue open for a bit; if I close it, that was the fix.
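For anyone hitting the same problem from Python, the linger fix above can be sketched with pyzmq roughly as follows. This is a minimal sketch under assumptions, not the code from this repository: the helper name `request_once`, the address, and the timeout values are hypothetical, and it assumes a plain blocking REQ socket rather than whatever socket type the API actually uses. `zmq.LINGER`, `zmq.RCVTIMEO` and `zmq.SNDTIMEO` are standard pyzmq socket options.

```python
from typing import Optional

import zmq


def request_once(address: str, payload: bytes, timeout_ms: int = 5000) -> Optional[bytes]:
    """Send one request and return the reply, or None if the peer does not answer in time.

    Hypothetical helper for illustration only.
    """
    context = zmq.Context.instance()
    socket = context.socket(zmq.REQ)
    # LINGER = 0: discard unsent messages on close instead of waiting for them,
    # so closing the socket cannot hang on a peer that never answers.
    socket.setsockopt(zmq.LINGER, 0)
    # Bound send and receive so a dead peer produces a timeout, not a stuck request.
    socket.setsockopt(zmq.RCVTIMEO, timeout_ms)
    socket.setsockopt(zmq.SNDTIMEO, timeout_ms)
    try:
        socket.connect(address)
        socket.send(payload)
        return socket.recv()
    except zmq.Again:
        # Raised by pyzmq when a send or receive timeout expires.
        return None
    finally:
        # Always close the socket; with LINGER = 0 this returns immediately.
        socket.close()
```

Something like `request_once("tcp://127.0.0.1:5555", b"ping", timeout_ms=1000)` would then return `None` instead of blocking when nothing is listening on that (made-up) address. Whether this matches the failure seen in production is an assumption; it only shows the same setting as the Rust `set_linger(0)` call.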
This has been an issue for a while. In a development environment, zeromq works perfectly fine; however, not long after a restart of the production code, zeromq requests start failing silently. This breaks vote rewards as well as all GET endpoints used by the website, rendering it almost completely useless. Several fixes have been attempted, but none have worked so far.
This issue occurs in these lines of code:
Server:
Killua/killua/cogs/api.py
Lines 27 to 57 in 7bf697e
Client:
Killua/killua/webhook/api.py
Lines 30 to 50 in 7bf697e
I suspected this was caused by too many open connections, but I am not sure that is the case, since I seem to close all connections. This is the output of an lsof command when this issue occurred in production:
Because this has been a long-running issue and because it is quite important for the functionality, I am turning this into an issue to keep track of the progress.
I have also asked this Stack Overflow question in hopes of a fix.