irrd-whois-worker process doesn't timeout long running open connections / queries #693
Comments
Summary: I think this issue is a variant of #607 - where that looked like just a cleanup at the time, it seems this can actually cause the issue you have been seeing. If this is the same issue, you should be seeing a fair number of connections in FIN-WAIT-2 that stay stuck there - and a similar strace. If that's right, 0b3c7d1 already fixes the problem, and I can release a new 4.2 soon.
Small strace from all workers and the handler (15549) after it got stuck, when I made a new connection:
So the handler is still picking up connections, dropping them off on the queue, but no workers will pick them up - they are all blocked on receiving data from a socket. For worker 15552, this is the blocked socket:
A stack dump reveals:
The readline() call does not appear to return even after the 30 second timer has expired and the connection has been forcibly closed from our end. The only way to break out of that is setting a timeout, as we now do in 0b3c7d1.
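To illustrate why a socket timeout breaks out of a blocked readline(), here is a minimal Python sketch. It is not the change made in 0b3c7d1; the helper name `read_query_line` and the timeout value are assumptions for illustration only.

```python
import socket
from typing import Optional


def read_query_line(conn: socket.socket, timeout: float) -> Optional[bytes]:
    """Read one line from conn, or return None after `timeout` seconds of silence.

    Hypothetical helper for illustration, not IRRd's actual code.
    """
    # Without a timeout, readline() blocks until the peer sends data or the
    # kernel reports the connection as closed, which may never happen.
    conn.settimeout(timeout)
    rfile = conn.makefile("rb")
    try:
        return rfile.readline()
    except socket.timeout:
        # The recv underneath readline() gave up; the caller can now close
        # the connection and go back to serving other clients.
        return None
    finally:
        rfile.close()
```

With a silent peer the call returns None once the timeout expires instead of blocking indefinitely, so a worker built this way could be returned to the pool.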
with no movement at all. If there's further data we can provide, let us know.
Can you see what
The PID assigned to the stuck worker is nowhere to be found in
That is very peculiar, but as your case definitely stays stuck on the same recvfrom, I do think setting the socket timeout is likely to resolve this. I'll include this in a 4.2.6 release, probably released in the next few days.
4.2.6 is released: https://github.com/irrdnet/irrd/releases/tag/v4.2.6
Describe the bug
If a connection is opened to the IRRd service and no source data (query) is passed for a period of time, the irrd-whois-worker process is put into a 'hung' state and is no longer available to process any queries passed to it from the irrd-whois-server-listener process.
Looking at the PIDs of the processes in this 'hung' state via strace, they are still making some basic system calls but never appear to process any data. Once all of the irrd-whois-worker processes are in this state, IRRd can no longer process any requests and the service becomes unavailable.
We are seeing traffic from clients (client unknown) that make a connection but send an empty query or no query at all. These interactions are not logged by IRRd and cause the irrd-whois-worker process to be put into a 'hung' state.
In the process table, irrd-whois-worker processes have a client address and port appended to their name when a connection is opened, and they remain in this state when they become 'hung'.
To Reproduce
The nc or telnet utilities can be used to open a connection to an IRRd instance over port 43 without asking any question. Letting this connection sit for several minutes will put the irrd-whois-worker process into the 'hung' state. This can be observed even after the client closes the connection to the IRRd instance after an undetermined period of time. The exact time a connection needs to stay open to cause this issue is not yet determined, but it is longer than the 30s timeout configured within the application.
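The same reproduction can be scripted. The following Python sketch opens a TCP connection, sends nothing, holds it open, and then closes it. The function name `hold_idle_connection` and the host and duration in the usage note are assumptions, not values from this report.

```python
import socket
import time


def hold_idle_connection(host: str, port: int, hold_seconds: float) -> float:
    """Connect, send no query, wait, then close.

    Returns the number of seconds the connection was held open.
    Hypothetical helper for illustration only.
    """
    start = time.monotonic()
    # The timeout here only bounds the connect attempt; no data is ever sent.
    with socket.create_connection((host, port), timeout=10):
        time.sleep(hold_seconds)
    return time.monotonic() - start
```

For example, `hold_idle_connection("irrd.example.net", 43, 300)` would hold an idle connection for five minutes against a placeholder hostname. Whether that is long enough to trigger the hang is not known, since the exact timing is undetermined.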
Expected behaviour
The IRRd irrd-whois-worker process should time out idle connections and become available to process queries again.
IRRd version you are running
Version 4.2.5, but the behavior was also observed on version 4.1.8, before the upgrade.