Error handling request (no URI read) #3207
Comments
Got any context; when does this get triggered? I partially reverted 1ccebab (meant to silence exceptions below |
Hello, we're experiencing an issue in our production environment, and since we have limited information, diagnosing it is challenging. The problem occurs randomly, and we're using Python version 3.8.18 and Django version 4.2.12. |
|
Hello,
|
Your logs are upside down 🙃 - but yes, you did find something. There are various issues around the In 4023228 all BaseException handling was forced into the parent class, and because exceptions handled from connections that had not yet received a full request line were previously not logged, that greenlet-specific BaseException suppression looks(²) neutralized. I think some class hierarchy respecting way of moving just those two would do the trick, but would prefer a solution that undoes the imho too broad except block in favour of only handling those not-quite-fatal BaseException in our area of responsibility. (² I'll have to write a few tests to tell. gevent exception propagation is somewhat special.) |
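To illustrate the point about the overly broad except block, here is a minimal sketch. It is not gunicorn's code: `Worker`, `handle_broad` and `handle_narrow` are hypothetical names, and the abort is simulated by calling `sys.exit(1)` directly. It only shows that `except BaseException` also catches `SystemExit`, so a deliberate worker exit can end up logged as a request error, while `except Exception` lets it pass through.

```python
import sys


class Worker:
    """Toy stand-in for a worker; hypothetical, not gunicorn's class."""

    def handle_abort(self):
        # Mimics an abort handler that ends the worker process.
        sys.exit(1)

    def read_request(self):
        # Pretend the abort fires while we wait for the request line.
        self.handle_abort()


def handle_broad(worker, log):
    try:
        worker.read_request()
    except BaseException as exc:
        # BaseException includes SystemExit and KeyboardInterrupt,
        # so the intended exit is swallowed and logged as an error.
        log.append(f"Error handling request (no URI read): {exc!r}")


def handle_narrow(worker, log):
    try:
        worker.read_request()
    except Exception as exc:
        # SystemExit does not derive from Exception, so it is not caught here.
        log.append(f"Error handling request: {exc!r}")


broad_log = []
handle_broad(Worker(), broad_log)

narrow_log = []
narrow_exit_code = None
try:
    handle_narrow(Worker(), narrow_log)
except SystemExit as exc:
    # The exit propagates to the caller untouched.
    narrow_exit_code = exc.code

print(len(broad_log), len(narrow_log), narrow_exit_code)
```

Whether the real fix should narrow the except clause or re-raise selected exceptions is the design question discussed above; the sketch takes no position on that.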
Thank you, @pajod, for the detailed explanation. We will attempt to disable max_requests. Have a wonderful week! P.S. Sorry for the logs 🙃 |
Hi @pajod , thank you for looking into this! I have the same issue here. Is there any reason for exposing Thanks! |
FYI, I am describing a related bug. I am not opening a new issue so as not to make too much noise, but I would be interested in comments from the maintainers |
Hi, this might be completely unrelated, but I was seeing a similar error message:
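In my case, I was playing around with sendfile/X-Accel-Redirect and had a Django view like this:
# Imports assumed; they were not part of the original snippet.
from urllib.parse import quote
from django.http import HttpResponse

def login_required_media_view(request, path):
    # Empty body: nginx serves the file named in X-Accel-Redirect.
    response = HttpResponse(status=200)
    response["Content-Type"] = ""
    response["X-Accel-Redirect"] = "/media/" + quote(path)
    return response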
Just in case this is useful for someone here. |
@jeverling It is related - that is what the exception handling looks like without the added gevent complexity. It is useful for others to find this issue, where they can learn that they can revert 0b10cba (to silence the error log again) or 4023228 (to return to the previous exception handling) before a proper bugfix release is issued. (At this time I do not suggest a downgrade.) |
I don't understand this issue. Gunicorn responds appropriately when queried directly with wget/curl, but fails when there's a webserver in between. Is that correct? How could such behavior be correct? My nginx configuration is using the default uwsgi_params in the example documentation https://docs.nginx.com/nginx/admin-guide/web-server/app-gateway-uwsgi-django/ |
I can reproduce this issue with Chrome: after the default timeout, it throws the error. With Postman or Safari I do not see it. |
I am facing the same problem except that the error shows up intermittently. I am running
As you can see, we get a timeout and suddenly an exception shows up. When this happens, there is a temporary Internal Server Error due to this but it goes away once another worker is up. |
I have the same problem:
Even with a substantial timeout:
Software versions:
UPDATE: With this setup on NGINX, the timeout stopped occurring:
|
For those in a position to review and/or test it: you are welcome to provide feedback on my suggested patch: #3275 |
Has anyone identified what the actual originating issue is that is causing the worker timeout? I see the PR addresses the exception bubbling, but I've yet to find any threads where anyone has identified what the actual cause of this is. I'm seeing the same issue where a worker will handle the request, the browser receives the data and request ends there, but then the worker dies after the set Setting |
Considering that using raw http or proxy protocol with nginx doesn't generate problems, while uwsgi_pass does, this problem is contained to using wsgi. |
I seem to encounter the same problem after running any Celery-task (
core-1 | [2024-10-24 15:58:04 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:8)
core-1 | [2024-10-24 15:58:04 +0000] [8] [ERROR] Error handling request (no URI read)
core-1 | Traceback (most recent call last):
core-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/workers/sync.py", line 133, in handle
core-1 | req = next(parser)
core-1 | ^^^^^^^^^^^^
core-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/http/parser.py", line 41, in __next__
core-1 | self.mesg = self.mesg_class(self.cfg, self.unreader, self.source_addr, self.req_count)
core-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
core-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/http/message.py", line 259, in __init__
core-1 | super().__init__(cfg, unreader, peer_addr)
core-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/http/message.py", line 60, in __init__
core-1 | unused = self.parse(self.unreader)
core-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^
core-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/http/message.py", line 271, in parse
core-1 | self.get_data(unreader, buf, stop=True)
core-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/http/message.py", line 262, in get_data
core-1 | data = unreader.read()
core-1 | ^^^^^^^^^^^^^^^
core-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/http/unreader.py", line 36, in read
core-1 | d = self.chunk()
core-1 | ^^^^^^^^^^^^
core-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/http/unreader.py", line 63, in chunk
core-1 | return self.sock.recv(self.mxchunk)
core-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
core-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/workers/base.py", line 204, in handle_abort
core-1 | sys.exit(1)
core-1 | SystemExit: 1
core-1 | [2024-10-24 15:58:04 +0000] [8] [INFO] Worker exiting (pid: 8)
core-1 | [2024-10-24 15:58:05 +0000] [17] [INFO] Booting worker with pid: 17
On
|
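The traceback above ends in `sys.exit(1)` inside `handle_abort` while the worker is blocked in `sock.recv`. The following sketch reproduces that shape outside gunicorn, under some loud assumptions: it uses `SIGALRM` and a local `socket.socketpair()` as stand-ins for the signal the worker receives on timeout and for the client connection, and `read_with_abort` is a hypothetical helper. It is Unix-only and must run in the main thread.

```python
import signal
import socket


def handle_abort(signum, frame):
    # Hypothetical stand-in for the abort handler seen in the traceback.
    raise SystemExit(1)


def read_with_abort(delay=0.2):
    # The "client" end never sends anything, so recv() blocks.
    server_side, client_side = socket.socketpair()
    previous = signal.signal(signal.SIGALRM, handle_abort)
    signal.setitimer(signal.ITIMER_REAL, delay)
    try:
        server_side.recv(8192)
        return "received data"
    except SystemExit as exc:
        # The handler's exception surfaces at the blocked recv() call,
        # which is why recv() is the last application frame in the traceback.
        return f"SystemExit: {exc.code}"
    finally:
        # Cancel the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)
        server_side.close()
        client_side.close()


if __name__ == "__main__":
    print(read_with_abort())
```

This only explains why the traceback points at the socket read rather than at application code; it does not say why the peer sent no data in the first place.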
I also faced the problem, but in my case I could find out what was going on. It was very simple (after a day spent finding out why): my worker was trying to make an HTTPS request to another web server (Gunicorn, by the way, but that is not really important) that was listening on HTTP (without TLS). It was really difficult to "see" that, as I was debugging a running set of services without a comfortable way to see every environment variable I passed to each one. I hope this helps someone. |
I just tried #3275 and I am still getting these errors in sentry. |
@ncpalmie I think I did, at least for my specific case (homelab). My frontend is nginx (nginx version: nginx/1.26.2) running in a FreeBSD jail, and from there I reverse-proxy to another FreeBSD jail running Gunicorn (gunicorn==23.0.0), serving Flask. The proxy connection is TLS1.3 only - the nginx side has a client cert and the gunicorn side has a server cert, and you would get SSL_ERROR_RX_CERTIFICATE_REQUIRED_ALERT from Firefox when trying to access the Gunicorn side directly, as only the client cert (Nginx) and the server cert (Gunicorn) together can establish the connection. The whole CA infrastructure is set up using openssl, and I documented every single step when I created my test CA, the certs, the keys, nginx, gunicorn, etc. My error logs looked exactly like the ones of @DvaMishkiLapa @lordsarcastic and @nonprofitnerd when I did the following: After I managed to get the reverse-proxy connection TLS1.3-encrypted I tried to find out how SNI works, as mentioned in the To avoid the error everyone had in their logs I believe the following must be set up correctly: (all this assumes that your firewall is set up correctly, otherwise openssl gives you connect() permission denied errors)
When I tried to use This was when I got the error everyone had in my logs (restoring Note that I only configured my original Gunicorn server in Also note that Gunicorn did not crash until the arrival of a TLS1.3 request. I could reproduce the error via Nginx and also using either of the following
and
The internet said that the first one is good for checking the SANs in the cert, the Using either of these commands on the Nginx side made Gunicorn crash and produced the error everyone had in the logs, just like Nginx did before. Not sure if it matters, but I used EC instead of RSA for all my certs and used the P-256 curve in my openssl commands. I'm fairly new to all this. I was a little surprised that both Nginx and Gunicorn seem to behave so erratically when a TLS error of this kind occurs. I don't know what kind of parsing and the like is involved behind the scenes though; this is probably very complicated. Anyway, I hope someone will benefit from my observations. |
I'm observing the same issue locally and on a production server. This can be replicated with the most basic Flask, Django or simple 'hello world' app.py. Bug: Chrome on ARM processors stalls booting of Gunicorn worker after timeouts. Chrome user may experience a 500 Internal Server Error. Environment:
Minimal reproduction code app.py:
Steps to reproduce:
Key Observations:
With prior versions of Gunicorn, before version 22, the workers booted instantly after a timeout. Now it seems that under specific circumstances, using a Chrome browser on ARM, this can cause the booting of the new worker to stall without log entries for up to a minute. Hoping this helps narrow down the issue for the developers. |
To follow up a bit. After just rebooting my laptop the issue with Chrome now does not appear to happen. Chromium on the Jetson Nano still had the issue until that was rebooted. So now everything I was sure about earlier is again a question. What a difficult problem to troubleshoot. |
I'm also experiencing this problem after upgrading Gunicorn from v20.1.0 to v22.0.0. Steps that lead to these exceptions:
What's suspicious is that the tracebacks for requests that didn't manage to finish are always the same and never point to my application code. Instead, they always seem to be stuck at Full traceback
Reading a request from a socket should be very quick, orders of magnitude faster than the graceful timeout of 30 seconds. Just guessing: Is it perhaps a bug in some requests still being accepted after the server is not supposed to accept any more? Or maybe the code for reading from a socket and filling in the buffer can get stuck in an infinite loop under some conditions? Edit: Ah, I see this is already being worked on in #3275 |
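On the guess that reading from a socket should always be quick: that holds only when the peer actually sends data. The sketch below is an assumption-laden illustration, not a confirmed cause from this thread. It uses a local `socket.socketpair()` and a hypothetical helper `time_idle_read` to show that a read on a connection whose peer stays silent takes as long as whatever timeout is in force.

```python
import socket
import time


def time_idle_read(timeout=0.2):
    # The peer end is connected but never sends a byte.
    server_side, client_side = socket.socketpair()
    server_side.settimeout(timeout)
    start = time.monotonic()
    try:
        server_side.recv(8192)
        outcome = "data"
    except socket.timeout:
        # Nothing arrived; the read lasted for the whole timeout.
        outcome = "timed out"
    finally:
        elapsed = time.monotonic() - start
        server_side.close()
        client_side.close()
    return outcome, elapsed


if __name__ == "__main__":
    outcome, elapsed = time_idle_read()
    print(outcome, f"{elapsed:.2f}s")
```

If a worker blocks like this without a socket-level timeout, the only thing that ends the read is an external timeout, which would be consistent with tracebacks that always stop at the socket read.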
After upgrading from version 20.1.0 to 22.0.0, we have encountered "Error handling request (no URI read)" errors.
/var/log/messages
gunicorn.conf.py
Env: