I have noticed in the logs that there is a recurring GreenletExit exception. Although this does not cause an error for clients, it negatively impacts performance.

Steps to reproduce:

Server configuration:

Gunicorn settings:
```python
max_requests = 2        # set low to demonstrate the problem quickly
timeout = 60
worker_class = 'gevent'
workers = 1             # to highlight the problem more clearly
graceful_timeout = 30   # default
keepalive = 75
```
Note: the key point is that keepalive (75 s) is greater than graceful_timeout (30 s), so an idle keepalive connection can never time out within the graceful window.
Client:
Although the server has keepalive set to 75, the client does not reuse connections; it opens a new one for each request. The client sends requests in a loop, in parallel (a minimal sketch follows).
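For concreteness, here is a minimal client sketch; the host/port and concurrency level are assumptions for illustration, not part of the original report:

```python
# Reproduction client sketch: every request opens a brand-new connection,
# and finished sockets are kept open (never reused), so the server parks
# each one in its keepalive wait. A long run will exhaust file
# descriptors; this is only a demonstration.
import socket
from concurrent.futures import ThreadPoolExecutor

HOST, PORT = "127.0.0.1", 8000   # hypothetical gunicorn address
abandoned = []                   # hold references so connections stay open

def one_request(_):
    s = socket.create_connection((HOST, PORT))
    s.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
    status_line = s.recv(65536).split(b"\r\n", 1)[0]
    abandoned.append(s)          # deliberately neither closed nor reused
    return status_line

with ThreadPoolExecutor(max_workers=4) as pool:
    while True:
        print(list(pool.map(one_request, range(4))))
```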
Observations:
When both the client and server are running, the server periodically restarts the worker (max_requests is reached quickly with the settings above). During the restart it stops accepting new connections and waits for the graceful period to elapse, so the client stalls for 30 seconds before processing continues. After the graceful period, the following appears in the logs:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/gunicorn/workers/base_async.py", line 48, in handle
    req = next(parser)
          ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gunicorn/http/parser.py", line 42, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.source_addr, self.req_count)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gunicorn/http/message.py", line 257, in __init__
    super().__init__(cfg, unreader, peer_addr)
  File "/usr/local/lib/python3.11/site-packages/gunicorn/http/message.py", line 60, in __init__
    unused = self.parse(self.unreader)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gunicorn/http/message.py", line 269, in parse
    self.get_data(unreader, buf, stop=True)
  File "/usr/local/lib/python3.11/site-packages/gunicorn/http/message.py", line 260, in get_data
    data = unreader.read()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gunicorn/http/unreader.py", line 37, in read
    d = self.chunk()
        ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gunicorn/http/unreader.py", line 64, in chunk
    return self.sock.recv(self.mxchunk)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gevent/_socketcommon.py", line 666, in recv
    self._wait(self._read_event)
  File "src/gevent/_hub_primitives.py", line 317, in gevent._gevent_c_hub_primitives.wait_on_socket
  File "src/gevent/_hub_primitives.py", line 322, in gevent._gevent_c_hub_primitives.wait_on_socket
  File "src/gevent/_hub_primitives.py", line 304, in gevent._gevent_c_hub_primitives._primitive_wait
  File "src/gevent/_hub_primitives.py", line 46, in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait
  File "src/gevent/_hub_primitives.py", line 46, in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait
  File "src/gevent/_hub_primitives.py", line 55, in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait
  File "src/gevent/_waiter.py", line 154, in gevent._gevent_c_waiter.Waiter.get
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
greenlet.GreenletExit
[2024-05-28 13:54:12 +0000] [17990] [INFO] Worker exiting (pid: 17990)
```
After this, a new worker is started and everything works again until the next restart.
Is this expected behavior? In my opinion, no. It makes no sense to keep waiting in sock.recv when the process is about to restart and there is nothing left to read from the socket. Once the restart begins the socket should be shut down, yet the worker still waits for read events. This happens because greenlets spawned before the restart are sitting inside the keepalive loop, waiting. The server is stopped (see this line in ggevent.py), but the pending socket waiters are never woken; a standalone sketch of this blocked wait follows.
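To make the failure mode concrete, here is a self-contained gevent sketch (not gunicorn source): a greenlet blocked in recv() on an idle connection wakes up only when it is killed, raising the same GreenletExit seen in the logs.

```python
# Standalone illustration: a greenlet parked in recv() exits only when
# killed, mirroring what happens to keepalive greenlets once the
# graceful period expires.
import gevent
from gevent import socket

def keepalive_wait(conn):
    try:
        conn.recv(8192)  # blocks: the idle client never sends another request
    except gevent.GreenletExit:
        print("GreenletExit: raised only once the greenlet is killed")
        raise

# Build a loopback connection so recv() genuinely blocks.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
client = socket.create_connection(listener.getsockname())
server_side, _ = listener.accept()

g = gevent.spawn(keepalive_wait, server_side)
gevent.sleep(1)   # stand-in for waiting out graceful_timeout
g.kill()          # arbiter-style teardown kills the parked greenlet
g.join()
```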
I believe there have been attempts to fix this before, such as commit 7896057 (reverted in 8c5613b). I also proposed a fix that seemed to work: shutting down read events on the socket when worker.alive == False. However, after testing, it did not work as expected in all cases.
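For reference, a rough sketch of that earlier idea; the names below are hypothetical stand-ins, not gunicorn's actual structures or API:

```python
import socket

def wake_keepalive_waiters(worker, keepalive_socks):
    # Hypothetical helper: once the worker is no longer alive, shut down
    # the read side of every connection parked in the keepalive loop so
    # the pending recv() returns b"" immediately instead of blocking
    # until graceful_timeout expires.
    if not worker.alive:
        for sock in keepalive_socks:
            try:
                sock.shutdown(socket.SHUT_RD)
            except OSError:
                pass  # already closed or reset by peer
```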
After further consideration, I am proposing a new change that seems to be low-risk but can significantly reduce disruptions in request handling.
MR with my proposal: #3236