100% cpu in mainthread due to not closing properly? (channel.connected == False) #418
Waitress does seem to properly close if shutdown is received (empty data). So we have to keep looking for a way `connected` can be false while the channel is still trying to write. |
Dylan Jay wrote at 2023-9-11 07:56 -0700:
@d-maurer
> The error is where `self.connected` was set to `False`.
> There, it should have been ensured that the corresponding "fileno"
> is removed from `socket_map` and that it will not be put there again
> (as long as `self.connected` remains `False`).
>
> Something exceptional must have brought `waitress` into this state
> (otherwise, we would have lots of 100 % CPU usage reports).
> I assume that some bad client has used the system call `shutdown`
> to close only part of the socket connection and that `waitress` does not
> anticipate something like that.
Waitress does seem to properly close if shutdown is received (empty data).
See https://github.com/Pylons/waitress/blob/main/src/waitress/wasyncore.py#L449
I would expect an empty data read if the sending part of the connection
is shut down. However, the client might shut down the receiving part;
for me, this does not suggest an empty data read.
|
Dylan Jay wrote at 2023-9-11 07:56 -0700:
@d-maurer
> ...
Waitress does seem to properly close if shutdown is received (empty data).
See https://github.com/Pylons/waitress/blob/main/src/waitress/wasyncore.py#L449
I have an idea how to fix the inconsistency without understanding how
it came into being:
The `handle_write_event` is (likely) only called by the loop.
Thus, if it is called and the connection is closed, it can detect
an inconsistency (a closed connection should not get events).
It can resolve the inconsistency by deregistration
with the loop.
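In code, the idea might look roughly like this (a sketch against wasyncore's dispatcher API, not the actual patch):
```python
# Hypothetical cleanup in the channel's event handler (illustration only):
def handle_write_event(self):
    if not self.connected:
        # A closed connection should not receive events; resolve the
        # inconsistency by deregistering this fd from the loop's socket_map.
        self.del_channel()
        return
    self.handle_write()
```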
|
@d-maurer yes, it could work to insert a `self.del_channel()` at waitress/src/waitress/channel.py Line 97 in 84360df |
@d-maurer I created a PR and it passes the current tests, and they hit that line, but it's hard to know how to make a test for this scenario... |
Dylan Jay wrote at 2023-9-11 10:10 -0700:
@d-maurer I created a PR and it passes the current tests, and they hit that line, but it's hard to know how to make a test for this scenario...
#419
Some test should not be that difficult: you register a `write` handler
for a connection and then set `connected` to `False`.
In the next loop run, there will be a write event and you can check
that it does not end in an infinite loop.
Of course, the test only verifies that a `not connected` does not
lead to an infinite loop. It does not try to set up
a realistic case for setting `connected` to `False`.
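Roughly, such a test might look like this (a sketch; `make_dummy_channel` is a hypothetical factory, and waitress's real test suite wires channels up differently):
```python
from waitress import wasyncore

def test_write_event_on_disconnected_channel_does_not_loop():
    sock_map = {}
    channel = make_dummy_channel(map=sock_map)  # hypothetical helper
    channel.connected = False
    channel.total_outbufs_len = 1               # makes writable() truthy
    # A bounded pass over the loop; without a fix this would spin forever.
    wasyncore.loop(timeout=0.01, map=sock_map, count=1)
    assert channel._fileno not in sock_map      # channel was deregistered
```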
|
@mcdonc another solution instead of #419 might be the one below. Is that preferable?
perhaps you have a better idea on how it could have got into this knot and the best way to test? |
@mcdonc one code path that could perhaps lead to this is waitress/src/waitress/wasyncore.py Line 313 in 4f6789b
since `connecting == False` there, there also doesn't seem to be a way for it to write data out or close? EDIT: one scenario could be that the client half-disconnected very quickly, before the dispatcher was set up, so getpeername fails? But somehow the socket can still be written to? |
Looks like it is possible for a connection that's been broken before getpeername to then have no error in select, in the case where there is nothing to read, since a read would result in a close. https://stackoverflow.com/questions/13257047/python-select-does-not-catch-broken-socket. Not sure how it has something to write in that case. Maybe a shutdown for read only, very quickly? EDIT: https://man7.org/linux/man-pages/man3/getpeername.3p.html "EINVAL The socket has been shut down." <- so a shutdown for read very quickly looks like it could create this tight loop. |
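For reference, the setup path in question looks roughly like this (modeled on the stdlib `asyncore` code that `wasyncore` vendors; the waitress version may differ in detail):
```python
from errno import EINVAL, ENOTCONN

# Inside wasyncore.dispatcher.__init__ (abridged):
def __init__(self, sock=None, map=None):
    if sock:
        sock.setblocking(0)
        self.set_socket(sock, map)      # adds the fd to socket_map
        self.connected = True
        try:
            self.addr = sock.getpeername()
        except OSError as err:
            if err.args[0] in (ENOTCONN, EINVAL):
                # Peer vanished between accept() and here: the channel
                # ends up with connected = False but stays in socket_map.
                self.connected = False
            else:
                self.del_channel(map)
                raise
```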
Or somehow the getpeername is invalid and that results in an OSError, and there is nothing to read but something to write. But I'm not sure if that results in the EINVAL or not. |
Dylan Jay wrote at 2023-9-11 18:35 -0700:
@mcdonc another solution instead of #419 might be the one below. Is that preferable?
```
def poll(timeout=0.0, map=None):
if map is None: # pragma: no cover
map = socket_map
if map:
r = []
w = []
e = []
for fd, obj in list(map.items()): # list() call FBO py3
# prevent getting into a loop for sockets disconnected but not properly closed.
if obj.check_client_disconnected():
obj.del_channel()
continue
```
Not sure. It, too, uses `del_channel` suggesting "delete channel".
While "delete channel" likely removes the channel from the socket map,
it likely does more than that -- and maybe things not best
placed into `poll`.
Another point: the `map` may contain objects (e.g. non-channels) without
`del_channel` and `check_client_disconnected` methods.
In those cases, the code above would raise an exception and bring
your service down.
Currently, I favor the following reasoning:
You have detected that a write event for a "non connected" connection
leads to a busy loop. We must do something about it (either prevent
the case from happening or clean up locally).
For the local cleanup, I proposed to unregister the "write event handler".
You have translated this into a call to `del_channel`,
likely because the loop does not support "write event handler"s.
The loop API only supports the registration of objects (with `fileno`,
`readable` and `writable`, ... methods); those objects' `writable`
method indicates whether the object is interested in write event notifications.
I propose to implement "unregister the write event handler" not
by a `del_channel` call but by a modification of `writable`: ensure it
returns `False` for a non-connected connection.
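As a sketch (the real `channel.py` `writable` combines more state, quoted later in this thread):
```python
def writable(self):
    # Never ask the loop for write events on a dead connection;
    # otherwise select() keeps reporting the fd writable -> busy loop.
    if not self.connected:
        return False
    return bool(
        self.total_outbufs_len or self.will_close or self.close_when_flushed
    )
```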
In the long run `waitress` should likely change its "connected" concept.
HTTP is based on TCP which implements bidirectional communication channels.
The `shutdown` system call allows applications to shut down individual
directions. This is not compatible with a boolean "connected";
instead we have 4 connection states: (fully) disconnected,
read connected, write connected and (fully) connected.
I do not know whether important HTTP clients exist which use `shutdown`.
Some might want to signal "no more input" by shutting down the
outgoing communication channel, e.g. for a long download.
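For illustration, the standard socket API exposes exactly these partial shutdowns:
```python
import socket

s = socket.create_connection(("example.com", 80))
s.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
s.shutdown(socket.SHUT_WR)   # write-disconnected: "I will send no more"
# s.shutdown(socket.SHUT_RD) # would instead drop the receiving direction
data = s.recv(65536)         # the read side still works after SHUT_WR
s.close()
```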
perhaps you have a better idea on how it could have got into this knot
My primary thought:
It must be something exceptional which caused the situation. Otherwise,
lots of 100 % CPU usage reports should have been seen.
Initially, I thought of a use of `shutdown` to (only) partially
shut down the socket. Likely, such a use is unanticipated and may
reveal bugs.
Meanwhile, I can think of other causes:
Maybe, when the connection is closed while the response is
being streamed back, an inconsistency can creep in.
The "async" world should be quite robust against such types of problems
(because all IO logic is serialized) but the system is connected
to a network and important network events (e.g. the closing of a
communication channel) can happen at any time and cause unanticipated
exceptions changing the normal control flow.
and the best way to test?
We have here the case that we cannot reproduce the real situation
(because we do not know how the inconsistency was introduced).
We therefore only implement a workaround.
My favorite workaround (ensure "writable" returns "False" when "not connected")
is so simple that no test is necessary to convince us that
the busy loop is avoided.
The workaround may not solve all problems. For example, it may
keep a socket in use which should in principle have been closed.
|
Dylan Jay wrote at 2023-9-11 21:59 -0700:
Or somehow the getpeername is invalid and that results in an OSError, and there is nothing to read but something to write. But I'm not sure if that results in the EINVAL or not.
I do not think that this goes in the right direction:
`waitress` is an HTTP server and (unlike e.g. a `telnet` server) it produces
output only after it has received input.
Thus, I expect that the problematic connection has once been in
a `connected` state (to read something and then to produce output).
|
@d-maurer I'm fairly sure I have one solid explanation of how this could occur. Outlined in this test - https://github.com/Pylons/waitress/pull/419/files#diff-5938662f28fcbb376792258701d0b6c21ec8a1232dada6ad2ca0ea97d4043d96R775 NOTE: I haven't worked out a way to detect the looping in a test yet, so the assert at the end is not correct. It is as you say. There is a shutdown of the read side only, but this is a race condition: it has to happen before the dispatcher is created, so right after the connect. I've confirmed this results in getpeername raising OSError EINVAL and thus connected = False, and the select still thinks it can write, so the loop will be infinite. Or maybe until the bad actor properly closes the connection; not sure on that one.
True. But if I'm right on the cause of this, the socket would never have connected=False with most shutdowns. Only when it happens too quickly. That flag is mostly used to indicate not yet connected or in the process of closing.
Yes, that will also work. I'll switch it to that. |
@d-maurer I pushed new code that uses writable instead. |
Dylan Jay wrote at 2023-9-12 03:59 -0700:
...
It is as you say. There is a shutdown of the read side only, but this is a race condition: it has to happen before the dispatcher is created, so right after the connect. I've confirmed this results in getpeername raising OSError EINVAL and thus connected = False, and the select still thinks it can write, so the loop will be infinite.
I do not know `waitress` implementation details **BUT**
in general, write notifications are called for only **AFTER**
output has been generated (i.e. `writable` will only
return `True` once data to be written has been generated).
As explained earlier, an HTTP server first reads data from
a connection before it writes to the connection.
If you are right with your assumption above, then
reading has been possible (despite a "not connected")
and output was generated based on this input.
or maybe until the bad actor properly closes the connection. not sure on that one.
The connection's `writable` must also return `True`
(otherwise, the corresponding fd will not be included in the write fd set).
Usually, this would happen if it is known that there is data to be output.
|
@d-maurer maybe a core contributor can step in and advise the best solution and test. @digitalresistor @kgaughan ? |
Dylan Jay wrote at 2023-9-12 20:02 -0700:
@d-maurer maybe a core contributor can step in and advise the best solution and test. @digitalresistor @kgaughan ?
I had a closer look at the code and I think I found a realistic
scenario to enter the busy loop state:
If `HTTPChannel.read` reads empty data, it sets `connected` to `False`;
if there is pending output at that time we are in the busy loop state.
We can force `HTTPChannel.read` to read empty data by letting
the HTTP client shutdown its sending direction. Once all data
has been read by the receiving side, its next `recv` will return empty data.
A normal `close` (rather than `shutdown`) might have a similar effect.
The hypothesis can be checked in the following way:
Design an HTTP service to produce sufficient output to saturate the
output channel.
Design an HTTP client: it sends a request to the service
(but does not read the response),
waits sufficiently long such that the service has produced its output,
then shuts down the writing direction of its HTTP connection
(maybe just closes its HTTP connection).
Check whether this brings `waitress` into the busy loop state.
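A sketch of such a client (the `/big` endpoint and port are assumptions; it just needs a local waitress service producing a large response):
```python
import socket
import time

s = socket.create_connection(("localhost", 8080))
s.sendall(b"GET /big HTTP/1.1\r\nHost: localhost\r\n\r\n")
time.sleep(5)                # let the service saturate its output channel
s.shutdown(socket.SHUT_WR)   # the server's next recv() then returns b""
# Deliberately never read the response; then watch the server's CPU.
```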
|
@d-maurer that was my initial thought, but as I pointed out in #418 (comment), recv in wasyncore will do:
```python
def recv(self, buffer_size):
    try:
        data = self.socket.recv(buffer_size)
        if not data:
            # a closed connection is indicated by signaling
            # a read condition, and having recv() return 0.
            self.handle_close()
            return b""
        else:
            return data
    except OSError as why:
        # winsock sometimes raises ENOTCONN
        if why.args[0] in _DISCONNECTED:
            self.handle_close()
            return b""
        else:
            raise
```
|
Also, when I did some testing, it did seem like the select would indicate a write was possible even without the back end producing any data. So there is no read needed. Just a connect and a very quick shutdown. But I do have to work out a proper test for that. |
Dylan Jay wrote at 2023-9-13 00:07 -0700:
@d-maurer that was my initial thought but as I pointed out in #418 (comment) recv in wasyncore will do `handle_close` on getting empty data and take it out of the map, as I couldn't see any way for no bytes being sent to cause this loop.
You are right!
I missed (had forgotten) that.
|
Dylan Jay wrote at 2023-9-13 00:11 -0700:
Also when I did some testing it did seem like the select would indicate a write was possible even without the back end producing any data.
A `select` will almost always report a possible `write`.
(For a "normal" socket) the only exception is that the write buffer is
satuarated.
That's why the `writeable` must return `False` unless there
is data to write (or the `handle_write` will be able to clean up the state).
So there is no read needed. Just a connect and very quick shutdown. But I do have to work out a proper test for that.
Only if `waitress` defines its `writable` in a non-standard way:
typically, `writable` would only return `True` if output was pending.
In `channel.py` `writable` is defined as:
```
return self.total_outbufs_len or self.will_close or self.close_when_flushed
```
Thus, it is not completely standard.
However, as far as I could see, `will_close` and `close_when_flushed`
can only be set during request processing, i.e. after input has been received.
|
Dylan Jay wrote at 2023-9-13 00:07 -0700:
@d-maurer that was my initial thought but as I pointed out in #418 (comment) recv in wasyncore will do `handle_close` on getting empty data and take it out of the map, as I couldn't see any way for no bytes being sent to cause this loop.
I have meanwhile read the Python socket HOWTO
(--> "https://docs.python.org/3/howto/sockets.html#socket-howto").
It recommends (in the "Disconnecting" section) to operate
in an HTTP-like exchange: send the request and then use
`shutdown(1)` to indicate "I (the client) will produce no more output
but am still ready for input".
The behavior of `waitress` you point out above
(close as soon as there is no more input) will not play well
with this recommendation.
|
@d-maurer that would be a different bug in waitress. My problem is I run out of CPU on my servers if I don't restart them often, due to these weird requests we are receiving. That no one else in the world seems to get :( |
Dylan Jay wrote at 2023-9-13 04:26 -0700:
...
My problem is I run out of CPU on my servers if I don't restart them often, due to these weird requests we are receiving. That no one else in the world seems to get :(
Would you share the version of `waitress` with which you are observing the behavior?
|
Dieter Maurer wrote at 2023-9-13 09:57 +0200:
> ...
> However, as far as I could see, `will_close` and `close_when_flushed`
> can only be set during request processing, i.e. after input has been received.
`will_close` can be set by `server.BaseWSGIServer.maintenance`, i.e.
independent of a task/request.
Thus, you might be right with your hypothesis:
connected set to false in `wasyncore.dispatcher.__init__`
due to a connection race condition;
later busy loop due to `not connected and writable`.
You could verify this as follows:
Add a sufficiently large `sleep` into `wasyncore.dispatcher.__init__`
before the `getpeername` call.
Open a connection to the server and immediately close it again.
The `sleep` should ensure that at the time of the `getpeername` call,
the remote socket end is already closed (maybe, `getpeername` then
fails with an exception and `connected` is set to `False`).
Wait sufficiently long (somewhat longer than `cleanup_interval`) to
let `maintenance` set `will_close`.
If you are right, this will result in a busy loop.
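One way to inject that delay without editing the source is monkey-patching (illustration only; the sleep duration is arbitrary):
```python
import time
from waitress import wasyncore

_orig_init = wasyncore.dispatcher.__init__

def delayed_init(self, sock=None, map=None):
    # Widen the race window: the remote end can close before the
    # original __init__ reaches its getpeername() call.
    time.sleep(2)
    _orig_init(self, sock, map)

wasyncore.dispatcher.__init__ = delayed_init
```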
|
Dieter Maurer wrote at 2023-9-13 14:26 +0200:
...
> Thus, you might be right with your hypothesis:
> connected set to false in `wasyncore.dispatcher.__init__`
> due to a connection race condition;
> later busy loop due to `not connected and writable`.
> ...
> If you are right, this will result in a busy loop.
On my system (Linux, kernel 5.4.0), `getpeername` returns
the peer address even after the socket's remote end has been closed.
Verified via the following interactive code:
server code:
```python
from socket import socket, AF_INET, SOCK_STREAM
ss = socket(AF_INET, SOCK_STREAM)
ss.bind(("localhost", 10000))
ss.listen()
cs, addr = ss.accept()
# run the client code in a second interactive session
cs.getpeername()
```
client code:
```python
from socket import socket, AF_INET, SOCK_STREAM
cs = socket(AF_INET, SOCK_STREAM)
cs.connect(("localhost", 10000))
cs.close()
```
|
Note that a socket has 2 ends. "The socket has been shut down" might refer to the local (not the remote) end. |
No, that is why waitress has a |
@d-maurer maybe, but it will result in the socket being closed, since it will allow it to read the RST. That's also how it works in the test. I'll try it in production. But this is still a bug. I've put in two fixes.
|
@d-maurer I've also worked out why maintenance never close the connection and put in a fix for that too. |
Well, this is at least the commit that put this in |
Dylan Jay wrote at 2023-9-18 08:49 -0700:
@d-maurer maybe, but it will result in the socket being closed, since it will allow it to read the RST. That's also how it works in the test. I'll try it in production.
But this is still a bug. I've put in two fixes.
```
def handle_write(self):
# Precondition: there's data in the out buffer to be sent, or
# there's a pending will_close request
if not self.connected and not self.close_when_flushed:
# we dont want to close the channel twice
return
```
Maybe, this change is not yet optimal:
The comment indicates that the purpose is to avoid closing the channel
twice.
It likely addresses the following scenario:
the previous `handle_read` has detected a remote closure and
closed the channel; the following `handle_write` may try to
close the channel again which should be prevented.
My proposal: do not abuse `connected` for this logic.
Instead introduce an additional attribute `closed`, initially `False`
and set to `True` in `_dispatcher.close`.
This way, `closed` is `True` if and only if `close` has already been called.
The code above could become `if self.closed: return`.
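A sketch of that proposal (hypothetical; the real `close` also deregisters the channel and closes the socket):
```python
class dispatcher:
    closed = False

    def close(self):
        # Record that close() has run, so later event handlers can detect
        # "already closed" without overloading the meaning of `connected`.
        self.closed = True
        # ... existing close logic (del_channel, socket.close, ...)

class HTTPChannel(dispatcher):
    def handle_write(self):
        if self.closed:
            return  # close() already ran; never close the channel twice
        # ... normal write handling
```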
|
Dylan Jay wrote at 2023-9-18 08:49 -0700:
@d-maurer maybe but it will result in the socket being closed since it will allow it to read the RST. Thats also how it works in the test. I'll try it in production.
That's where my speed argument may come in:
when the state change from `channel.service()` (which likely requires
a thread switch) happens after the next poll, `channel.readable()`
still returns `True` in the `poll` and the channel has the chance to learn that
the remote end was closed (closing the channel);
only when it happens before the next poll, it prevents `readable`
from returning `True` and the remote end closing remains unnoticed.
|
Dylan Jay wrote at 2023-9-19 21:16 -0700:
@d-maurer I've also worked out why maintenance never close the connection and put in a fix for that too.
Unfortunately, there doesn't seem to be a test for this close-called-twice case, so it is still not clear why the line that caused the loop is there in the first place, and whether it isn't better just to get rid of it.
Scenario:
The channel is both `readable` and `writable`
(not sure whether this is possible).
If, while in this state, `recv` returns empty data, it will close the channel;
a following `handle_write` (in the same `poll` call)
will try to close a second time (because the `send` will (likely)
fail with an exception).
|
The current PR has been running in production for a couple of weeks without the bug recurring, so that at least confirms the source of the issue. |
@djay I have not been able to reproduce the original error with the current released version of waitress. It is somewhat frustrating that I can't reproduce it. |
@digitalresistor I merged in master, reversed my fixes, and uncommented the test case code that reproduces it, and it showed that it still went into a loop. How did you try to reproduce it? |
@djay I have not been using the test code you created. I wanted to reproduce it without trying to set up the conditions to match exactly, or forcing a socket into a particular state. I have been testing with waitress running on a RHEL 8.9 system, and I have been unable to get `getpeername()` to fail in those conditions, even while trying to do the half-shutdown and stress testing waitress at the same time. I also would expect to see far more instances of this error happening out in the wild; waitress is used extensively, and I have yet to see it happen on any of my applications that are running in production. I don't disagree that there is an issue and we can solve it, but I find it interesting that it is not easily reproducible on test systems I have. The current PR does the closing much later; I'd much rather see it get handled earlier. Then we can rework the logic for writable as well to make sure that we can't busy loop there. |
Delta Regeer wrote at 2024-2-4 21:02 -0800:
@djay I have not been using the test code you created. I wanted to reproduce it without trying to set up the conditions to match exactly, or forcing a socket into a particular state. I have been testing with waitress running on a RHEL 8.9 system, I have been unable to get `getpeername()` to fail in those conditions even while trying to do the half-shutdown and stress testing waitress at the same time. I have not manually manipulated SO_LINGER to setup the conditions as specified, just using the defaults from the OS.
"Normal" browsers (and other HTTP agents) likely use `SO_LINGER` rarely.
I assume that this is the reason why we see only few cases.
|
@digitalresistor The circumstances are very particular. It was something we were getting from pentests only, I believe.
This part ensures it's closed if getpeername fails. And I added some other tests to show you how the other fix prevents any looping if, for any other reason, channel.connected becomes false and we have an error in the request. Even though that should not happen. |
@djay I will take another look at this over the next week or so. I appreciate the follow-up! |
@digitalresistor I wouldn't think about it too long. Basically I've given a recipe to DDoS any waitress site: lock up one thread with a single request. This would happen every Sunday with this repeated pentest. Once I deployed the fix it never happened again. Leaving a bug like this in doesn't seem right to me. |
No longer call getpeername() on the remote socket either, as it is not necessary for any of the places where waitress uses self.addr in a subclass of the dispatcher. This removes a race condition when setting up a HTTPChannel where we accepted the socket, and know the remote address, yet call getpeername() again, which would have the unintended side effect of potentially setting self.connected to False because the remote has already shut down part of the socket. This issue was uncovered in #418, where the server would go into a hard loop because self.connected was used in various parts of the code base.
I've created a new PR that removes the `getpeername()` call. Could you drop this into your system instead and take a look if you still see the same issue: #435 |
@digitalresistor I can. |
@digitalresistor sorry, I see you removed this also. Ok, I will test this. |
@djay do you have any feedback with the newly proposed changes? |
@digitalresistor sorry for the delay on this. It was deployed earlier this week, but the weekly pentest only happens once a week on a Sunday, so Monday we should know. |
@digitalresistor seems to be running fine in prod with what we presume are the same pentests being thrown at it. |
Bumps [waitress](https://github.com/Pylons/waitress) from 2.1.2 to 3.0.1. Relevant changelog entries:
> 3.0.1 (2024-11-28) — Security: Fix a bug that would lead to Waitress busy looping on select() on a half-open socket due to a race condition that existed when creating a new HTTPChannel. See Pylons/waitress#435, Pylons/waitress#418 and GHSA-3f84-rpwh-47g6. With thanks to Dylan Jay and Dieter Maurer for their extensive debugging and helping track this down.
> 3.0.0 (2024-02-04) — Updated defaults: clear_untrusted_proxy_headers is set to True by default. See Pylons/waitress#370.
Could this be backported to the 2.x line? |
The only bw-incompat change in waitress was changing `clear_untrusted_proxy_headers` to default to `True`. |
Ah, thanks for the useful pointer! It seems that that's an easy thing to configure properly. Then I can go back to fighting over redis ssl connection with self-signed certificate 🥲👍🏽 |
Following on from debugging in this issue - collective/haufe.requestmonitoring#15
What we see is waitress switching to 100% CPU and staying there. It is happening in production randomly (within a week), and we haven't traced it back to a specific request.
Using a sampling profiler on waitress with 2 threads (in prod), we identified the thread using the CPU as the mainthread (top -H), and this is the profile. Note that since this is prod, there are other requests, so not all activity is related to the looping causing this bug.
From profiling, it looks like the channel is writable but channel.connected == False.
So then it goes into a loop without writing or closing, since it never actually does anything to the socket.
https://github.com/Pylons/waitress/blob/main/src/waitress/channel.py#L98
EDIT: My suspicion would be that what started this was a client that shut down (half) very quickly after a connect, and this happened before the dispatcher finished being set up. This causes getpeername to fail with EINVAL and connected = False.
waitress/src/waitress/wasyncore.py, Line 310 in 4f6789b
Could be same issue as #411 but hard to tell.
One fix is in #419, but there could be better ways? |
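A sketch of the suspected trigger (timing-dependent; port and timing are assumptions):
```python
import socket

s = socket.create_connection(("localhost", 8080))
s.shutdown(socket.SHUT_RD)  # half-shutdown immediately after connect
# If this lands before the server's dispatcher is set up, its
# getpeername() fails with OSError (EINVAL) and the channel is created
# with connected = False while remaining in the socket map.
```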