Skip to content

100% cpu in mainthread due to not closing properly? (channel.connected == False) #418

@djay

Description

@djay

Following on from debugging in this issue - collective/haufe.requestmonitoring#15

What we see is waitress switching into 100% CPU and staying there. It is happening in production randomly (within a week) and we haven't traced it back to a certain request).

Using a sampling profiler on waitress with 2 threads (in prod) we identified the thread using the CPU as the mainthread (top -H) and this is the profile. Note that since this is prod there are other requests so not all activity is related to the looping causing this bug.

 Austin  TUI   Wall Time Profile                                                                                             CPU  99% ▇█▇█▇▇▇▇   MEM 263M ████████    5/5
   _________   Command /app/bin/python3.9 /app/parts/instance/bin/interpreter /app/eggs/Zope-5.6-py3.9.egg/Zope2/Startup/serve.py /app/parts/instance/etc/wsgi.ini
   ⎝__⎠ ⎝__⎠   Python 3.9.0     PID 3351466     PID:TID 3351466:11          
               Samples 1451365  ⏲️   3'16"       Threshold 0%   
  OWN    TOTAL    %OWN   %TOTAL  FUNCTION                                                                                                                                     
  00"    3'16"     0.0%  100.0%  └─ <module> (/app/parts/instance/bin/interpreter:326)                                                                                       ▒
  00"    3'16"     0.0%  100.0%     └─ <module> (/app/eggs/Zope-5.6-py3.9.egg/Zope2/Startup/serve.py:255)                                                                    ▒
  00"    3'16"     0.0%  100.0%        └─ main (/app/eggs/Zope-5.6-py3.9.egg/Zope2/Startup/serve.py:251)                                                                     ▒
  00"    3'16"     0.0%  100.0%           └─ run (/app/eggs/Zope-5.6-py3.9.egg/Zope2/Startup/serve.py:217)                                                                   │
  00"    3'16"     0.0%  100.0%              └─ serve (/app/eggs/Zope-5.6-py3.9.egg/Zope2/Startup/serve.py:203)                                                              │
  00"    3'16"     0.0%  100.0%                 └─ serve (/app/eggs/plone.recipe.zope2instance-6.11.0-py3.9.egg/plone/recipe/zope2instance/ctl.py:942)                       │
  00"    3'16"     0.0%  100.0%                    └─ serve_paste (/app/eggs/plone.recipe.zope2instance-6.11.0-py3.9.egg/plone/recipe/zope2instance/ctl.py:917)              │
  00"    3'16"     0.0%  100.0%                       └─ serve (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/__init__.py:19)                                                  │
  00"    3'16"     0.0%  100.0%                          └─ run (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/server.py:322)                                                  │
  05"    3'16"     2.5%   99.9%                             ├─ loop (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:245)                                           │
  36"     44"     18.3%   22.4%                             │  ├─ poll (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:158)                                        │
  05"     05"      2.4%    2.4%                             │  │  ├─ readable (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/server.py:290)                                    │
  01"     02"      0.4%    0.9%                             │  │  ├─ write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:117)                                    │
  01"     01"      0.4%    0.5%                             │  │  │  ├─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:517)                    │
  00"     00"      0.0%    0.0%                             │  │  │  │  ├─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:98)                          │
  00"     00"      0.0%    0.0%                             │  │  │  │  └─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:95)                          │
  00"     00"      0.0%    0.0%                             │  │  │  ├─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:514)                    │
  00"     00"      0.0%    0.0%                             │  │  │  ├─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:515)                    │
  00"     00"      0.0%    0.0%                             │  │  │  │  └─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:98)                          │
  00"     00"      0.0%    0.0%                             │  │  │  ├─ poll (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:150)                                  │
  00"     00"      0.0%    0.0%                             │  │  │  │  └─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:98)                          │
  00"     00"      0.0%    0.0%                             │  │  │  └─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:509)                    │
  00"     00"      0.0%    0.0%                             │  │  │     └─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:98)                          │
  00"     00"      0.0%    0.1%                             │  │  ├─ write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:113)                                    │
  00"     00"      0.1%    0.1%                             │  │  │  ├─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:517)                    │
  00"     00"      0.0%    0.0%                             │  │  │  │  └─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:98)                          │
  00"     00"      0.0%    0.0%                             │  │  │  └─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:509)                    │
  00"     00"      0.1%    0.1%                             │  │  ├─ readable (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:154)   

from profiling it looks like channel is writable but the channel.connected == False.
So then it goes into a loop without writing or closing since it never actually does anything to the socket.
https://github.com/Pylons/waitress/blob/main/src/waitress/channel.py#L98

EDIT: My suspicion would be that what started this was a client that shutdown (half) very quickly after a connect and this happened before the dispatcher finished being setup. This causes getpeername to fail with EINVAL and connected = False.

self.connected = False

            try:
                self.addr = sock.getpeername()
            except OSError as err:
                if err.args[0] in (ENOTCONN, EINVAL):
                    # To handle the case where we got an unconnected
                    # socket.
                    self.connected = False
                else:
                    # The socket is broken in some unknown way, alert
                    # the user and remove it from the map (to prevent
                    # polling of broken sockets).
                    self.del_channel(map)
                    raise

Could be same issue as #411 but hard to tell.

One fix in #419 but could be better ways?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions