Skip to content

Massive thread accumulation on blocking calls #147

@azertyfun

Description

@azertyfun

Hi,

We are noticing an issue with thread accumulation in kong in our production setup. Our python plugin ends up accumulating hundreds upon hundreds of threads, until it eventually hits the container limit and kong gets killed.

Performing a core dump on our plugin process, I got the following:

(gdb) thread apply all py-bt

Thread 490 (Thread 0x7f902b93e640 (LWP 1340)):
warning: Couldn't find general-purpose registers in core file.
Unable to locate python frame

Thread 489 (Thread 0x7f902b13d640 (LWP 1341)):
warning: Couldn't find general-purpose registers in core file.
Unable to locate python frame

Thread 488 (Thread 0x7f902a03a640 (LWP 1344)):
warning: Couldn't find general-purpose registers in core file.
Unable to locate python frame
[SNIP]

Thread 246 (Thread 0x7f902d1ec1c0 (LWP 1339)):
warning: Couldn't find general-purpose registers in core file.
Unable to locate python frame

Thread 245 (LWP 2006945):
Traceback (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/listener.py", line 46, in read
    return self.socket.recv(n)
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/listener.py", line 59, in handle
    for _, msgid, method, args in unpacker:
  File "/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/lib/python3.10/socketserver.py", line 683, in process_request_thread
    self.finish_request(request, client_address)
  File "/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()

Thread 244 (LWP 2000762):
Traceback (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/listener.py", line 46, in read
    return self.socket.recv(n)
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/listener.py", line 59, in handle
    for _, msgid, method, args in unpacker:
  File "/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/lib/python3.10/socketserver.py", line 683, in process_request_thread
    self.finish_request(request, client_address)
  File "/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()

Thread 243 (LWP 2000750):
Traceback (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/listener.py", line 46, in read
    return self.socket.recv(n)
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/listener.py", line 59, in handle
    for _, msgid, method, args in unpacker:
  File "/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/lib/python3.10/socketserver.py", line 683, in process_request_thread
    self.finish_request(request, client_address)
  File "/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()

Thread 242 (LWP 2000746):
Traceback (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/listener.py", line 46, in read
    return self.socket.recv(n)
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/listener.py", line 59, in handle
    for _, msgid, method, args in unpacker:
  File "/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/lib/python3.10/socketserver.py", line 683, in process_request_thread
    self.finish_request(request, client_address)
  File "/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()

Thread 241 (LWP 2000744):
Traceback (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/listener.py", line 46, in read
    return self.socket.recv(n)
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/listener.py", line 59, in handle
    for _, msgid, method, args in unpacker:
  File "/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/lib/python3.10/socketserver.py", line 683, in process_request_thread
    self.finish_request(request, client_address)
  File "/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()

Thread 240 (LWP 2000734):
Traceback (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/listener.py", line 46, in read
    return self.socket.recv(n)
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/listener.py", line 59, in handle
    for _, msgid, method, args in unpacker:
  File "/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/lib/python3.10/socketserver.py", line 683, in process_request_thread
    self.finish_request(request, client_address)
  File "/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()

Thread 239 (LWP 1988088):
Traceback (most recent call first):
  File "/lib/python3.10/threading.py", line 320, in wait
    waiter.acquire()
  File "/lib/python3.10/queue.py", line 171, in get
    self.not_empty.wait()
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/pdk/__init__.py", line 41, in f
    data, err = ch.get()
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/pdk/__init__.py", line 55, in bridge
    return self.rpc(attr, *args)
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/pdk/__init__.py", line 9, in __call__
    return self.call(self.prefix, *a)
  File "/opt/kong-plugins/REDACTED_PLUGIN_NAME.py", line 175, in redacted_function
    kong.log.debug(f'REDACTED some log line')
  File "/usr/local/lib/python3.10/dist-packages/prometheus_client/context_managers.py", line 80, in wrapped
    return func(*args, **kwargs)
  File "<decorator-gen-5>", line 2, in redacted_function
  File "/opt/kong-plugins/REDACTED_PLUGIN_NAME.py", line 420, in access
    redacted_result = self.redacted_function(
  File "/usr/local/lib/python3.10/dist-packages/prometheus_client/context_managers.py", line 80, in wrapped
    return func(*args, **kwargs)
  File "<decorator-gen-8>", line 2, in access
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/server.py", line 62, in _handler_event_func
    cls_phase(Kong(ch, lua_style).kong)
  File "/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()

Thread 238 (LWP 1985734):
Traceback (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/listener.py", line 46, in read
    return self.socket.recv(n)
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/listener.py", line 59, in handle
    for _, msgid, method, args in unpacker:
  File "/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/lib/python3.10/socketserver.py", line 683, in process_request_thread
    self.finish_request(request, client_address)
  File "/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()


Thread 237 (LWP 1979062):
Traceback (most recent call first):
  File "/lib/python3.10/threading.py", line 320, in wait
    waiter.acquire()
  File "/lib/python3.10/queue.py", line 171, in get
    self.not_empty.wait()
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/pdk/__init__.py", line 41, in f
    data, err = ch.get()
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/pdk/__init__.py", line 55, in bridge
    return self.rpc(attr, *args)
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/pdk/__init__.py", line 9, in __call__
    return self.call(self.prefix, *a)
  File "/opt/kong-plugins/REDACTED_PLUGIN_NAME.py", line 175, in redacted_function
    kong.log.debug(f'REDACTED some log line')
  File "/usr/local/lib/python3.10/dist-packages/prometheus_client/context_managers.py", line 80, in wrapped
    return func(*args, **kwargs)
  File "<decorator-gen-5>", line 2, in redacted_function
  File "/opt/kong-plugins/REDACTED_PLUGIN_NAME.py", line 420, in access
    redacted_result = self.redacted_function(
  File "/usr/local/lib/python3.10/dist-packages/prometheus_client/context_managers.py", line 80, in wrapped
    return func(*args, **kwargs)
  File "<decorator-gen-8>", line 2, in access
  File "/usr/local/lib/python3.10/dist-packages/kong_pdk/server.py", line 62, in _handler_event_func
    cls_phase(Kong(ch, lua_style).kong)
  File "/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()

[SNIP]

The exact root cause is as yet unknown. The threads all block either on self.socket.recv(n) or (more puzzlingly) on the IPC's ch.get().

This only happens to a tiny fraction of requests so the thread growth is only a few hundred per day despite us seeing well over 100 req/s at busy times. Still, over several days this can cause us to reach the maximum thread limit for a single k8s container. One can speculate that some race condition is happening under specific circumstances (perhaps when a request gets cancelled midway?).

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions