-
Notifications
You must be signed in to change notification settings - Fork 417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(gevent): fix gevent compatibility with "module cloning" approach #4863
Conversation
c52a9ea
to
2f3659c
Compare
fcb041b
to
c39d58f
Compare
f0158f0
to
d47c047
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link to generated docs for config page changes: https://output.circle-artifacts.com/output/job/d89b9a6d-a048-408a-9406-92bd20e098fb/artifacts/0/tmp/docs/configuration.html#DD_UNLOAD_MODULES_FROM_SITECUSTOMIZE
(I don't see issues, just sharing)
Same for release notes (you have to go to _release_notes_all.html
to see the release note as part of open PR)
(again, no issues, just showing)
The only real test changes we need here are that this feature when enabled works with gunicorn/gevent in some way because it is off by default in theory our other tests should cover that the existing behavior doesn't change. This is correct? I assume we may want/need more tests added before enabling by default (or "auto"), but I didn't miss any other coverage that this needs to have now, right? |
@brettlangdon that's right! |
my only real argument to performance testing it now (even just locally with some basic examples) is we are documenting it as a feature for customers, so if the performance was really bad we cannot just yank the feature entirely, we'd have to deprecate it and wait for 2.0 to remove it. yeah, that is the drastic scenario, but good to consider if it ever was the case. although, if the performance was bad, we could keep the feature since it would solve some customer problems, but we could document the performance impact of it and not enable it by default. so.... 🤷🏻 |
@brettlangdon yup, I follow your thinking. I'm inclined to wait on performance tests until we're considering enabling module cloning by default. |
# than the first line of the program. since that's what we're doing here, | ||
# we cleanup aggressively beforehand to replicate the conditions at program start | ||
# as closely as possible. | ||
cleanup_loaded_modules(aggressive=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we can be aggressive here because we still haven't loaded much?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's probably true.
…5135) This PR makes a fix to #4863 which inadvertently removed unloading the `time` module as well as removing the `typing` module from the list of standard library modules to not unload for module cloning. We found that the `time` module is a special case (which is important to unload due to interactions with gevent patching) as it is implicitly imported by recent versions of CPython on interpreter startup, so it may be present in `sys.modules` already. The `typing` module is also problematic on being reloaded for older versions of Python < 3.7, so we should not unload/reload this module as well.
This change ensures that the concurrent.futures submodule is unloaded when doing module unloading (e.g. when gevent is detected). This ensures that the module is reloaded when needed and uses the potential new state of the cloned threading module. ## Issue insight The captured stacks for all the processes during the thread pool deadlock look as follows ~~~ 💤🧒 Process 82315 🧵 Thread 4312286592 Thread._bootstrap (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/threading.py:995) Thread._bootstrap_inner (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/threading.py:1038) Thread.run (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/threading.py:975) _worker (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/concurrent/futures/thread.py:83) _WorkItem.run (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/concurrent/futures/thread.py:58) _wrap_execution (/Users/gabriele.tornetta/p403n1x87/dd-trace-py/ddtrace/contrib/futures/threading.py:43) Task.__call__ (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/tasks.py:139) Task._execute_main (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/tasks.py:162) SubmissionTask._main (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/tasks.py:269) UploadSubmissionTask._submit (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/upload.py:591) UploadSubmissionTask._submit_upload_request (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/upload.py:626) TransferCoordinator.submit (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/futures.py:323) BoundedExecutor.submit (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/futures.py:474) _wrap_submit (/Users/gabriele.tornetta/p403n1x87/dd-trace-py/ddtrace/contrib/futures/threading.py:30) ThreadPoolExecutor.submit (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/concurrent/futures/thread.py:162) 💤 Process 81968 🧵 Thread 4312286592 <module> (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/bin/gunicorn:8) run (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py:67) Application.run (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/app/base.py:231) BaseApplication.run (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/app/base.py:72) Arbiter.run (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/arbiter.py:209) Arbiter.sleep (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/arbiter.py:357) ~~~ ## Implementation details The patching of `ThreadPoolExecutor.submit` has been refactored to be performed with the internal wrapping utilities instead of `wrapt`. One issue caused by the previous implementation was in the way `wrapt` used module paths to locate and patch the designated callable. Since the `concurrent` module is not being unloaded, any attempts to look up `futures` from `concurrent` would lead to the original reference being returned, even in cases where the reference to the cloned `concurrent.futures` is desired. _En passant_, the refactor has also brought to light a tolerance to passing a mandatory positional argument to `ThreadPoolExecutor.submit` via keyword arguments. ## Testing strategy The change includes a simple reproducing case that works with the fix in place. ## Risks The `concurrent` module has been added to the exclude list in the initial implementation of #4863 because of issues observed during test runs (in particular with framework tests). It is unclear whether the same issues would actually arise outside the CI context and in actual applications. However, because the `concurrent.futures` modules creates lock objects on initialisation, and since it is imported indirectly by `ddtrace` via its dependencies, the issue caused by the `futures` integration requires the module to be reloaded after the gevent monkey-patching. The compromise of this PR is to unload only the `concurrent.futures` sub-module (and its sub-modules), which seems enough to solve the original issue. Fixes #5398
…#5402) This change ensures that the concurrent.futures submodule is unloaded when doing module unloading (e.g. when gevent is detected). This ensures that the module is reloaded when needed and uses the potential new state of the cloned threading module. ## Issue insight The captured stacks for all the processes during the thread pool deadlock look as follows ~~~ 💤🧒 Process 82315 🧵 Thread 4312286592 Thread._bootstrap (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/threading.py:995) Thread._bootstrap_inner (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/threading.py:1038) Thread.run (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/threading.py:975) _worker (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/concurrent/futures/thread.py:83) _WorkItem.run (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/concurrent/futures/thread.py:58) _wrap_execution (/Users/gabriele.tornetta/p403n1x87/dd-trace-py/ddtrace/contrib/futures/threading.py:43) Task.__call__ (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/tasks.py:139) Task._execute_main (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/tasks.py:162) SubmissionTask._main (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/tasks.py:269) UploadSubmissionTask._submit (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/upload.py:591) UploadSubmissionTask._submit_upload_request (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/upload.py:626) TransferCoordinator.submit (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/futures.py:323) BoundedExecutor.submit (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/futures.py:474) _wrap_submit (/Users/gabriele.tornetta/p403n1x87/dd-trace-py/ddtrace/contrib/futures/threading.py:30) ThreadPoolExecutor.submit (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/concurrent/futures/thread.py:162) 💤 Process 81968 🧵 Thread 4312286592 <module> (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/bin/gunicorn:8) run (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py:67) Application.run (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/app/base.py:231) BaseApplication.run (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/app/base.py:72) Arbiter.run (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/arbiter.py:209) Arbiter.sleep (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/arbiter.py:357) ~~~ ## Implementation details The patching of `ThreadPoolExecutor.submit` has been refactored to be performed with the internal wrapping utilities instead of `wrapt`. One issue caused by the previous implementation was in the way `wrapt` used module paths to locate and patch the designated callable. Since the `concurrent` module is not being unloaded, any attempts to look up `futures` from `concurrent` would lead to the original reference being returned, even in cases where the reference to the cloned `concurrent.futures` is desired. _En passant_, the refactor has also brought to light a tolerance to passing a mandatory positional argument to `ThreadPoolExecutor.submit` via keyword arguments. ## Testing strategy The change includes a simple reproducing case that works with the fix in place. ## Risks The `concurrent` module has been added to the exclude list in the initial implementation of DataDog#4863 because of issues observed during test runs (in particular with framework tests). It is unclear whether the same issues would actually arise outside the CI context and in actual applications. However, because the `concurrent.futures` modules creates lock objects on initialisation, and since it is imported indirectly by `ddtrace` via its dependencies, the issue caused by the `futures` integration requires the module to be reloaded after the gevent monkey-patching. The compromise of this PR is to unload only the `concurrent.futures` sub-module (and its sub-modules), which seems enough to solve the original issue. Fixes DataDog#5398
This change ensures that the concurrent.futures submodule is unloaded when doing module unloading (e.g. when gevent is detected). This ensures that the module is reloaded when needed and uses the potential new state of the cloned threading module. ## Issue insight The captured stacks for all the processes during the thread pool deadlock look as follows ~~~ 💤🧒 Process 82315 🧵 Thread 4312286592 Thread._bootstrap (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/threading.py:995) Thread._bootstrap_inner (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/threading.py:1038) Thread.run (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/threading.py:975) _worker (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/concurrent/futures/thread.py:83) _WorkItem.run (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/concurrent/futures/thread.py:58) _wrap_execution (/Users/gabriele.tornetta/p403n1x87/dd-trace-py/ddtrace/contrib/futures/threading.py:43) Task.__call__ (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/tasks.py:139) Task._execute_main (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/tasks.py:162) SubmissionTask._main (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/tasks.py:269) UploadSubmissionTask._submit (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/upload.py:591) UploadSubmissionTask._submit_upload_request (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/upload.py:626) TransferCoordinator.submit (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/futures.py:323) BoundedExecutor.submit (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/futures.py:474) _wrap_submit (/Users/gabriele.tornetta/p403n1x87/dd-trace-py/ddtrace/contrib/futures/threading.py:30) ThreadPoolExecutor.submit (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/concurrent/futures/thread.py:162) 💤 Process 81968 🧵 Thread 4312286592 <module> (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/bin/gunicorn:8) run (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py:67) Application.run (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/app/base.py:231) BaseApplication.run (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/app/base.py:72) Arbiter.run (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/arbiter.py:209) Arbiter.sleep (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/arbiter.py:357) ~~~ ## Implementation details The patching of `ThreadPoolExecutor.submit` has been refactored to be performed with the internal wrapping utilities instead of `wrapt`. One issue caused by the previous implementation was in the way `wrapt` used module paths to locate and patch the designated callable. Since the `concurrent` module is not being unloaded, any attempts to look up `futures` from `concurrent` would lead to the original reference being returned, even in cases where the reference to the cloned `concurrent.futures` is desired. _En passant_, the refactor has also brought to light a tolerance to passing a mandatory positional argument to `ThreadPoolExecutor.submit` via keyword arguments. ## Testing strategy The change includes a simple reproducing case that works with the fix in place. ## Risks The `concurrent` module has been added to the exclude list in the initial implementation of #4863 because of issues observed during test runs (in particular with framework tests). It is unclear whether the same issues would actually arise outside the CI context and in actual applications. However, because the `concurrent.futures` modules creates lock objects on initialisation, and since it is imported indirectly by `ddtrace` via its dependencies, the issue caused by the `futures` integration requires the module to be reloaded after the gevent monkey-patching. The compromise of this PR is to unload only the `concurrent.futures` sub-module (and its sub-modules), which seems enough to solve the original issue. Fixes #5398
This change ensures that the concurrent.futures submodule is unloaded when doing module unloading (e.g. when gevent is detected). This ensures that the module is reloaded when needed and uses the potential new state of the cloned threading module. ## Issue insight The captured stacks for all the processes during the thread pool deadlock look as follows ~~~ 💤🧒 Process 82315 🧵 Thread 4312286592 Thread._bootstrap (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/threading.py:995) Thread._bootstrap_inner (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/threading.py:1038) Thread.run (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/threading.py:975) _worker (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/concurrent/futures/thread.py:83) _WorkItem.run (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/concurrent/futures/thread.py:58) _wrap_execution (/Users/gabriele.tornetta/p403n1x87/dd-trace-py/ddtrace/contrib/futures/threading.py:43) Task.__call__ (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/tasks.py:139) Task._execute_main (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/tasks.py:162) SubmissionTask._main (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/tasks.py:269) UploadSubmissionTask._submit (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/upload.py:591) UploadSubmissionTask._submit_upload_request (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/upload.py:626) TransferCoordinator.submit (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/futures.py:323) BoundedExecutor.submit (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/s3transfer/futures.py:474) _wrap_submit (/Users/gabriele.tornetta/p403n1x87/dd-trace-py/ddtrace/contrib/futures/threading.py:30) ThreadPoolExecutor.submit (/Users/gabriele.tornetta/.pyenv/versions/3.11.2/lib/python3.11/concurrent/futures/thread.py:162) 💤 Process 81968 🧵 Thread 4312286592 <module> (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/bin/gunicorn:8) run (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py:67) Application.run (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/app/base.py:231) BaseApplication.run (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/app/base.py:72) Arbiter.run (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/arbiter.py:209) Arbiter.sleep (/Users/gabriele.tornetta/p403n1x87/ddtrace-gevent-bug/.venv/lib/python3.11/site-packages/gunicorn/arbiter.py:357) ~~~ ## Implementation details The patching of `ThreadPoolExecutor.submit` has been refactored to be performed with the internal wrapping utilities instead of `wrapt`. One issue caused by the previous implementation was in the way `wrapt` used module paths to locate and patch the designated callable. Since the `concurrent` module is not being unloaded, any attempts to look up `futures` from `concurrent` would lead to the original reference being returned, even in cases where the reference to the cloned `concurrent.futures` is desired. _En passant_, the refactor has also brought to light a tolerance to passing a mandatory positional argument to `ThreadPoolExecutor.submit` via keyword arguments. ## Testing strategy The change includes a simple reproducing case that works with the fix in place. ## Risks The `concurrent` module has been added to the exclude list in the initial implementation of #4863 because of issues observed during test runs (in particular with framework tests). It is unclear whether the same issues would actually arise outside the CI context and in actual applications. However, because the `concurrent.futures` modules creates lock objects on initialisation, and since it is imported indirectly by `ddtrace` via its dependencies, the issue caused by the `futures` integration requires the module to be reloaded after the gevent monkey-patching. The compromise of this PR is to unload only the `concurrent.futures` sub-module (and its sub-modules), which seems enough to solve the original issue. Fixes #5398
This pull request fixes incompatibility between ddtrace and gunicorn (and probably other frameworks that depend on gevent). It does this by making ddtrace hold copies of the modules it loads before
gevent.monkey.patch_all()
has been called, thus avoiding conflicts between greenlets and real threads subject to startup race conditions."Module cloning", as we're calling this approach, is configurable by the
DD_UNLOAD_MODULES_FROM_SITECUSTOMIZE
environment variable, documented in the release note.The change accomplishes this by deleting the modules from
sys.modules
. This causes the Python interpreter to create new copies of the modules onimport
, not touching the copies originally imported before the deletion (as described in broad strokes here). The result aftersitecustomize.py
has executed is that ddtrace internal code holds references to modules that have not been monkeypatched by gevent, and application code holds references to modules that may have been monkeypatched by gevent. This means that ddtrace can maintain its real threads for things like remote config and profiling without needing to worry about whether they will turn into greenlets at some point.The clearest proof that this change improves the gevent+ddtrace compatibility story is the fact that this test passes.
Tips for Reviewers
The changes in this pull request to files other than
sitecustomize.py
are mostly intended to make existing tests pass under this new paradigm.sitecustomize
is the most important file in this diff to understand deeply for a review.Risks
The main risk inherent to this approach is that some module behaves in an unexpected way when copied. We're mitigating this risk in two ways. The first is by making sure all existing tests pass (obviously), and the second is by putting this new functionality behind a feature flag environment variable that defaults to retaining the current behavior form
1.x
.Checklist
logging
Reviewer Checklist