fix(futures): fix incorrect context propagation with ThreadPoolExecutor #9588
Benchmarks
Benchmark execution time: 2024-06-20 18:20:37
Comparing candidate commit 5b8e7e5 in PR branch
Found 0 performance improvements and 0 performance regressions! Performance is the same for 187 metrics, 9 unstable metrics.

Datadog Report
Branch report: ✅ 0 Failed, 576 Passed, 330 Skipped, 2m 44.42s Total duration (1m 4.24s time saved)
#9588)

There is a bug when scheduling work onto a `ThreadPoolExecutor` and not waiting for the response (e.g. calling `pool.submit(work)` and ignoring the future): we do not properly associate the spans created in the task with the trace that was active when the task was submitted.

The bug occurs because we propagate the currently active span (the parent) to the child task. If the parent span finishes before the child task can create its first span, we no longer consider the parent span active or available to inherit from. This is because our context management code does not work when spans are passed across thread or process boundaries.

The solution is to instead pass the active span's Context to the child task. This is similar to passing context between two services or processes via HTTP headers, for example. With this change, the child task's spans are properly associated with the parent span regardless of the execution order.

The following example highlights the issue:

```python
import time
from concurrent.futures import ThreadPoolExecutor

from ddtrace import tracer

pool = ThreadPoolExecutor(max_workers=1)

def task():
    # Before the fix, this is None once the parent span has finished
    parent_span = tracer.current_span()
    assert parent_span is not None
    time.sleep(1)

# Submit the tasks and leave the span without waiting on the futures
with tracer.trace("parent"):
    for _ in range(10):
        pool.submit(task)
```

The first execution of `task` will (probably) succeed without any issues because the parent span is likely still active at that time. However, each additional task will fail the assertion because the parent span is no longer active by the time it runs, so `tracer.current_span()` returns `None`.

In other words, only the first execution of `task` is properly associated with the parent span/trace; the other calls to `task` produce disconnected traces. This fix resolves this inconsistent and unexpected behavior and ensures that the spans created in `task` are always properly associated with the parent span/trace.
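The idea of handing the child task a snapshot of the parent's context, rather than the span object itself, can be sketched with the standard library alone. This is a minimal illustration and not the ddtrace implementation: `TraceContext`, `Span`, `trace`, `active_context`, and `submit_with_context` are all hypothetical names invented for this example, and the thread-local storage is an assumption made to keep it short.

```python
import itertools
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TraceContext:
    """Hypothetical immutable snapshot: the IDs an HTTP header would carry."""
    trace_id: int
    span_id: int


class Span:
    """Hypothetical toy span; not the ddtrace Span class."""
    _ids = itertools.count(1)

    def __init__(self, name: str, parent: Optional[TraceContext]) -> None:
        self.name = name
        self.span_id = next(Span._ids)
        # A root span starts a new trace; a child inherits its parent's IDs.
        self.trace_id = parent.trace_id if parent else self.span_id
        self.parent_id = parent.span_id if parent else None
        self.finished = False

    @property
    def context(self) -> TraceContext:
        return TraceContext(self.trace_id, self.span_id)


_local = threading.local()  # per-thread active context


def active_context() -> Optional[TraceContext]:
    return getattr(_local, "ctx", None)


@contextmanager
def trace(name: str):
    parent = active_context()
    span = Span(name, parent)
    _local.ctx = span.context
    try:
        yield span
    finally:
        span.finished = True
        _local.ctx = parent  # restore whatever was active before


def submit_with_context(pool, fn, *args, **kwargs):
    # Snapshot the context in the submitting thread, at submit time.
    ctx = active_context()

    def wrapper():
        # Activate the snapshot in the worker thread. It remains valid
        # even if the parent span has already finished.
        _local.ctx = ctx
        try:
            return fn(*args, **kwargs)
        finally:
            _local.ctx = None

    return pool.submit(wrapper)


def task():
    time.sleep(0.01)
    with trace("child") as span:
        return span.trace_id, span.parent_id


def run_demo(n_tasks: int = 5):
    with ThreadPoolExecutor(max_workers=1) as pool:
        with trace("parent") as parent:
            futures = [submit_with_context(pool, task) for _ in range(n_tasks)]
        # The parent span is finished here, while most tasks are still queued.
        results = [f.result() for f in futures]
    return parent, results


if __name__ == "__main__":
    parent, results = run_demo()
    print(parent.trace_id, parent.span_id, results)
```

Because the snapshot carries only identifiers, every child span links to the parent whether or not the parent is still open when the task runs. The trade-off in this sketch is that the worker never sees the parent span object, only its context.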
This change may affect code that accesses the current span in the child task before creating any spans there (as in the code sample above), because the parent span will no longer be available via `tracer.current_span()`.

## Checklist

- [x] Change(s) are motivated and described in the PR description
- [x] Testing strategy is described if automated tests are not included in the PR
- [x] Risks are described (performance impact, potential for breakage, maintainability)
- [x] Change is maintainable (easy to change, telemetry, documentation)
- [x] [Library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) are followed or label `changelog/no-changelog` is set
- [x] Documentation is included (in-code, generated user docs, [public corp docs](https://github.com/DataDog/documentation/))
- [x] Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))
- [x] If this PR changes the public interface, I've notified `@DataDog/apm-tees`.

## Reviewer Checklist

- [x] Title is accurate
- [x] All changes are related to the pull request's stated goal
- [x] Description motivates each change
- [x] Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
- [x] Testing strategy adequately addresses listed risks
- [x] Change is maintainable (easy to change, telemetry, documentation)
- [x] Release note makes sense to a user of the library
- [x] Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
- [x] Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

(cherry picked from commit 109ba08)
…r [backport 2.9] (#9603)
Backport 109ba08 from #9588 to 2.9.
Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com>
…9656)
This change updates the suitespec to run the tornado suite on pull requests that change `contrib/futures`. This is necessary because #9588 introduced a [failure](https://app.circleci.com/pipelines/github/DataDog/dd-trace-py/64311/workflows/c67de0aa-58b3-42ca-8f52-68d6ea9b45a8/jobs/3979086) in that suite that was only discovered post-merge.