Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(telemetry): support python3.12 #7043

Merged
merged 14 commits into from
Sep 26, 2023

Conversation

mabdinur
Copy link
Contributor

@mabdinur mabdinur commented Sep 25, 2023

Description

This PR ensures the instrumentation telemetry client is compatible with python3.12's threading module by:

  • Ensuring the telemetry writer thread is only started once per application (and app-started is sent once).
  • Ensuring the telemetry writer thread is disabled when an application is shutdown.
  • Ensuring telemetry metrics are queued SpanAggregator.shutdown without restarting the telemetry writer thread.

Motivation

The following change failed to support the telemetry client in python3.12: #6859. This PR will hopefully fix this 🤞.

Reproduction for python3.12 runtime errors

docker run --rm -it python:3.12.0rc3 bash
root@a1f3c1d307ec:/# pip install ddtrace==2.0.0rc2
root@a1f3c1d307ec:/# python -c "import ddtrace; _ = ddtrace.tracer.trace('foo'); raise Exception"

Output

Traceback (most recent call last):
  File "<string>", line 1, in <module>
Exception
Exception ignored in atexit callback: <bound method Tracer._atexit of <ddtrace.tracer.Tracer object at 0xffffbd967260>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/ddtrace/tracer.py", line 293, in _atexit
    self.shutdown(timeout=self.SHUTDOWN_TIMEOUT)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/tracer.py", line 1024, in shutdown
    processor.shutdown(timeout)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/processor/trace.py", line 270, in shutdown
    self._queue_span_count_metrics("spans_created", "integration_name", None)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/processor/trace.py", line 291, in _queue_span_count_metrics
    telemetry_writer.add_count_metric(
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/telemetry/writer.py", line 514, in add_count_metric
    if self.status == ServiceStatus.RUNNING or self.enable():
                                               ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/telemetry/writer.py", line 218, in enable
    self.start()
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/service.py", line 58, in start
    self._start_service(*args, **kwargs)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/periodic.py", line 135, in _start_service
    self._worker.start()
  File "/usr/local/lib/python3.12/threading.py", line 971, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't create new thread at interpreter shutdown

Risk

This PR reverts an optimization that ensured telemetry span creation metrics were queued in batches of 100. Without this optimization we can expect a 5-10% increase to span creation and span finish.

Checklist

  • Change(s) are motivated and described in the PR description.
  • Testing strategy is described if automated tests are not included in the PR.
  • Risk is outlined (performance impact, potential for breakage, maintainability, etc).
  • Change is maintainable (easy to change, telemetry, documentation).
  • Library release note guidelines are followed. If no release note is required, add label changelog/no-changelog.
  • Documentation is included (in-code, generated user docs, public corp docs).
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Title is accurate.
  • No unnecessary changes are introduced.
  • Description motivates each change.
  • Avoids breaking API changes unless absolutely necessary.
  • Testing strategy adequately addresses listed risk(s).
  • Change is maintainable (easy to change, telemetry, documentation).
  • Release note makes sense to a user of the library.
  • Reviewer has explicitly acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment.
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy
  • If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

@mabdinur mabdinur marked this pull request as ready for review September 25, 2023 21:07
@mabdinur mabdinur requested a review from a team as a code owner September 25, 2023 21:07
emmettbutler
emmettbutler previously approved these changes Sep 25, 2023
@emmettbutler emmettbutler added changelog/no-changelog A changelog entry is not required for this PR. backport 2.0 labels Sep 25, 2023
@mabdinur mabdinur enabled auto-merge (squash) September 25, 2023 21:24
ZStriker19
ZStriker19 previously approved these changes Sep 25, 2023
@DataDog DataDog deleted a comment from pr-commenter bot Sep 26, 2023
@majorgreys majorgreys added this to the v2.0.0 milestone Sep 26, 2023
emmettbutler
emmettbutler previously approved these changes Sep 26, 2023
@pr-commenter
Copy link

pr-commenter bot commented Sep 26, 2023

Benchmarks

Benchmark execution time: 2023-09-26 17:14:27

Comparing candidate commit 28adebd in PR branch munir/telemetry-py3.12-shutdown-exception with baseline commit 7ccb806 in branch 2.x.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 90 metrics, 0 unstable metrics.

@mabdinur mabdinur merged commit 065784b into 2.x Sep 26, 2023
133 of 134 checks passed
@mabdinur mabdinur deleted the munir/telemetry-py3.12-shutdown-exception branch September 26, 2023 18:11
github-actions bot pushed a commit that referenced this pull request Sep 26, 2023
## Description

This PR ensures the instrumentation telemetry client is compatible with
python3.12's threading module by:
- Removing `telemetry_writer.add_count_metric` from
`SpanAggregator.shutdown`. This ensures telemetry events are not queued
on tracer shutdown.
- Ensuring the telemetry writer thread is only started once per
application.
- Ensures the telemetry writer thread is disabled when an application is
shutdown.

## Motivation

The following change failed to support the telemetry client in
python3.12: #6859. This PR
will hopefully fix this 🤞.

### Reproduction for python3.12 runtime errors
```
docker run --rm -it python:3.12.0rc3 bash
root@a1f3c1d307ec:/# pip install ddtrace==2.0.0rc2
root@a1f3c1d307ec:/# python -c "import ddtrace; _ = ddtrace.tracer.trace('foo'); raise Exception"
```

### Output
```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
Exception
Exception ignored in atexit callback: <bound method Tracer._atexit of <ddtrace.tracer.Tracer object at 0xffffbd967260>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/ddtrace/tracer.py", line 293, in _atexit
    self.shutdown(timeout=self.SHUTDOWN_TIMEOUT)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/tracer.py", line 1024, in shutdown
    processor.shutdown(timeout)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/processor/trace.py", line 270, in shutdown
    self._queue_span_count_metrics("spans_created", "integration_name", None)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/processor/trace.py", line 291, in _queue_span_count_metrics
    telemetry_writer.add_count_metric(
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/telemetry/writer.py", line 514, in add_count_metric
    if self.status == ServiceStatus.RUNNING or self.enable():
                                               ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/telemetry/writer.py", line 218, in enable
    self.start()
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/service.py", line 58, in start
    self._start_service(*args, **kwargs)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/periodic.py", line 135, in _start_service
    self._worker.start()
  File "/usr/local/lib/python3.12/threading.py", line 971, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't create new thread at interpreter shutdown
```

## Risk

This PR reverts an optimization that ensured telemetry span creation
metrics were queued in batches of 100. Without this optimization we can
expect a 2-5% increase to span creation and span finish.

## Checklist

- [x] Change(s) are motivated and described in the PR description.
- [x] Testing strategy is described if automated tests are not included
in the PR.
- [x] Risk is outlined (performance impact, potential for breakage,
maintainability, etc).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed. If no release note is required, add label
`changelog/no-changelog`.
- [x] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/)).
- [x] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment.
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
- [x] If this PR touches code that signs or publishes builds or
packages, or handles credentials of any kind, I've requested a review
from `@DataDog/security-design-and-guidance`.
- [x] This PR doesn't touch any of that.

(cherry picked from commit 065784b)
@github-actions github-actions bot modified the milestones: v2.0.0, v2.1.0 Sep 26, 2023
mabdinur added a commit that referenced this pull request Sep 26, 2023
Backport 065784b from #7043 to 2.0.

## Description

This PR ensures the instrumentation telemetry client is compatible with
python3.12's threading module by:
- Ensuring the telemetry writer thread is only started once per
application.
- Ensures the telemetry writer thread is disabled when an application is
shutdown.
- Ensures telemetry metrics are queued `SpanAggregator.shutdown` without
restarting the telemetry writer thread.

## Motivation

The following change failed to support the telemetry client in
python3.12: #6859. This PR
will hopefully fix this 🤞.

### Reproduction for python3.12 runtime errors
```
docker run --rm -it python:3.12.0rc3 bash
root@a1f3c1d307ec:/# pip install ddtrace==2.0.0rc2
root@a1f3c1d307ec:/# python -c "import ddtrace; _ = ddtrace.tracer.trace('foo'); raise Exception"
```

### Output
```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
Exception
Exception ignored in atexit callback: <bound method Tracer._atexit of <ddtrace.tracer.Tracer object at 0xffffbd967260>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/ddtrace/tracer.py", line 293, in _atexit
    self.shutdown(timeout=self.SHUTDOWN_TIMEOUT)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/tracer.py", line 1024, in shutdown
    processor.shutdown(timeout)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/processor/trace.py", line 270, in shutdown
    self._queue_span_count_metrics("spans_created", "integration_name", None)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/processor/trace.py", line 291, in _queue_span_count_metrics
    telemetry_writer.add_count_metric(
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/telemetry/writer.py", line 514, in add_count_metric
    if self.status == ServiceStatus.RUNNING or self.enable():
                                               ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/telemetry/writer.py", line 218, in enable
    self.start()
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/service.py", line 58, in start
    self._start_service(*args, **kwargs)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/periodic.py", line 135, in _start_service
    self._worker.start()
  File "/usr/local/lib/python3.12/threading.py", line 971, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't create new thread at interpreter shutdown
```

## Risk

~~This PR reverts an optimization that ensured telemetry span creation
metrics were queued in batches of 100. Without this optimization we can
expect a 5-10% increase to span creation and span finish.~~


## Checklist

- [x] Change(s) are motivated and described in the PR description.
- [x] Testing strategy is described if automated tests are not included
in the PR.
- [x] Risk is outlined (performance impact, potential for breakage,
maintainability, etc).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed. If no release note is required, add label
`changelog/no-changelog`.
- [x] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/)).
- [x] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment.
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
- [x] If this PR touches code that signs or publishes builds or
packages, or handles credentials of any kind, I've requested a review
from `@DataDog/security-design-and-guidance`.
- [x] This PR doesn't touch any of that.

Co-authored-by: Munir Abdinur <munir.abdinur@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog/no-changelog A changelog entry is not required for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants