chore: add CPU usage and RSS SLO for macrobenchmarks #14521
Conversation
Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 281 ± 5 ms.
The average import time from base is: 285 ± 7 ms.
The import time difference between this PR and base is: -3.8 ± 0.3 ms.

Import time breakdown

The following import paths have shrunk:
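For anyone wanting to sanity-check these numbers locally, here is a rough sketch of timing a cold import in a subprocess. This is not the CI harness that produced the report, and it includes interpreter startup overhead, so absolute values will differ; the module name and run count are just illustrative defaults.

```python
# Rough local approximation of the import-time comparison above.
# Not the CI harness; includes interpreter startup overhead.
import statistics
import subprocess
import sys
import time


def time_import(module: str = "ddtrace", runs: int = 20) -> tuple[float, float]:
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)


if __name__ == "__main__":
    mean_ms, stdev_ms = time_import()
    print(f"average import time: {mean_ms:.0f} ± {stdev_ms:.0f} ms")
```

CPython's `-X importtime` flag produces a per-module breakdown similar in spirit to the "Import time breakdown" section of the report.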
Performance SLOs

Candidate: PROF-12417-CPU-SLO (ab8b409)

🔵 No Baseline Data (24 suites)
🔵 coreapiscenario - 12/12 (2 unstable)
🔵 No baseline data available for this suite
TODO - describe
01792a5 to e0ff3fb (Compare)
taegyunkim left a comment
1100 MB for RSS usage seems to be more than enough, given that all data points from the screenshot are < 980 MB. Was there a reasoning behind that value?
TBH I didn't think about that too much; Dmytro suggested that value. I think RSS can fluctuate quite a bit more than CPU usage, though? IIRC the benchmark platform has done a bunch of work to make CPU usage measurements less noisy, but I'm not sure about RSS. So I'd be inclined to give a bit more headroom.
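As a back-of-the-envelope check of the headroom argument, a 1100 MB limit over a roughly 980 MB observed peak leaves about 12% of slack. The peak value below is read off the screenshot discussed above, so treat it as approximate.

```python
# Hypothetical numbers: observed peak is the approximate maximum from the screenshot.
observed_peak_mb = 980
rss_slo_mb = 1100
headroom = (rss_slo_mb - observed_peak_mb) / observed_peak_mb
print(f"RSS headroom over observed peak: {headroom:.1%}")  # -> about 12.2%
```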
Looks good for this scenario. https://gitlab.ddbuild.io/DataDog/apm-reliability/dd-trace-py/-/jobs/1118617986 I think 215% might be too tight and flake; I suggest 220%. The idea is to have the SLO more than 7% above the SLI, but less than 20% above it, otherwise some warnings will be triggered.
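To make the 7–20% rule concrete, here is a small illustrative check. The measured SLI value below is hypothetical, not taken from the linked job.

```python
# Illustrative only: the measured SLI below is a made-up example.
def slo_band(sli: float, low: float = 0.07, high: float = 0.20) -> tuple[float, float]:
    """Return the acceptable SLO range: more than 7% but less than 20% above the SLI."""
    return sli * (1 + low), sli * (1 + high)


measured_cpu_sli = 200.0  # hypothetical steady-state CPU usage, in %
lo, hi = slo_band(measured_cpu_sli)
for candidate in (215.0, 220.0):
    margin = candidate / measured_cpu_sli - 1
    print(f"SLO {candidate:.0f}%: margin {margin:.1%}, inside ({lo:.0f}%, {hi:.0f}%) band: {lo < candidate < hi}")
```

With these example numbers, 215% sits barely above the lower bound, which is why a slightly looser 220% is less likely to flake.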
Backport 941cf9c from #14512 to 3.13.

Fixes #14506

In Python 3.13 it seems we may eagerly start a coroutine as soon as it is created. This means that in our integration, when we `return func(*args, **kwargs)` and the result is a coroutine, it may already have started executing; then, when the caller does `await wrapped_func()`, a `ValueError: coroutine already executing` error is raised. To reproduce, you can follow the guide in #14521, or check the regression middleware added to the `test_asgi_200` test function. This only occurs with Python 3.13.

## Checklist

- [x] PR author has checked that all the criteria below are met
  - The PR description includes an overview of the change
  - The PR description articulates the motivation for the change
  - The change includes tests OR the PR description describes a testing strategy
  - The PR description notes risks associated with the change, if any
  - Newly-added code is easy to change
  - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
  - The change includes or references documentation updates if necessary
  - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Reviewer has checked that all the criteria below are met
  - Title is accurate
  - All changes are related to the pull request's stated goal
  - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
  - Testing strategy adequately addresses listed risks
  - Newly-added code is easy to change
  - Release note makes sense to a user of the library
  - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com>
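For illustration only, the wrapper pattern described above looks roughly like the sketch below. The names are hypothetical, not dd-trace-py internals, and this does not reproduce the 3.13 failure; it just shows that `inspect.getcoroutinestate` is one way to check whether the returned coroutine has already been started before handing it back to the caller.

```python
import asyncio
import inspect


def traced(func):
    # Hypothetical wrapper mirroring the ``return func(*args, **kwargs)``
    # pattern described above; not dd-trace-py internals.
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        if inspect.iscoroutine(result):
            # On a normal CPython run this prints CORO_CREATED; if the
            # interpreter had started the coroutine eagerly, the state
            # would already have moved past that.
            print("coroutine state:", inspect.getcoroutinestate(result))
        return result

    return wrapper


@traced
async def handler():
    await asyncio.sleep(0)
    return "ok"


async def main():
    print(await handler())


asyncio.run(main())
```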
PR #14205 introduced a significant performance regression, where the
profiler created a thread which consumed an entire CPU core spinning in
a tight loop. Our macrobenchmarks captured this increased CPU usage, but
we didn't have any SLOs defined. This commit adds an SLO for CPU usage
based on the typical CPU usage prior to that regression. This commit
also adds an RSS SLO, again based on typical usage.
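For context, the class of defect the new CPU SLO is meant to flag looks roughly like the busy loop below. This is an illustrative sketch, not code from #14205; a thread that blocks on an event instead contributes almost nothing to the CPU usage SLI.

```python
import threading
import time

stop = threading.Event()


def busy_worker():
    # Anti-pattern resembling the described regression: polling with no
    # blocking call keeps one CPU core at ~100%.
    while not stop.is_set():
        pass


def idle_worker(interval: float = 1.0):
    # Waiting on the event with a timeout yields the CPU between wakeups,
    # so the thread's CPU usage stays near zero.
    while not stop.wait(timeout=interval):
        pass  # periodic work would go here


t = threading.Thread(target=idle_worker, daemon=True)
t.start()
time.sleep(2)
stop.set()
t.join()
```

Swapping `idle_worker` for `busy_worker` in the thread target makes the difference immediately visible in `top` or in the macrobenchmark CPU numbers.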
Sample run: https://gitlab.ddbuild.io/DataDog/apm-reliability/dd-trace-py/-/jobs/1118617959
Checklist
Reviewer Checklist