Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROF-8667] Heap Profiling - Part 3 - Snapshot system #3328

Merged
merged 19 commits into from
Dec 21, 2023

Conversation

AlexJF
Copy link
Contributor

@AlexJF AlexJF commented Dec 13, 2023

What does this PR do?

This PR follows #3287 and reduces the conflict of serialization and sampling which was forcing us to potentially drop some allocation samples on the floor.

How does it work?

Initially the assumption was that sampling and serialization operated on the same structures and thus needed to be protected by a mutex. Due to us never wanting to delay sampling code, we had noted the need for a queueing system to not be forced to drop data on the floor. This was the only realistic implementation back when we were relying on free tracepoints which could invalidate object_records at any point in time.

@ivoanjo noted that:

  • With the current implementation where staleness checking does not rely on free tracepoints but is done synchronously and under GVL before iteration.
  • st_copy operations seem to be extremely efficient, requiring at most 2 mallocs and 2 memcpy.
  • We can guarantee a non-concurrent and strict ordering of heap_recorder_prepare_iteration, heap_recorder_for_each_live_object and heap_recorder_finish_iteration (since they all run from the same serialization thread and we have only one such thread).

Given the above, we can trivially do a copy/snapshot of object_records during heap_recorder_prepare_iteration and have the actual iteration operate on that snapshot without ever conflicting with any concurrent sampling that might happen in-between (object_records_snapshot is read-only after construction, mutations will only be recorded in object_records during any sampling that happens concurrently with serialization).

Motivation:
Never wanting to block the allocation tracepoint and not wanting to drop relevant samples on the floor.

Additional Notes:

How to test the change?
It's quite hard to manually or deterministically trigger the conditions whereupon serialization could conflict with sampling and which justified this entire endeavour. Hopefully the fact that all existing specs plus the new "long serialization simulation" specs can provide enough assurance?

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

Unsure? Have a question? Request a review!

@AlexJF AlexJF changed the title [PROF-8667] Heap Profiling - Part 3 - Frees and Queue [PROF-8667] Heap Profiling - Part 3 - Queue Dec 13, 2023
@github-actions github-actions bot added the profiling Involves Datadog profiling label Dec 13, 2023
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part3-queue branch from 0eed56c to a86fccc Compare December 13, 2023 13:02
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch from a8357de to 12c22c5 Compare December 13, 2023 13:28
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part3-queue branch from a86fccc to 15278a3 Compare December 13, 2023 13:29
[PROF-8667] Some comments and location fixes.

[PROF-8667] Fix typo and enable 'records live heap objects' test.

[PROF-8667] Heap Profiling - Part 2 - Live Tracking

[PROF-8667] Remove one remaining usage of sum.

[PROF-8667] Address comments
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch from 12c22c5 to 051c803 Compare December 13, 2023 15:47
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part3-queue branch 4 times, most recently from bac32b4 to 8577e49 Compare December 14, 2023 12:46
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch from 171a265 to 774c96a Compare December 14, 2023 12:48
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part3-queue branch from 8577e49 to fb50bbd Compare December 14, 2023 12:49
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch from e9c26ce to ed57a26 Compare December 14, 2023 18:52
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part3-queue branch from fb50bbd to 6674f5c Compare December 15, 2023 12:11
@AlexJF AlexJF marked this pull request as ready for review December 15, 2023 12:19
@AlexJF AlexJF requested review from a team as code owners December 15, 2023 12:19
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part3-queue branch from 6674f5c to ef2dfbf Compare December 15, 2023 12:22
@codecov-commenter
Copy link

codecov-commenter commented Dec 15, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (bb96f8f) 98.23% compared to head (d58e2f1) 98.23%.
Report is 5 commits behind head on master.

Files Patch % Lines
spec/datadog/profiling/stack_recorder_spec.rb 92.85% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3328   +/-   ##
=======================================
  Coverage   98.23%   98.23%           
=======================================
  Files        1253     1253           
  Lines       72998    73020   +22     
  Branches     3427     3429    +2     
=======================================
+ Hits        71712    71734   +22     
  Misses       1286     1286           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here sir, take my 👍 . I've left a few notes on how I don't quite think this is the way to go, but leave the decision in your hands :)

ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
spec/datadog/profiling/stack_recorder_spec.rb Outdated Show resolved Hide resolved
@AlexJF AlexJF changed the title [PROF-8667] Heap Profiling - Part 3 - Queue [PROF-8667] Heap Profiling - Part 3 - Snapshot system Dec 18, 2023
Base automatically changed from alexjf/prof-8667-heap-profiling-part2-allocations-with-ids to master December 19, 2023 12:18
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part3-queue branch from 0e72b55 to 01cb735 Compare December 19, 2023 13:02
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part3-queue branch 2 times, most recently from 83c429b to b8bb16e Compare December 20, 2023 17:04
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part3-queue branch from b8bb16e to a598746 Compare December 20, 2023 17:58
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part3-queue branch from 47dd4f4 to d73978d Compare December 20, 2023 19:15
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part3-queue branch from 643ee92 to d58e2f1 Compare December 20, 2023 20:17
@AlexJF AlexJF merged commit d4b2b3e into master Dec 21, 2023
218 checks passed
@AlexJF AlexJF deleted the alexjf/prof-8667-heap-profiling-part3-queue branch December 21, 2023 11:42
@github-actions github-actions bot added this to the 1.19.0 milestone Dec 21, 2023
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some last-minute notes from a last pass I did yesterday.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
profiling Involves Datadog profiling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants