Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: clean up profilers for discarded transactions #3154

Merged

Conversation

armcknight
Copy link
Member

📜 Description

💡 Motivation and Context

For the feedback at #3135 (comment)

💚 How did you test it?

📝 Checklist

You have to check all boxes before merging:

  • I reviewed the submitted code.
  • I added tests to verify the changes.
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • I updated the docs if needed.
  • Review from the native team if needed.
  • No breaking change or entry added to the changelog.
  • No breaking change for hybrid SDKs or communicated to hybrid SDKs.

🔮 Next steps

@armcknight armcknight changed the base branch from main to armcknight/ref/rename-SentryTracerConcurrency July 14, 2023 23:41
Copy link
Member

@philipphofmann philipphofmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this approach is future-proof. I fear that it will be fragile. Someone can easily forget to call discardProfilerForTracer when changing the SentryTracer. Having one place to call discardProfilerForTracer would be better.

@philipphofmann
Copy link
Member

philipphofmann commented Jul 17, 2023

@armcknight, I'm a bit confused about why we need to keep references to tracers here

/**
* a mapping of profilers to the tracers that started them that are still in-flight and will need to
* query them for their profiling data when they finish. this helps resolve the incongruity between
* the different timeout durations between tracers (500s) and profilers (30s), where a transaction
* may start a profiler that then times out, and then a new transaction starts a new profiler, and
* we must keep the aborted one around until its associated transaction finishes.
*/
static NSMutableDictionary</* SentryProfiler.profileId */ NSString *,
NSMutableSet<SentryTracer *> *> *_gProfilersToTracers;

We only use the traceId in SentryProfiledTracerConcurrency, and we rely on the count to resetProfilingTimestamps for the framesTracker and to stop the profiler.

@@ -425,10 +425,16 @@ - (void)finishInternal
{
[self cancelDeadlineTimer];
if (self.isFinished) {
#if SENTRY_TARGET_PROFILING_SUPPORTED
discardProfilerForTracer(self);
Copy link
Member

@philipphofmann philipphofmann Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h: Instead of having to ensure calling discardProfilerForTracer in all the correct places, it would be great to have one single place in the tracer. Maybe here

[self.delegate tracerDidFinish:self];
if (self.finishCallback) {
self.finishCallback(self);
// The callback will only be executed once. No need to keep the reference and we avoid
// potential retain cycles.
self.finishCallback = nil;
}

or here

[self captureTransactionWithProfile:transaction];

would be sufficient, if we also do the following: To ensure that we don't end up with infinite memory growth, we could store timestamps SentryProfiledTracerConcurrency.m when a traceID or a profiler was added. We only have to keep them around until transactions time out, which is 450s or something similar. Whenever we add a new tracer to profile mapping in SentryProfiledTracerConcurrency we could check if there are old profiles in the dictionary, and if there are, we remove them. This algorithm would be our insurance and we also don't need weak references. WDYT, @armcknight?

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

Copy link
Member Author

@armcknight armcknight Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I also just noticed that the 500s limit is only a duration limit enforced after computation. Automatic tracers actually also have a 30 second timeout. But this is only for automatic tracers, not manually created ones. If there is no time limit for these, then we can't simply track timestamps.

(Note: I hid a couple other comments that won't matter if we can't do this.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will be fine, yes. At the point where there is any memory growth due to unended transactions, that's a logic error by the SDK consumer, not the SDK itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just adding the call to discardProfilerForTracer in SentryTracer.dealloc will be sufficient. We can't call that before/after this

[self.delegate tracerDidFinish:self];

because then the profiler will be gone by the time we try to get its data later in finishInternal at the call to captureTransactionWithProfile (which eventually goes on to remove the tracking for the tracer, that's the happy path).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added several new tests, please check them out!

Copy link
Member

@philipphofmann philipphofmann Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the point where there is any memory growth due to unended transactions, that's a logic error by the SDK consumer, not the SDK itself.

With the previous solution, it wouldn't be a logic error by our users. Doing the following and never calling finish for whatever reason is a valid use case.

let transaction1 = SentrySDK.startTransaction(name: "name", operation: "op")
let transaction2 = SentrySDK.startTransaction(name: "name", operation: "op")

With the previous solution, the SDK would start a profile for the transaction and keep the profile in memory forever. ARC would deallocate transaction1 and transaction2 at some point.

Anyway, with the current solution, it shouldn't be a problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess by logic error I just meant, if they start tons of endless transactions and keep refs to them, but also expect our memory to not grow accordingly.

Base automatically changed from armcknight/ref/rename-SentryTracerConcurrency to main July 17, 2023 22:59
@armcknight
Copy link
Member Author

armcknight commented Jul 17, 2023

@armcknight, I'm a bit confused about why we need to keep references to tracers here

/**
* a mapping of profilers to the tracers that started them that are still in-flight and will need to
* query them for their profiling data when they finish. this helps resolve the incongruity between
* the different timeout durations between tracers (500s) and profilers (30s), where a transaction
* may start a profiler that then times out, and then a new transaction starts a new profiler, and
* we must keep the aborted one around until its associated transaction finishes.
*/
static NSMutableDictionary</* SentryProfiler.profileId */ NSString *,
NSMutableSet<SentryTracer *> *> *_gProfilersToTracers;

We only use the traceId in SentryProfiledTracerConcurrency, and we rely on the count to resetProfilingTimestamps for the framesTracker and to stop the profiler.

Correct, it's essentially relying on count. So we can just keep a counter per profiler to know when to stop it. Can't remember now why I was keeping the tracers around 🤷🏻 See 1467d20

@github-actions
Copy link

github-actions bot commented Jul 18, 2023

Performance metrics 🚀

  Plain With Sentry Diff
Startup time 1207.50 ms 1237.92 ms 30.42 ms
Size 22.84 KiB 403.24 KiB 380.39 KiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
dbc67d2 1239.49 ms 1248.88 ms 9.39 ms
b385962 1195.85 ms 1221.63 ms 25.78 ms
b6ba04e 1230.48 ms 1253.20 ms 22.72 ms
06548c0 1262.80 ms 1275.00 ms 12.20 ms
407ff99 1225.49 ms 1232.88 ms 7.39 ms
e2abb0d 1235.08 ms 1257.00 ms 21.92 ms
7bc3c0d 1261.16 ms 1278.38 ms 17.22 ms
8f397a7 1224.66 ms 1236.48 ms 11.82 ms
2405ba5 1248.37 ms 1259.30 ms 10.93 ms
257c2a9 1239.52 ms 1251.08 ms 11.56 ms

App size

Revision Plain With Sentry Diff
dbc67d2 20.76 KiB 427.74 KiB 406.98 KiB
b385962 20.76 KiB 399.69 KiB 378.93 KiB
b6ba04e 20.76 KiB 414.44 KiB 393.68 KiB
06548c0 20.76 KiB 427.36 KiB 406.59 KiB
407ff99 20.76 KiB 427.87 KiB 407.10 KiB
e2abb0d 20.76 KiB 434.72 KiB 413.96 KiB
7bc3c0d 20.76 KiB 427.36 KiB 406.59 KiB
8f397a7 20.76 KiB 420.55 KiB 399.79 KiB
2405ba5 20.76 KiB 435.23 KiB 414.47 KiB
257c2a9 20.76 KiB 401.36 KiB 380.60 KiB

Previous results on branch: armcknight/fix/profiling-transaction-bookkeeping-cleanup

Startup times

Revision Plain With Sentry Diff
d71e189 1240.52 ms 1254.35 ms 13.83 ms
3c9bbdd 1229.72 ms 1249.90 ms 20.18 ms

App size

Revision Plain With Sentry Diff
d71e189 22.84 KiB 403.87 KiB 381.02 KiB
3c9bbdd 22.84 KiB 403.37 KiB 380.53 KiB

Copy link
Member

@philipphofmann philipphofmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are getting close.

@@ -425,10 +425,16 @@ - (void)finishInternal
{
[self cancelDeadlineTimer];
if (self.isFinished) {
#if SENTRY_TARGET_PROFILING_SUPPORTED
discardProfilerForTracer(self);
Copy link
Member

@philipphofmann philipphofmann Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manual transactions don't have a duration limit. Users could keep it alive for hours. So my suggestion won't work.

I thought about it and I think we only have one problem to solve with the current solution: When a user starts a transaction, never calls finish, and doesn't keep a reference to the transactions, SentryProfiledTracerConcurrency will keep a reference to the profile and the traceID forever. The dealloc of the SentryTracer could call discardProfilerForTracer. dealloc would also be our insurance if we forget to call discardProfilerForTracer somewhere.

Furthermore, I think we only need to call discardProfilerForTracer below or above the following line

[self.delegate tracerDidFinish:self];

We use this callback also here

- (void)tracerDidFinish:(SentryTracer *)tracer
{
@synchronized(self.spans) {
[self.spans removeObjectForKey:tracer.spanId];
}
}

We don't need to consider transactions that never finish in the SentryPerformanceTracker as it only keeps track of automatic transactions and they have a timeout.

Doing those two suggested changes would do the job, I believe.

Please add some tests after doing the changes here.

@codecov
Copy link

codecov bot commented Jul 19, 2023

Codecov Report

Merging #3154 (0529a52) into main (5b14b06) will increase coverage by 0.025%.
The diff coverage is 98.750%.

Additional details and impacted files

Impacted file tree graph

@@              Coverage Diff              @@
##              main     #3154       +/-   ##
=============================================
+ Coverage   89.172%   89.197%   +0.025%     
=============================================
  Files          502       502               
  Lines        53916     54062      +146     
  Branches     19344     19405       +61     
=============================================
+ Hits         48078     48222      +144     
  Misses        4988      4988               
- Partials       850       852        +2     
Impacted Files Coverage Δ
...entry/Profiling/SentryProfiledTracerConcurrency.mm 94.736% <94.444%> (-1.874%) ⬇️
Sources/Sentry/SentryProfiler.mm 80.065% <100.000%> (+0.398%) ⬆️
Sources/Sentry/SentryTracer.m 96.526% <100.000%> (-0.159%) ⬇️
...SentryProfilerTests/SentryProfilerSwiftTests.swift 97.530% <100.000%> (+0.420%) ⬆️

... and 14 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b14b06...0529a52. Read the comment docs.

Copy link
Member

@philipphofmann philipphofmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think we just miss a couple of more tests to ensure we covered all edge cases. After adding these we can merge. Approving to unblock the PR.

CI is still unhappy and we miss a changelog entry.

* @warning Must be called from a synchronized context.
*/
void
_unsafe_cleanUpProfiler(SentryProfiler *profiler, NSString *tracerKey)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m: The prefix unsafe raises a few questions for me at the first glance. I thought this function is somewhat dangerous 😄 . Actually it's just not thread safe. What about renaming it to cleanUpProfiler_non_thread_safe?

Suggested change
_unsafe_cleanUpProfiler(SentryProfiler *profiler, NSString *tracerKey)
_cleanUpProfiler_non_thread_safe(SentryProfiler *profiler, NSString *tracerKey)

Sources/Sentry/SentryProfiler.mm Show resolved Hide resolved
Copy link
Member

@philipphofmann philipphofmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

CHANGELOG.md Outdated Show resolved Hide resolved
@armcknight armcknight merged commit ed68562 into main Jul 21, 2023
@armcknight armcknight deleted the armcknight/fix/profiling-transaction-bookkeeping-cleanup branch July 21, 2023 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants