Fixes race condition with toSpan and flush #1306
Ping @anuraaga
…On Tue, 7 Sep 2021, 23:43 Andy Lintner, ***@***.***> wrote:
Fixes #1295
Previous to this fix, a call to Tracer::toSpan concurrent with a call
to flush the span from pendingSpans could result in an assertion error.
We now only fetch the pendingSpan once for a single call to toSpan.
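
The check-then-act shape described above can be sketched in isolation. This is a minimal illustration, not brave's actual code: the class `PendingLookupSketch`, the string-keyed map, and both method names are hypothetical stand-ins, and the `Runnable` parameter simulates a flush that lands at the worst possible moment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a pending-span map; not brave's actual classes. */
class PendingLookupSketch {
  static final Map<String, String> pending = new ConcurrentHashMap<>();

  /** Racy shape: one lookup to check presence, a second lookup to use the entry. */
  static String twoLookups(String context, Runnable concurrentFlush) {
    if (pending.get(context) == null) return "new-span";
    concurrentFlush.run(); // stands in for a flush landing between the two lookups
    String span = pending.get(context);
    if (span == null) throw new AssertionError("entry vanished between lookups");
    return span;
  }

  /** Fixed shape: look the entry up once and reuse the local reference. */
  static String oneLookup(String context, Runnable concurrentFlush) {
    String span = pending.get(context);
    concurrentFlush.run(); // a flush here cannot invalidate the local reference
    return span != null ? span : "new-span";
  }

  public static void main(String[] args) {
    pending.put("ctx", "span-1");
    System.out.println(oneLookup("ctx", () -> pending.remove("ctx")));

    pending.put("ctx", "span-1");
    try {
      twoLookups("ctx", () -> pending.remove("ctx"));
    } catch (AssertionError e) {
      System.out.println("two lookups failed: " + e.getMessage());
    }
  }
}
```

Each individual map operation is thread-safe here; the failure comes only from assuming that two separate lookups see the same state.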
Commit Summary
- Fixes race condition with toSpan and flush
File Changes
- *M* brave/src/main/java/brave/Tracer.java (20)
  <https://github.com/openzipkin/brave/pull/1306/files#diff-8f5726dae04f25d1999a6091f638b8faf320502f01bc54d81e83124e917af9c3>
- *M* brave/src/test/java/brave/TracerTest.java (22)
  <https://github.com/openzipkin/brave/pull/1306/files#diff-849a3d4ea020f209a97f36cbc19dd915e6e9cdd77c29d82249a3d68c5b3e0651>
Patch Links:
- https://github.com/openzipkin/brave/pull/1306.patch
- https://github.com/openzipkin/brave/pull/1306.diff
@anuraaga - I believe the test failure from the automation is just a flaky test; the same test passes locally. Let me know if you think otherwise and I can dig further.
@jeqo @jorgheymans could either of you take a look at this?
@andylintner the test is green.
Overall this seems like a sensible change: it avoids fetching the span twice, which removes the race condition.
If we can make sure this is the only execution path affected by this issue, it looks good to me.
I wonder whether, instead of having an overloaded version of _toSpan, we could switch to the new signature (TraceContext context, PendingSpan) and move PendingSpan pendingSpan = pendingSpans.getOrCreate(parent, context, false); into the other usages of _toSpan. wdyt?
PendingSpan getPendingSpan(TraceContext context) {
  return pendingSpans.get(context);
}
Could this one-line method be replaced by a single call to pendingSpans.get(context)?
TraceContext pendingContext = swapForPendingContext(context);
if (pendingContext != null) return _toSpan(parent, pendingContext);
// Re-use a pending span if present: This ensures reference consistency on Span.context()
PendingSpan pendingSpan = getPendingSpan(context);
- PendingSpan pendingSpan = getPendingSpan(context);
+ PendingSpan pendingSpan = pendingSpans.get(context);
// Re-use a pending span if present: This ensures reference consistency on Span.context()
PendingSpan pendingSpan = getPendingSpan(context);
if (pendingSpan != null) {
  if (isNoop(context)) return new NoopSpan(context);
Should this be moved into the _toSpan method?
It could, but it would mean either a redundant check with the call from the overloaded _toSpan, or eliminating the shortcut taken there that avoids unnecessary pendingSpan lookup.
Hi @jeqo, @jcchavezs
Ping @llinder
On Sun, 29 May 2022, 16:39, Nikita Konev ***@***.***> wrote:
… Hi @jeqo <https://github.com/jeqo>
I ran into this issue.
Can this PR be merged?
Any updates?
@@ -192,12 +192,6 @@ public final Span joinSpan(TraceContext context) {
     }
   }

-  /** Returns an equivalent context if exists in the pending map */
-  TraceContext swapForPendingContext(TraceContext context) {
I wonder why this was removed.
I need to find some time to try the test and show that it fails. I'll do that over the weekend.
@@ -1,5 +1,5 @@
 /*
- * Copyright 2013-2020 The OpenZipkin Authors
+ * Copyright 2013-2021 The OpenZipkin Authors
2022
Running this PR to see how the test fails without the fix: #1336
Hi @andylintner, unfortunately the unit test does not fail when the fix is not in place. Could you please rework it so that it shows the failure? See #1336.
Hello, any news on this issue? We are facing the same issue. Any solution or workaround is more than welcome. Thanks.
We need a test showing the issue; otherwise we cannot merge the fix.
Fixes openzipkin#1295 Previous to this fix, a call to Tracer::toSpan concurrent with a call to flush the span from pendingSpans could result in an assertion error. We now only fetch the pendingSpan once for a single call to toSpan.
2730fed to 42a8705
I'm ok with these changes. Thanks for the help.
According to #1336, this test passes both before and after this change. I'll merge the latter, as the test needs to be improved in order to show what's wrong. When ready, please raise a new PR!
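
One possible way to make such a test fail reliably is to force the problematic interleaving with latches rather than hoping the scheduler produces it. The sketch below is only an illustration of that technique on a plain map: `ForcedInterleavingSketch` and its method are hypothetical names, and it assumes the code under test offers a point between the two lookups where the reader thread can be paused, which the real Tracer may not expose.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: forces "lookup, flush, lookup" ordering across two threads. */
class ForcedInterleavingSketch {

  /** Returns true if the reader saw the entry disappear between its two lookups. */
  static boolean readerSeesVanishedEntry() {
    Map<String, String> pending = new ConcurrentHashMap<>();
    pending.put("ctx", "span-1");
    CountDownLatch firstLookupDone = new CountDownLatch(1);
    CountDownLatch flushed = new CountDownLatch(1);
    boolean[] vanished = {false};

    Thread reader = new Thread(() -> {
      if (pending.get("ctx") == null) return; // first lookup
      firstLookupDone.countDown();            // let the flusher proceed
      try {
        if (!flushed.await(5, TimeUnit.SECONDS)) return;
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      vanished[0] = pending.get("ctx") == null; // second lookup, after the flush
    });

    Thread flusher = new Thread(() -> {
      try {
        if (!firstLookupDone.await(5, TimeUnit.SECONDS)) return;
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      pending.remove("ctx"); // stands in for flushing the pending span
      flushed.countDown();
    });

    reader.start();
    flusher.start();
    try {
      reader.join();
      flusher.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for threads", e);
    }
    return vanished[0]; // safe to read: join() orders it after the reader's write
  }

  public static void main(String[] args) {
    System.out.println("entry vanished between lookups: " + readerSeesVanishedEntry());
  }
}
```

Because the latches fix the order of the three steps, the outcome does not depend on timing, so a test built this way would fail on every run against a two-lookup implementation.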