Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes race condition with toSpan and flush #1306

Closed

Conversation

andylintner
Copy link
Contributor

Fixes #1295
Previous to this fix, a call to Tracer::toSpan concurrent with a call
to flush the span from pendingSpans could result in an assertion error.

We now only fetch the pendingSpan once for a single call to toSpan.

@jcchavezs
Copy link
Contributor

jcchavezs commented Sep 8, 2021 via email

@andylintner
Copy link
Contributor Author

@anuraaga - I believe the test failure from the automation is just an inconsistent test. The same passes locally. Let me know if you think otherwise and I can dig further.

@jcchavezs
Copy link
Contributor

@jeqo @jorgheymans any of you can give a check on this?

@jcchavezs
Copy link
Contributor

@andylintner test is green.

Copy link
Member

@jeqo jeqo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall seems like a sensible change, avoiding to get parent span twice to avoid race condition.

If we can make sure this is the only execution path that is affected by this issue, it looks good to me.

I wonder if instead of having an overloaded version of _toSpan, we could switch into the new interface TraceContext context, PendingSpan and move PendingSpan pendingSpan = pendingSpans.getOrCreate(parent, context, false); into the other usages of _toSpan. wdyt?

Comment on lines 196 to 198
PendingSpan getPendingSpan(TraceContext context) {
return pendingSpans.get(context);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this one line method be replaced by a single call to pendingSpans.get(context)

TraceContext pendingContext = swapForPendingContext(context);
if (pendingContext != null) return _toSpan(parent, pendingContext);
// Re-use a pending span if present: This ensures reference consistency on Span.context()
PendingSpan pendingSpan = getPendingSpan(context);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PendingSpan pendingSpan = getPendingSpan(context);
PendingSpan pendingSpan = pendingSpans.get(context);

// Re-use a pending span if present: This ensures reference consistency on Span.context()
PendingSpan pendingSpan = getPendingSpan(context);
if (pendingSpan != null) {
if (isNoop(context)) return new NoopSpan(context);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be moved into the _toSpan method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, but it would mean either a redundant check with the call from the overloaded _toSpan, or eliminating the shortcut taken there that avoids unnecessary pendingSpan lookup.

@nkonev
Copy link

nkonev commented May 29, 2022

Hi @jeqo, @jcchavezs
I ran into this issue.
Can this PR be merged ?

@jcchavezs
Copy link
Contributor

jcchavezs commented May 29, 2022 via email

@nkonev
Copy link

nkonev commented Jun 6, 2022

Any updates ?

@@ -192,12 +192,6 @@ public final Span joinSpan(TraceContext context) {
}
}

/** Returns an equivalent context if exists in the pending map */
TraceContext swapForPendingContext(TraceContext context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why this was removed.

@jcchavezs
Copy link
Contributor

I need to get some time to try the test and show it fails. That will happen over the weekend.

jcchavezs added a commit that referenced this pull request Jun 30, 2022
jcchavezs added a commit that referenced this pull request Jun 30, 2022
@@ -1,5 +1,5 @@
/*
* Copyright 2013-2020 The OpenZipkin Authors
* Copyright 2013-2021 The OpenZipkin Authors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2022

@jcchavezs
Copy link
Contributor

Running this PR to see how the test fails without the fix #1336

@jcchavezs
Copy link
Contributor

Hi @andylintner unfortunately the unit test does not fail when the fix is not in place, could you please rework it so we show the failure? see #1336

@frosiere
Copy link
Contributor

Hello, any news on this issue? We are facing the same issue. Any solution or workaround is more than welcome. Thanks.

@jcchavezs
Copy link
Contributor

jcchavezs commented Jul 13, 2023

We need a test showing the issue otherwise we cannot merge the fix.

alintnersfdc and others added 3 commits December 15, 2023 18:53
Fixes openzipkin#1295
Previous to this fix, a call to Tracer::toSpan concurrent with a call
to flush the span from pendingSpans could result in an assertion error.

We now only fetch the pendingSpan once for a single call to toSpan.
Signed-off-by: Adrian Cole <adrian@tetrate.io>
@codefromthecrypt codefromthecrypt force-pushed the concurrent-span-flush-fix branch from 2730fed to 42a8705 Compare December 15, 2023 11:54
Copy link
Member

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with these changes. Thanks for the help.

codefromthecrypt pushed a commit that referenced this pull request Dec 15, 2023
@codefromthecrypt
Copy link
Member

according to #1336, this test passes before or after this change. I'll merge the latter, as the test needs to be improved in order to show what's wrong. When ready please raise a new PR!

codefromthecrypt pushed a commit that referenced this pull request Dec 15, 2023
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Co-authored-by: Adrian Cole <adrian@tetrate.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parent can only be null in a local root
7 participants