-
Notifications
You must be signed in to change notification settings - Fork 288
Adding locks where context is accessed #528
Conversation
665f930
to
255ce9d
Compare
Codecov Report
@@ Coverage Diff @@
## master #528 +/- ##
==========================================
+ Coverage 89.28% 89.30% +0.02%
==========================================
Files 61 61
Lines 3919 3918 -1
==========================================
Hits 3499 3499
+ Misses 294 292 -2
- Partials 126 127 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joe-elliott you might be interested in reviewing this.
@@ -427,10 +427,10 @@ func (s *Span) serviceName() string { | |||
|
|||
func (s *Span) applySamplingDecision(decision SamplingDecision, lock bool) { | |||
if !decision.Retryable { | |||
s.context.samplingState.setFinal() | |||
s.SpanContext().samplingState.setFinal() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The presence of the lock
parameter in this function indicates that it might be called while the lock on the span is already acquired by the current goroutine. What would happen in that case if we call SpanContext() that will try to acquire a read-lock? I remember in the past that in Go the RWMutex is not re-entrable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah rwmutex is not re-entrable, but so far this is not called in places where the caller already obtains a lock, the method is thread-safe in itself, so the caller shouldn't need to obtain locks, added documentation for that. lock param is set to false only in the start span method where too no locks are obtained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I think it's dangerous to ignore the lock parameter even if right now there doesn't happen to be a codepath that deadlocks. Perhaps access s.context
directly here and move the if lock
code to the top?
Alternatively the way it's been changed we're always locking and we've already taken the performance hit. We could just remove the lock param. Depending on benchmarks this may be an option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that is true, I started making changes that way to apply lock while calling all these methods, but this method also calls sampler and observer callbacks like sampler.OnSetOperationName()
, etc., which won't be in the same package and have to call with locks, and there is high risk of these calling locks than callers to these internal methods.
We can call locks around these call backs, the code is going to lock and unlock twice at-least in each method, one before and after sampler callbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func (s *Span) setTagInternal(key string, value interface{}, lock bool) opentracing.Span {
s.observer.OnSetTag(key, value)
if lock {
s.RLock()
}
if key == string(ext.SamplingPriority) && !setSamplingPriority(s, value) {
return s
}
if !s.isSamplingFinalized() {
if lock {
s.RUnlock()
}
decision := s.tracer.sampler.OnSetTag(s, key, value)
if lock {
s.Lock()
defer s.Unlock()
}
s.applySamplingDecision(decision, lock)
} else {
if lock {
s.Lock()
defer s.Unlock()
}
}
if s.isWriteable() {
s.appendTagNoLocking(key, value)
}
return s
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how some three methods in span.go would look, if this looks good we can go with this.
As a rule not a good idea to hold a lock while calling external methods (like sampler callbacks), but internal methods we can document if it is to be called with or without locks. And I believe the lock parameter was just so locks are not called when we need not bother about concurrency like when the span is started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some benchmarks can tell which approach is better, I will add some micro benchmarks over the weekend
@@ -363,7 +363,6 @@ func TestSpan_References(t *testing.T) { | |||
} | |||
|
|||
func TestSpanContextRaces(t *testing.T) { | |||
t.Skip("Skipped: test will panic with -race, see https://github.com/jaegertracing/jaeger-client-go/issues/526") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be useful to add a few more mutators to this test, namely those that affect samplingState and flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah added a few, and also called span.Finish in the end, and found a few more data race conditions in the reportSpan() method and modified it
This commit addresses data race issues where span.context is accessed without locks in methods which can be called concurrently with method setBaggageItem which modifies context Per issue jaegertracing#526 Signed-off-by: Keerthana Selvakumar <keerukeerthana8@gmail.com>
@@ -427,10 +427,10 @@ func (s *Span) serviceName() string { | |||
|
|||
func (s *Span) applySamplingDecision(decision SamplingDecision, lock bool) { | |||
if !decision.Retryable { | |||
s.context.samplingState.setFinal() | |||
s.SpanContext().samplingState.setFinal() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I think it's dangerous to ignore the lock parameter even if right now there doesn't happen to be a codepath that deadlocks. Perhaps access s.context
directly here and move the if lock
code to the top?
Alternatively the way it's been changed we're always locking and we've already taken the performance hit. We could just remove the lock param. Depending on benchmarks this may be an option.
@@ -120,8 +123,8 @@ func (s *Span) setTagInternal(key string, value interface{}, lock bool) opentrac | |||
|
|||
// SpanContext returns span context | |||
func (s *Span) SpanContext() SpanContext { | |||
s.Lock() | |||
defer s.Unlock() | |||
s.RLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a number of other places that hold .Lock()
when only .RLock()
is needed. Seeing .StartTime()
, .Duration()
, .Tags()
for instance. Probably not in the scope of this fix though.
}) | ||
go accessor(func() { | ||
span.SpanContext().samplingState.setFlag(flagDebug) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe reset
still races, but unsure if this is worth protecting. This is only called when the syncpool is enabled and generally the lifecycle code seems dangerous. It would be trivial for the calling application to hold a *Span
after .Finish()
and do terrible things with it so I think we have to trust the calling code.
Added a comment to the main thread with some final thoughts.
@@ -439,7 +439,7 @@ func (t *Tracer) emitNewSpanMetrics(sp *Span, newTrace bool) { | |||
func (t *Tracer) reportSpan(sp *Span) { | |||
if !sp.isSamplingFinalized() { | |||
t.metrics.SpansFinishedDelayedSampling.Inc(1) | |||
} else if sp.context.IsSampled() { | |||
} else if sp.SpanContext().IsSampled() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
store the value of SpanContext().IsSampled()
at the top of the method to only call SpanContext()
once?
@@ -445,12 +451,12 @@ func (s *Span) applySamplingDecision(decision SamplingDecision, lock bool) { | |||
|
|||
// Span can be written to if it is sampled or the sampling decision has not been finalized. | |||
func (s *Span) isWriteable() bool { | |||
state := s.context.samplingState | |||
state := s.SpanContext().samplingState |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isWriteable
and isSamplingFinalized
are called beneath setTagInternal
which receives a lock
param that is ignored here.
So I've spent some time thinking about this and the following makes sense to me.
@yurishkuro does this make sense? |
This bothers me a little. The first two are actually derived from the settings stored in the SpanContext, so passing them the SpanContext seems backwards, could we just call them on the SpanContext directly? setSamplingPriority - I'd instead pass the I think doing it this way reduces the coupling. |
I see no issue with having a SpanContext as a receiver instead of taking one as a parameter. It's kind of a stylistic choice and I was considering suggesting it as well. 👍
Agree. |
@keer25 are you still working on this issue? if not, i would like to take it up. |
This commit addresses data race issues where span.context is accessed without locks in methods which can be called concurrently with method setBaggageItem which modifies context
Resolves issue #526
Signed-off-by: Keerthana Selvakumar keerukeerthana8@gmail.com
Which problem is this PR solving?
Short description of the changes