Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Flaky Test: TestSpanProcessorWithOnDroppedSpanOption #4450

Closed
albertteoh opened this issue May 9, 2023 · 1 comment · Fixed by #4489
Closed

[Bug]: Flaky Test: TestSpanProcessorWithOnDroppedSpanOption #4450

albertteoh opened this issue May 9, 2023 · 1 comment · Fixed by #4489
Labels
bug good first issue Good for beginners help wanted Features that maintainers are willing to accept but do not have cycles to implement

Comments

@albertteoh
Copy link
Contributor

albertteoh commented May 9, 2023

What happened?

PR (note changes are unrelated to the test failure): #4446

Test failure link

https://github.com/jaegertracing/jaeger/actions/runs/4919763496/jobs/8787773908?pr=4446#step:7:590

Retrying the test job resulted in all tests passing.

Steps to reproduce

Could not reproduce.

Failing test was introduced from this PR: #4387

Expected behavior

Tests to pass.

Relevant log output

=== RUN   TestSpanProcessorWithOnDroppedSpanOption
    span_processor_test.go:682: 
        	Error Trace:	/home/runner/work/jaeger/jaeger/cmd/collector/app/span_processor_test.go:682
        	Error:      	Not equal: 
        	            	expected: []string{"op2", "op3"}
        	            	actual  : []string{"op3"}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,3 +1,2 @@
        	            	-([]string) (len=2) {
        	            	- (string) (len=3) "op2",
        	            	+([]string) (len=1) {
        	            	  (string) (len=3) "op3"
        	Test:       	TestSpanProcessorWithOnDroppedSpanOption
--- FAIL: TestSpanProcessorWithOnDroppedSpanOption (0.00s)
FAIL
	github.com/jaegertracing/jaeger/cmd/collector/app	coverage: 98.5% of statements

Screenshot

No response

Additional context

No response

Jaeger backend version

No response

SDK

No response

Pipeline

No response

Stogage backend

No response

Operating system

No response

Deployment model

No response

Deployment configs

No response

@albertteoh albertteoh added the bug label May 9, 2023
@yurishkuro yurishkuro added good first issue Good for beginners help wanted Features that maintainers are willing to accept but do not have cycles to implement labels May 9, 2023
@ChenX1993
Copy link
Contributor

ChenX1993 commented May 10, 2023

I think it is a concurrent problem.
I was trying to use the blocking writer to make a span drop situation, but the second test span is not 100% guaranteed to be dropped.

func TestSpanProcessorWithOnDroppedSpanOption(t *testing.T) {
var droppedOperations []string
customOnDroppedSpan := func(span *model.Span) {
droppedOperations = append(droppedOperations, span.OperationName)
}
w := &blockingWriter{}
p := NewSpanProcessor(w,
nil,
Options.NumWorkers(1),
Options.QueueSize(1),
Options.OnDroppedSpan(customOnDroppedSpan),
).(*spanProcessor)
defer p.Close()
// block the writer so that the first span is read from the queue and blocks the processor, and followings are dropped.
w.Lock()
defer w.Unlock()
_, err := p.ProcessSpans([]*model.Span{
{OperationName: "op1"},
{OperationName: "op2"},
{OperationName: "op3"},
}, processor.SpansOptions{SpanFormat: processor.JaegerSpanFormat})
assert.NoError(t, err)
assert.Equal(t, []string{"op2", "op3"}, droppedOperations)
}

Below is the code for the consume part of the bouned_queue used in collector processor.
If the second test span is enqueued slow enough after L83 of processing the first span, then the second span will be enqueued successfully which causes the test failure as the test expect the second span to be dropped. Same for the third span.

func (q *BoundedQueue) StartConsumersWithFactory(num int, factory func() Consumer) {
q.workers = num
q.factory = factory
var startWG sync.WaitGroup
for i := 0; i < q.workers; i++ {
q.stopWG.Add(1)
startWG.Add(1)
go func() {
startWG.Done()
defer q.stopWG.Done()
consumer := q.factory()
queue := *q.items
for {
select {
case item, ok := <-queue:
if ok {
q.size.Sub(1)
consumer.Consume(item)
} else {
// channel closed, finish worker
return
}
case <-q.stopCh:
// the whole queue is closing, finish worker
return
}
}
}()
}
startWG.Wait()
}

If I change the test like below, then it will fail almost every time:

...
p.ProcessSpans([]*model.Span{{OperationName: "op1"}}, processor.SpansOptions{SpanFormat: processor.JaegerSpanFormat})
time.Sleep(1 * time.Second)
p.ProcessSpans([]*model.Span{{OperationName: "op2"}}, processor.SpansOptions{SpanFormat: processor.JaegerSpanFormat})
p.ProcessSpans([]*model.Span{{OperationName: "op3"}}, processor.SpansOptions{SpanFormat: processor.JaegerSpanFormat})
...

And the error is the same:

Not equal: 
        expected: []string{"op2", "op3"}
        actual  : []string{"op3"}

I will think about how to write the test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug good first issue Good for beginners help wanted Features that maintainers are willing to accept but do not have cycles to implement
Projects
None yet
3 participants