[processor/remotetap] TestConsume test failure #32967

crobert-1 · 2024-05-09T16:55:16Z

Component(s)

processor/remotetap

Describe the issue you're reporting

Failure output:

Running target 'test' in module 'processor/remotetapprocessor' as part of group 'processor-1'
make --no-print-directory -C processor/remotetapprocessor test
/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/.tools/gotestsum --rerun-fails=1 --packages="./..." -- -race -timeout 600s -parallel 4 --tags=""
✓  internal/metadata (1.085s)
✖  . (636ms)
✓  internal/metadata (1.069s)

DONE 31 tests, 6 failures in 35.164s

✖  . (151ms)
✖  . (123ms)
✖  . (139ms)

make[2]: *** [../../Makefile.Common:126: test] Error 1
=== Failed
make[1]: *** [Makefile:176: processor/remotetapprocessor] Error 2
=== FAIL: . TestConsumeMetrics/limit_50 (0.03s)
make[1]: *** Waiting for unfinished jobs....
    processor_test.go:62: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:62
        	Error:      	Not equal: 
        	            	expected: 51
        	            	actual  : 50
        	Test:       	TestConsumeMetrics/limit_50

=== FAIL: . TestConsumeMetrics (0.04s)

=== FAIL: . TestConsumeLogs/limit_50 (0.04s)
    processor_test.go:111: 51
    processor_test.go:112: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:112
        	Error:      	Not equal: 
        	            	expected: 51
        	            	actual  : 50
        	Test:       	TestConsumeLogs/limit_50

=== FAIL: . TestConsumeLogs (0.05s)

=== FAIL: . TestConsumeTraces/limit_50 (0.07s)
    processor_test.go:160: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:160
        	Error:      	Not equal: 
        	            	expected: 53
        	            	actual  : 50
        	Test:       	TestConsumeTraces/limit_50

=== FAIL: . TestConsumeTraces (0.08s)

=== FAIL: . TestConsumeMetrics/limit_50 (re-run 1) (0.03s)
    processor_test.go:62: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:62
        	Error:      	Not equal: 
        	            	expected: 51
        	            	actual  : 50
        	Test:       	TestConsumeMetrics/limit_50

=== FAIL: . TestConsumeMetrics (re-run 1) (0.03s)

=== FAIL: . TestConsumeLogs/limit_50 (re-run 1) (0.04s)
    processor_test.go:111: 51
    processor_test.go:112: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:112
        	Error:      	Not equal: 
        	            	expected: 51
        	            	actual  : 50
        	Test:       	TestConsumeLogs/limit_50

=== FAIL: . TestConsumeLogs (re-run 1) (0.04s)

=== FAIL: . TestConsumeTraces/limit_50 (re-run 1) (0.05s)
    processor_test.go:160: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:160
        	Error:      	Not equal: 
        	            	expected: 52
        	            	actual  : 50
        	Test:       	TestConsumeTraces/limit_50

=== FAIL: . TestConsumeTraces (re-run 1) (0.05s)

DONE 2 runs, 37 tests, 12 failures in 47.463s
✓  . (58.398s)

DONE 64 tests in 212.584s
make[1]: Leaving directory '/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib'
make: *** [Makefile:1[24](https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9020470488/job/24785698419#step:7:25): gotest] Error 2
Error: Process completed with exit code 2.

The text was updated successfully, but these errors were encountered:

github-actions · 2024-05-09T16:55:32Z

Pinging code owners:

processor/remotetap: @atoulme

See Adding Labels via Comments if you do not have permissions to add labels yourself.

crobert-1 · 2024-05-09T16:55:45Z

Note that the failure was on arm, not sure if it's related or not.

crobert-1 · 2024-05-09T16:56:53Z

+2 freq:
https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9019486388/job/24782474775#step:7:106
https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9019411342/job/24782226693#step:7:109

crobert-1 · 2024-05-09T17:13:03Z

Looks like this is a result of #32481

crobert-1 · 2024-05-09T19:38:19Z

Looks like for the bigger limit there may be some timing issue going on with arm. The limit is based on events per second, so theoretically if the tests run longer than a second the test is invalid as it would be a new interval for the given limit. I'm seeing all tests take about 0.1 of a second though.

The limiter we're using may be checking the second itself from time.Now for interval enforcement though, which would lead to potential flakiness going across second boundaries, even though the test is running for less that a second. When I added a time.Sleep call to the test to make it take longer on darwin/amd64 I'm getting the same error. One thing I don't understand though is why it would be barely over the limit every time, instead of much higher if it's splitting the second boundaries.

Either way, I believe this is just a flaky test at this point, the limit is getting enforced "close" to its set value.

crobert-1 · 2024-05-09T19:41:30Z

Increasing the limit can make this functionality much more obvious:
Limit: 1,000 -> received 1,029
Limit: 100,000 -> received 200,000 (the total number of metrics being sent. The test took 4.5 seconds, so this was well within the limit for each second.)

asreehari-splunk · 2024-05-09T21:21:43Z

Seeing these failures here https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9022795143/job/24793402106?pr=32948

I added more information on the issue, but it seems like the limiter being used is especially flaky with timing in tests, so if too many metrics/traces/logs are being sent, the limit won't get fully hit as expected, meaning data will count as two separate intervals in the test (thus not being limited properly). This is flakiness in testing, not the rate limiter's fault, from what I can tell. Also, in `assert.Equal`, the expected value should be first, and the actual value should be second. I've swapped these as the original output of the test failure was causing me confusion. Resolves #32967

crobert-1 added the needs triage New item requiring triage label May 9, 2024

github-actions bot added the processor/remotetap label May 9, 2024

crobert-1 added the arm64 Issues related to arm64 architecture label May 9, 2024

crobert-1 removed the needs triage New item requiring triage label May 9, 2024

crobert-1 mentioned this issue May 9, 2024

[chore][processor/remotetap] Fix flaky tests #32970

Merged

asreehari-splunk mentioned this issue May 9, 2024

[CI] fix timeout on arm-unittest-matrix #32948

Merged

codeboten closed this as completed in #32970 May 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[processor/remotetap] TestConsume test failure #32967

[processor/remotetap] TestConsume test failure #32967

crobert-1 commented May 9, 2024

github-actions bot commented May 9, 2024

crobert-1 commented May 9, 2024

crobert-1 commented May 9, 2024

crobert-1 commented May 9, 2024

crobert-1 commented May 9, 2024

crobert-1 commented May 9, 2024

asreehari-splunk commented May 9, 2024