Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[processor/remotetap] TestConsume test failure #32967

Closed
crobert-1 opened this issue May 9, 2024 · 7 comments · Fixed by #32970
Closed

[processor/remotetap] TestConsume test failure #32967

crobert-1 opened this issue May 9, 2024 · 7 comments · Fixed by #32970
Labels
arm64 Issues related to arm64 architecture processor/remotetap

Comments

@crobert-1
Copy link
Member

Component(s)

processor/remotetap

Describe the issue you're reporting

Failing CI/CD link

Failure output:

Running target 'test' in module 'processor/remotetapprocessor' as part of group 'processor-1'
make --no-print-directory -C processor/remotetapprocessor test
/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/.tools/gotestsum --rerun-fails=1 --packages="./..." -- -race -timeout 600s -parallel 4 --tags=""
✓  internal/metadata (1.085s)
✖  . (636ms)
✓  internal/metadata (1.069s)

DONE 31 tests, 6 failures in 35.164s

✖  . (151ms)
✖  . (123ms)
✖  . (139ms)

make[2]: *** [../../Makefile.Common:126: test] Error 1
=== Failed
make[1]: *** [Makefile:176: processor/remotetapprocessor] Error 2
=== FAIL: . TestConsumeMetrics/limit_50 (0.03s)
make[1]: *** Waiting for unfinished jobs....
    processor_test.go:62: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:62
        	Error:      	Not equal: 
        	            	expected: 51
        	            	actual  : 50
        	Test:       	TestConsumeMetrics/limit_50

=== FAIL: . TestConsumeMetrics (0.04s)

=== FAIL: . TestConsumeLogs/limit_50 (0.04s)
    processor_test.go:111: 51
    processor_test.go:112: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:112
        	Error:      	Not equal: 
        	            	expected: 51
        	            	actual  : 50
        	Test:       	TestConsumeLogs/limit_50

=== FAIL: . TestConsumeLogs (0.05s)

=== FAIL: . TestConsumeTraces/limit_50 (0.07s)
    processor_test.go:160: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:160
        	Error:      	Not equal: 
        	            	expected: 53
        	            	actual  : 50
        	Test:       	TestConsumeTraces/limit_50

=== FAIL: . TestConsumeTraces (0.08s)

=== FAIL: . TestConsumeMetrics/limit_50 (re-run 1) (0.03s)
    processor_test.go:62: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:62
        	Error:      	Not equal: 
        	            	expected: 51
        	            	actual  : 50
        	Test:       	TestConsumeMetrics/limit_50

=== FAIL: . TestConsumeMetrics (re-run 1) (0.03s)

=== FAIL: . TestConsumeLogs/limit_50 (re-run 1) (0.04s)
    processor_test.go:111: 51
    processor_test.go:112: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:112
        	Error:      	Not equal: 
        	            	expected: 51
        	            	actual  : 50
        	Test:       	TestConsumeLogs/limit_50

=== FAIL: . TestConsumeLogs (re-run 1) (0.04s)

=== FAIL: . TestConsumeTraces/limit_50 (re-run 1) (0.05s)
    processor_test.go:160: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:160
        	Error:      	Not equal: 
        	            	expected: 52
        	            	actual  : 50
        	Test:       	TestConsumeTraces/limit_50

=== FAIL: . TestConsumeTraces (re-run 1) (0.05s)

DONE 2 runs, 37 tests, 12 failures in 47.463s
✓  . (58.398s)

DONE 64 tests in 212.584s
make[1]: Leaving directory '/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib'
make: *** [Makefile:1[24](https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9020470488/job/24785698419#step:7:25): gotest] Error 2
Error: Process completed with exit code 2.
@crobert-1 crobert-1 added the needs triage New item requiring triage label May 9, 2024
Copy link
Contributor

github-actions bot commented May 9, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1
Copy link
Member Author

Note that the failure was on arm, not sure if it's related or not.

@crobert-1
Copy link
Member Author

Looks like this is a result of #32481

@crobert-1 crobert-1 added the arm64 Issues related to arm64 architecture label May 9, 2024
@crobert-1
Copy link
Member Author

Looks like for the bigger limit there may be some timing issue going on with arm. The limit is based on events per second, so theoretically if the tests run longer than a second the test is invalid as it would be a new interval for the given limit. I'm seeing all tests take about 0.1 of a second though.

The limiter we're using may be checking the second itself from time.Now for interval enforcement though, which would lead to potential flakiness going across second boundaries, even though the test is running for less that a second. When I added a time.Sleep call to the test to make it take longer on darwin/amd64 I'm getting the same error. One thing I don't understand though is why it would be barely over the limit every time, instead of much higher if it's splitting the second boundaries.

Either way, I believe this is just a flaky test at this point, the limit is getting enforced "close" to its set value.

@crobert-1 crobert-1 removed the needs triage New item requiring triage label May 9, 2024
@crobert-1
Copy link
Member Author

Increasing the limit can make this functionality much more obvious:
Limit: 1,000 -> received 1,029
Limit: 100,000 -> received 200,000 (the total number of metrics being sent. The test took 4.5 seconds, so this was well within the limit for each second.)

@asreehari-splunk
Copy link
Contributor

codeboten pushed a commit that referenced this issue May 22, 2024
I added more information on the issue, but it seems like the limiter
being used is especially flaky with timing in tests, so if too many
metrics/traces/logs are being sent, the limit won't get fully hit as
expected, meaning data will count as two separate intervals in the test
(thus not being limited properly). This is flakiness in testing, not the
rate limiter's fault, from what I can tell.

Also, in `assert.Equal`, the expected value should be first, and the
actual value should be second. I've swapped these as the original output
of the test failure was causing me confusion.

Resolves #32967
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arm64 Issues related to arm64 architecture processor/remotetap
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants