Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Travis build failed in KafkaZipkinTracerTest.whenKafkaIsDown #82

Open
andystanton opened this issue Mar 11, 2020 · 2 comments
Open

Travis build failed in KafkaZipkinTracerTest.whenKafkaIsDown #82

andystanton opened this issue Mar 11, 2020 · 2 comments

Comments

@andystanton
Copy link
Contributor

andystanton commented Mar 11, 2020

Issue Description

The Travis build for #80 failed on the test KafkaZipkinTracerTest.whenKafkaIsDown: https://travis-ci.org/github/openzipkin/zipkin-finagle/builds/659274432.

However, the PR did not change any Kafka code or dependencies and the build passed when it was run a second time, so it's possible this is an unreliable test. The error message is:

[INFO] Running zipkin2.finagle.kafka.KafkaZipkinTracerTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.007 s - in zipkin2.finagle.kafka.KafkaZipkinTracerTest
[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   KafkaZipkinTracerIntegrationTest.whenKafkaIsDown:98 
Expecting:
  <{WrappedArray(span_bytes)=185L, List(messages_dropped, java.lang.IllegalStateException)=1L, WrappedArray(spans)=1L, WrappedArray(spans_dropped)=1L, WrappedArray(message_bytes)=187L, WrappedArray(messages)=1L, List(messages_dropped)=2L, List(messages_dropped, org.apache.kafka.common.errors.TimeoutException)=1L}>
to contain only:
  <[MapEntry[key=Buffer(spans), value=1L],
    MapEntry[key=Buffer(span_bytes), value=185L],
    MapEntry[key=Buffer(spans_dropped), value=1L],
    MapEntry[key=Buffer(messages), value=1L],
    MapEntry[key=Buffer(message_bytes), value=187L],
    MapEntry[key=Buffer(messages_dropped), value=1L],
    MapEntry[key=Buffer(messages_dropped, org.apache.kafka.common.errors.TimeoutException), value=1L]]>
elements not found:
  <[MapEntry[key=Buffer(messages_dropped), value=1L]]>
and elements not expected:
  <[MapEntry[key=List(messages_dropped, java.lang.IllegalStateException), value=1L],
    MapEntry[key=List(messages_dropped), value=2L]]>

The kafka module failed to build before the http module started, so it's unlikely that some state teardown was missed in the new code:

[INFO] ------------------------------------------------------------------------
2653[INFO] Reactor Summary for Zipkin Finagle (Parent) 2.1.6-SNAPSHOT:
2654[INFO] 
2655[INFO] Zipkin Finagle (Parent) ............................ SUCCESS [  4.159 s]
2656[INFO] Zipkin Finagle: Core ............................... SUCCESS [  8.622 s]
2657[INFO] Zipkin Finagle: Kafka .............................. FAILURE [ 32.253 s]
2658[INFO] Zipkin Finagle: Http ............................... SKIPPED
2659[INFO] Zipkin Finagle: Scribe ............................. SKIPPED

Steps to recreate

The steps to recreate the issue are to run the build via Maven as per the Travis configuration:

./mvnw install -DskipTests=true -Dmaven.javadoc.skip=true -B -V -Dlicense.skip=true

However, I ran this on a loop locally and did not see the same issue after tens of runs.

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Mar 12, 2020

there's some flake risk in the test...
https://github.com/openzipkin/zipkin-finagle/blob/master/kafka/src/test/java/zipkin2/finagle/kafka/KafkaZipkinTracerIntegrationTest.java#L96

but the double-drop count is interesting.. I think failsafe isn't setup correct. Probably we should rename things *IntegrationTest to IT* and fix the parent pom to include

Right now, the failsafe config seems to be there, but unused! Mind giving a go?

        <executions>
          <execution>
            <id>integration-test</id>
            <phase>verify</phase>
            <goals>
              <goal>integration-test</goal>
              <goal>verify</goal>
            </goals>
          </execution>
        </executions>

@andystanton
Copy link
Contributor Author

So two parts to your request - one is sorting out Failsafe and the other is getting to the root cause of the flakey test. I'm happy to take a look at both.

Do you think this blocks the release of #80 in the meantime, given the functionality is unrelated?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants