Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed another list of unstable unit tests in travis #4915

Merged

Conversation

pierresouchay
Copy link
Contributor

Fixed failing tests in https://travis-ci.org/hashicorp/consul/jobs/451357061 triggered by #4904

@freddygv Will improve a bit the robustness of tests

@banks
Copy link
Member

banks commented Nov 7, 2018

Thanks @pierresouchay we're preparing master for 1.4.0 release so won't merge immediately.

@pierresouchay pierresouchay force-pushed the fix_unstable_unit_tests_in_travis branch from 8a8189c to bde304e Compare November 7, 2018 15:15
@freddygv
Copy link
Contributor

@pierresouchay I have a couple questions:

  1. Why add a wait to the HTTPAPI tests? given that this was the error: Get http://127.0.0.1:49134/v1/agent/self: net/http: request canceled (Client.Timeout exceeded while awaiting headers

  2. Also wondering about the wait in the agent/consul tests (like TestServer_JoinLAN_TLS) that had this error: Failed to start TCP listener on "127.0.0.1" port 47542: listen tcp 127.0.0.1:47542: bind: address already in use

Mostly wondering if these changes would have prevented the failures in the build you linked to.

@pierresouchay
Copy link
Contributor Author

pierresouchay commented Nov 16, 2018

@freddygv

@pierresouchay I have a couple questions:

  1. Why add a wait to the HTTPAPI tests? given that this was the error: Get http://127.0.0.1:49134/v1/agent/self: net/http: request canceled (Client.Timeout exceeded while awaiting headers
  2. Also wondering about the wait in the agent/consul tests (like TestServer_JoinLAN_TLS) that had this error: Failed to start TCP listener on "127.0.0.1" port 47542: listen tcp 127.0.0.1:47542: bind: address already in use

Mostly wondering if these changes would have prevented the failures in the build you linked to.

I dunno, maybe I am wrong, I added those fixes after failure https://travis-ci.org/hashicorp/consul/jobs/451357061 -> usually, using testrpc.WaitForTestAgent fixes most of unstable tests in travis, so I used the same old trick as I am used to...

(This PR is probably a combinaison of other failures I saw in other Travis builds, I don't remember)

Copy link
Member

@banks banks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the HTTP options one that is now flaky is due to the timeouts we added (to stop them just hanging forever like they used to). The best fix for those is to add ones that seems to fial often to the "slow" list to get a longer timeout.

All of them take milliseconds when not run in parallel but the cartesian product means over 500 test cases so not running in parallel takes too long for Travis. I kinda want to fix it to be more efficient and.or figure out the ones that have timing issues when run in parallel with others (due to contention etc) but not had time.

This PR seems safe and likely fixes some more cases though so I'll merge for now.

@banks banks merged commit c5ae9ca into hashicorp:master Nov 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants