Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

integration-tests: tweaks to stabilize sanitizer runs #357

Merged
merged 8 commits into from
Jun 15, 2020

Conversation

oschaaf
Copy link
Member

@oschaaf oschaaf commented Jun 11, 2020

This PR has changes to deflake integration tests for sanitizer runs.

  • Allow servers more time to start up
  • Slow down some tests to make them less sensitive to timing issues
  • It is possible for some counter values to slightly exceed a configured termination threshold
    for them in edge cases. Update expectations to match that.

Fixes #332

oschaaf added 5 commits June 11, 2020 21:04
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@oschaaf oschaaf force-pushed the ci-sanitizer-run-stabilizations branch from 14954cc to 0d24955 Compare June 11, 2020 21:38
oschaaf added 3 commits June 11, 2020 23:40
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@oschaaf
Copy link
Member Author

oschaaf commented Jun 11, 2020

I filed #356 for tracking the thresholds getting exceeded. If that is fixable, then we could revert some of the expectations I updated here.

@oschaaf oschaaf added P1 waiting-for-review A PR waiting for a review. labels Jun 11, 2020
@oschaaf oschaaf marked this pull request as ready for review June 11, 2020 22:55
@mum4k mum4k merged commit bc72a52 into envoyproxy:master Jun 15, 2020
mum4k pushed a commit that referenced this pull request Jun 19, 2020
Following some digging into stability of our coverage testing, creating this PR for testing CI
with some fixes / enhancements, as well as covering a few more lines in our
CLI tooling exception paths. There seems to be something racy going on where sometimes the coverage originating from our python integration tests is not counted.

Eventually I learned that things might stabilise when we switch the toolchain to clang 10. We follow Envoy's lead on that, so for now stopping further efforts on this.

### Description of the changes in this PR:

- Increase timeouts of ASAN/TSAN runs
- Make `envoy_build_sha.sh`, which apparently broke in one of the Envoy updates
- Cover some of the outer exception handling paths for Nighthawk's CLIs
- Integration test for nighthawk_output_transform
- Add coverage for code lines where it seemed trivial to do so (e.g. NullStatistic)
- Fix `run_nighthawk_bazel_coverage.sh` to work on my machine
- Stricter bash options in some scripts + changes to make that work

### Prerequisites

- [x] Needs #357 to go in first.

---------

Leaving some notes/learnings here:

- there seems to be something racy going on where sometimes the coverage originating from
   our python integration tests is not counted. This is relatively infrequent in CI.
   - On my dev machine this structurally happens. This might be a toolchain/version issue.
   - On my dev machine, using the docker flow, I have difficulties: the docker container exits when running the full test suite, without leaving a trace why. bash traps in the container don't fire, no useful logging from docker. Eventually I was able to get past bazel fetching its deps, and then build just the python integration tests. As long as a single test is involved, things seem to work / the container gets to finish. It's remarkable that CI doesn't have this problem. Coverage structurally looks as expected in this scenario.
- The link step is super memory hungry. Separating that step might be a ci perf opportunity. Coverage is the slowest CI job we have.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 waiting-for-review A PR waiting for a review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tsan integration test flakes
2 participants