Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coverage flakes #362

Closed
oschaaf opened this issue Jun 15, 2020 · 3 comments · Fixed by #481
Closed

Coverage flakes #362

oschaaf opened this issue Jun 15, 2020 · 3 comments · Fixed by #481

Comments

@oschaaf
Copy link
Member

oschaaf commented Jun 15, 2020

Sometimes coverage dips far below the threshold (90.x% instead of 98.x%). When this happens it is always the python integration test runs not being considered when it comes to computing
the coverage. Some more information on this can be found here.

I'm filing this issue because it just happened on a CI run following a merge to master:
https://app.circleci.com/pipelines/github/envoyproxy/nighthawk/1640/workflows/6823a1ea-bb7d-4bce-a42c-ae6c6b38de5d/jobs/12923/steps

TLDR; we probably need to wait for clang-10 to get used in the docker image we use to test, and we source that image from Envoy.
If that doesn't resolve it, more looking into this is warranted.

@oschaaf
Copy link
Member Author

oschaaf commented Jun 15, 2020

Cross pasting the tail of the CI output in case it gets wiped because of storage policies 30 days ahead:

Overall coverage rate:
  lines......: 90.7% (3134 of 3455 lines)
  functions..: 86.9% (344 of 396 functions)
+ COVERAGE_VALUE=90.7%
+ COVERAGE_VALUE=90.7
+ [[ -z '' ]]
+ '[' true == true ']'
+ COVERAGE_THRESHOLD=98.2
++ echo '90.7<98.2'
++ bc
+ COVERAGE_FAILED=1
+ test 1 -eq 1
+ echo Code coverage 90.7 is lower than limit of 98.2
Code coverage 90.7 is lower than limit of 98.2
+ exit 1

oschaaf added a commit to oschaaf/nighthawk that referenced this issue Jun 19, 2020
Some mundande changes related to header construction.
Switches us to clang-10, which hopefully addresses envoyproxy#362

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
oschaaf added a commit to oschaaf/nighthawk that referenced this issue Jun 19, 2020
Has some mundane changes related to header construction.
Switches us to clang-10, which hopefully addresses envoyproxy#362

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
mum4k pushed a commit that referenced this issue Jun 22, 2020
Notable changes in this Update:

- Changes related to header construction
- CI moves from clang-9 to clang-10
- Changes related to satisfy the new clang-tidy version:
  - Fix superfluous initializers in the rate limiter code.
  - Suppress modernize-avoid-bind to get sequencer_test.cc to pass checks.
  (clang-tidy's auto generated suggested fix fails to build)
- Move to use typed configs in `AddFilter()` in the server filter tests. 
  This avoids asserts being triggered by the integration test framework
  as deprecated fields for which support was dropped get generated
  otherwise.
- Drop the coverage threshold by 0.1%

Hopefully addresses #362 by producing more reliable and accurate
coverage reports.

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@oschaaf
Copy link
Member Author

oschaaf commented Jun 22, 2020

Bad news, clang-10 didn't fix this, as #367 just hit this again:
https://app.circleci.com/pipelines/github/envoyproxy/nighthawk/1714/workflows/12b06f69-d7a3-443d-9899-1b65de5859fb/jobs/13535/steps

The python integration tests failed to yield coverage.dat

 /build/tmp/_bazel_bazel/f85b6fb5740e6e8c7efea142eec4b6e8/execroot/nighthawk/bazel-out/k8-fastbuild/testlogs/test/output_transform_main_test/coverage.dat
//test:platform_util_test                                                PASSED in 19.7s
  /build/tmp/_bazel_bazel/f85b6fb5740e6e8c7efea142eec4b6e8/execroot/nighthawk/bazel-out/k8-fastbuild/testlogs/test/platform_util_test/coverage.dat
//test:process_test                                                      PASSED in 26.6s
  /build/tmp/_bazel_bazel/f85b6fb5740e6e8c7efea142eec4b6e8/execroot/nighthawk/bazel-out/k8-fastbuild/testlogs/test/process_test/coverage.dat
//test:python_test                                                       PASSED in 497.7s
//test:rate_limiter_test                                                 PASSED in 20.5s
  /build/tmp/_bazel_bazel/f85b6fb5740e6e8c7efea142eec4b6e8/execroot/nighthawk/bazel-out/k8-fastbuild/testlogs/test/rate_limiter_test/coverage.dat

This then drops the CI job way below the coverage threshold:

Code coverage 91.1 is lower than limit of 98.4

@oschaaf
Copy link
Member Author

oschaaf commented Aug 24, 2020

For posterity, as we're about to exclude the python integration tests for coverage to resolve this,
two things that I think are causing the flakes we observe here:

  • Since switching to the methodology that relies on bazel coverage ..., the integration tests
    occasionally fail to produce any coverage -- even while passing, and with that fail to pass
    the coverage threshold we set. The mechanics of this are unknown, this used to work fine,
    the coverage files will be completely missing / not generated.
  • The integration tests themselves are not deterministic, as they cannot use simulated time and
    need reasonable responsiveness of the host system to pass. This isn't always the case,
    as with bazel coverage building and testing may occur in parallel, which is hard to split up
    and the coverage generation process may incur lots of I/O. These things combined cause
    thresholds that otherwise will be reasonably met (as they have a very low bar to pass) to
    fail.

mum4k pushed a commit that referenced this issue Aug 28, 2020
Following an offline discussion about coverage flakes in CI, and how to best
improve there we decided to exclude the integration tests altogether from strict coverage
treshold expectations. As such, we lower the unit testing coverage threshold to 92.1%
here. We still run the integration tests in a new CI task, with the intent to make that not
required to pass for merging, so informational only.

Fixes #362 

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant