code coverage reports are flappy due to non-deterministic tests #1139

moul · 2023-09-16T22:45:59Z

Partly due to the fact that our testing is currently non-deterministic, there are some instances where some code may or may not be executed "randomly". This can be seen by checking out the codecov report for any trivial PR which doesn't touch code (such as action updates):

PR 1194 "apparently" reduces coverage in three tm2 protocol-related files.
while PR 1193 apparently increases it, instead.

(Note: these are not permalinks, so they may have changed if new commits have been added)

We should try to find the cases where code is executed non-deterministically, and either (1) remove the source of nondeterminism in our test or, where appropriate, (2) unit-test the function so that it is always tested regardless of non-determinism.

Original issue description

Recently, we made codecov a requirement to block PRs that reduce coverage, as detailed in this discussion. This decision was implemented in PR #1120. However, I mistakenly configured codecov, an issue later rectified in PR #1137.

In the interim, @thehowl observed a peculiar behavior, highlighted in PR #1122, which bore similarities to the issues stemming from my codecov misconfiguration.

Interestingly, one of my recent PRs, despite not altering the covered tests, led codecov to report a 16% decrease in coverage. This occurred in PR #1138. This discrepancy could be due to unstable values or potentially another underlying issue. It's worth considering if we should merge a new PR to discern the actual results.

There's a significant possibility that the inconsistency is attributed to an incorrect configuration. Our usage of codecov is a bit more intricate since we operate within a monorepo. This setup contains multiple independent unit tests that progressively send their results to codecov. Occasionally, certain PRs will only upload data for a specific subpackage, excluding the entire project.

The core issue might be that while our uploads are accurate, codecov anticipates full coverage results, even for untested packages. If this hypothesis is correct, we could potentially resolve it by adjusting our configuration settings. Alternatively, we might consider a workaround: storing the latest successful coverage from the master branch as an artifact. This stored data could then be forwarded in instances where a test suite is skipped and cannot generate real-time data.

cc @thehowl @ajnavarro @zivkovicmilos

Proposed Fix for #1139. Given that we're only executing specific GitHub actions based on file changes, shared codecov flags might fluctuate between PRs due to variable testing targets. While this PR retains the recommended flag structure for monorepos, it aims to eliminate potential overlaps. However, overlap could still occur with Go versions like 1.19, 1.20, and so on. Personally, I suggest we retain this overlap for now to gauge its potential inconsistencies. If issues arise, appending the Go version as a suffix to the flag could be considered. Admittedly, I'm charting unfamiliar waters with this solution. Insights from those experienced with Codecov would be greatly appreciated. Signed-off-by: Manfred Touron <94029+moul@users.noreply.github.com>

moul · 2023-09-17T15:28:56Z

It appears my attempt at #1140 hasn't gone as planned. If anyone has experience with configuring codecov for monorepos and conditional GitHub actions, your insights would be greatly appreciated.

Split coverages by component. Related to #1139 - [x] Added new tests, or not needed, or not feasible - [x] Provided an example (e.g. screenshot) to aid review or the PR is self-explanatory - [x] Updated the official documentation or not needed - [x] No breaking changes were made, or a `BREAKING CHANGE: xxx` message was included in the description - [x] Added references to related issues and PRs - [x] Provided any useful hints for running manual tests - [x] Added new benchmarks to [generated graphs](https://gnoland.github.io/benchmarks), if any. More info [here](https://github.com/gnolang/gno/blob/master/.benchmarks/README.md). --------- Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>

Fixing this: ![CleanShot 2023-09-18 at 16 29 59](https://github.com/gnolang/gno/assets/94029/f0139de3-f1d6-412d-9ba9-4f897bf4d2f3) Related with #1139 Signed-off-by: Manfred Touron <94029+moul@users.noreply.github.com>

thehowl · 2023-10-05T01:27:50Z

Codecov seems to be mostly stable now, so this issue is fixed at least in its most "severe form".

I'd still like to keep this open, I've modified the OP for what I think are the current issues, as we're still experiencing some small form of non-determinism in codecov reports.

moul · 2023-10-09T18:59:58Z

Related with #1215.

moul added help wanted 🐞 bug Something isn't working labels Sep 16, 2023

This was referenced Sep 16, 2023

chore: disable codecov flags that can overlap #1140

Merged

feat: introduce personal realms, add r/manfred/home #1138

Merged

chore: use -coverpkg for extended code coverage reports #1122

Closed

moul added the 🗺️good first issue🗺️ Ideal for newcomer contributors label Sep 17, 2023

ajnavarro mentioned this issue Sep 18, 2023

chore: Fix codecov #1144

Merged

7 tasks

This was referenced Sep 18, 2023

Incorporating gno test into gnovm's Coverage Metrics #1145

Open

chore: add empty test files for consistent coverage #1147

Merged

chore: fix codecov config (h/t @the-howl) #1137

Merged

moul added this to the 🚀 main.gno.land (required) milestone Sep 18, 2023

thehowl changed the title ~~Codecov is flappy~~ code coverage reports are flappy due to non-deterministic tests Oct 5, 2023

moul mentioned this issue Oct 9, 2023

Provide clarification on the testing strategy and guidelines. #1215

Open

Kouteki added help wanted Want to contribute? We recommend these issues. and removed help wanted labels Oct 2, 2024

wyhaines added the hacktoberfest This might be a good issue for a hacktoberfest participant to handle. label Oct 9, 2024

Kouteki removed this from the 🚀 Mainnet launch milestone Oct 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

code coverage reports are flappy due to non-deterministic tests #1139

code coverage reports are flappy due to non-deterministic tests #1139

moul commented Sep 16, 2023 •

edited by thehowl

Loading

moul commented Sep 17, 2023

thehowl commented Oct 5, 2023

moul commented Oct 9, 2023

code coverage reports are flappy due to non-deterministic tests #1139

code coverage reports are flappy due to non-deterministic tests #1139

Comments

moul commented Sep 16, 2023 • edited by thehowl Loading

moul commented Sep 17, 2023

thehowl commented Oct 5, 2023

moul commented Oct 9, 2023

moul commented Sep 16, 2023 •

edited by thehowl

Loading