
Coverage collection at the end from .acv files is taking significant time #175

Closed
tilak-nanavati-opshub opened this issue Nov 25, 2022 · 28 comments


@tilak-nanavati-opshub

Hi Steve,

AltCover is working great, capturing the coverage of all the instrumented DLLs. Recently, however, we have observed that the .acv files are growing to around 300 MB each (and there are multiple such files), and collecting the coverage from them takes a significant amount of time.

Observation: a single file of ~300 MB takes more than 48 minutes to process.

Is there any way we can reduce this time?

It would be a great help.

Thanks!

@SteveGilham (Owner)

Exactly what is written to the .acv visits file(s) will depend on how you are driving AltCover. In most use cases, each visit is written eagerly, with "classic" mode and the dotnet test integration being exceptions.

The output to file is buffered and compressed to reduce the impact of the file I/O bottleneck on the system being tested, with the size on disk per visit record depending on the compressibility of the visit stream. Multiple files are a result of concurrency-handling measures; for example, separate AppContexts (or AppDomains) each have their own output stream, again to reduce the impact on the system being recorded. On collection, the visit records are combined into a Dictionary of Dictionaries, giving slightly worse than linear behaviour in the number of records; from then on the process depends on the number of points visited rather than the number of individual visits.

For a simple visit process (e.g. one pass through most code), the compression makes each {module,line,count} visit record around 12-13 bytes; but in instances where a tight loop produces a great deal of repetition in the output stream, compression ratios down to tens of visits per output byte have been observed. A 300 MB file would thus represent at least ~25 million distinct visits, but potentially several billion.
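As a rough sanity check of those bounds (a back-of-envelope calculation based on the figures above, not AltCover output):

```shell
# Back-of-envelope bounds on visits in a ~300 MB .acv file.
bytes=$((300 * 1024 * 1024))                 # ~300 MB on disk
echo "lower bound: $((bytes / 13)) visits"   # ~24 million at ~13 bytes/visit
echo "upper bound: $((bytes * 10)) visits"   # ~3 billion at ~10 visits/byte
```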

If you are using the command line, the --defer option (or running in "classic" mode) will accumulate the visits in memory while the tests are running, reducing the overhead of repeat visits (turning n instances of {module,line,1} into a single {module,line,n}), but at the cost of leaving the output to the process exit handler. In plain dotnet there is no timeout limit for this handler; in the .NET Framework there is a grace period of only 2 seconds by default, and the VSTest tool leaves only 100 ms (typically insufficient). The AltCover dotnet test integration works around this by injecting a data collector object that executes before the process exits, so it implicitly does --defer.

Alternatively, the --single option can be used to skip collecting information about repeat visits; the effect on the output volume will be similar to that of --defer (turning n instances of {module,line,1} into a single {module,line,1}).
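A minimal command-line sketch of these two options together (illustrative only - the directory names are hypothetical, and the option spellings should be checked against `altcover --help` for your version):

```shell
# Instrument, deferring output to process exit and recording each point once
altcover --inputDirectory bin/Debug/net8.0 --outputDirectory __Instrumented \
         --defer --single
# ...run the test suite against the instrumented binaries...
# then collect the accumulated visits into the report
altcover Runner --recorderDirectory __Instrumented --collect
```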

Another approach would be to identify any hot spots, and arrange tests of the hot spots to be separate from the tests of the rest of the system (run with the hot spot types/methods involved being excluded from coverage), combining them all in post-processing.
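One possible post-processing step for that partitioned approach, using the third-party ReportGenerator tool to merge per-partition reports (file names here are hypothetical; this is not something AltCover itself provides):

```shell
# Merge Cobertura-format reports from two test partitions into one output
reportgenerator "-reports:partition1/coverage.xml;partition2/coverage.xml" \
                "-targetdir:merged" "-reporttypes:Cobertura"
```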

Related issues - #57 and #111

@SteveGilham (Owner)

Finally got around to doing some performance testing, and with a program dominated by a tight loop -

let n = 1000
for i in 1..n do
  for j in 1..n do
    use s = new System.IO.MemoryStream(1024)
    use b = new System.IO.BinaryWriter(s)
    for k in 1..n do
      System.DateTime.UtcNow.Ticks
      |> b.Write

generated a > 300MB .acv file, which I processed on old, low-powered hardware and got

... C:\Users\steve\Documents\GitHub\altcover\coverage.xml.0.acv (328,069,077b)
36,435,929 visits recorded in 00:01:12.7423790 (500,890 visits/sec)
A total of 36,435,929 visits recorded
Coverage statistics flushing took 0.17 seconds

so it doesn't in fact look like the bottleneck is in the unpacking of the coverage data - indeed with a smaller test set, the throughput was lower

... C:\Users\steve\Documents\GitHub\altcover\coverage.xml.0.acv (36,796,001b)
4,070,503 visits recorded in 00:00:08.6479199 (470,692 visits/sec)

as the fraction of time spent unpacking dropped.

Bearing all that in mind, the likelihood is that the issue reported here is a result of the size of the code being covered - and the consequent scaling of the in-memory representation of the report as it is being filled in, of necessity a random-access structure. So, rather than just the size of the .acv files, of more interest are measures like the size of the raw coverage report, and how many assemblies are involved across how many kLoC.

I've not yet done any comparisons at scale between the JSON representation and the XML one, so I cannot advise whether that might be another thing to try; but in any case the points made earlier - reducing the number of data points collected, and partitioning the testing across units before aggregating - will still speed the process up.

@tilak-nanavati-opshub (Author)

tilak-nanavati-opshub commented Nov 15, 2023

Hi Steve,

Thank you for sharing the detailed insights about the time performance metrics with a quick test-set.

We appreciate the observations provided in the comment.

We also believe that a huge code repository can make a difference, i.e. introduce significant latency while collecting the code coverage. We might need to look into how we can minimize the data points collected (this needs further investigation).

Maybe we can start partitioning the testing modules, run them in parallel, and combine the coverage collected at the end; that seems a good way forward.

We will post an update on how it goes!

Thanks again!

@tilak-nanavati-opshub (Author)

Just to confirm: there's no AltCover-specific setting to speed up the coverage collection process, right? That is our understanding from the documentation.

@bheemvennapureddy

Quite a bit of a difference between the OpenCover reports from coverlet and AltCover coverage reports, with the latter taking a huge amount of time.

@bheemvennapureddy

[Two screenshots, 2024-04-05: coverage run timings]

OpenCover takes 2 m 27 s; with AltCover it takes 14 minutes.

@bheemvennapureddy

@SteveGilham Any insights on what could be done here?

@bheemvennapureddy

I see that the instrumentation takes quite a long time compared to OpenCover with coverlet.

@SteveGilham (Owner)

Up front, I don't think I can offer any quick wins with this, except that if you're getting many (hundreds or more) visits per location, there is some speed-up to be had from using the /p:AltCoverSingle="true" option.
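For the dotnet test integration, that option is applied like this (the project name is hypothetical):

```shell
# Record each coverage point at most once, reducing the visit stream volume
dotnet test MyTests.csproj /p:AltCover=true /p:AltCoverSingle=true
```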

At the moment I don't have any clear indications of hot-spots; and any ground-up rewrite would be at best speculative.

Can you share some metrics for the size of the work you're doing - how many kLoC, how many total visits, that sort of thing?

@SteveGilham

This comment was marked as outdated.

@bheemvennapureddy

@tilak-nanavati-opshub removed

@SteveGilham (Owner)

As well as the discussion here (and in #225) regarding the use of --single and --defer, it is worth noting in the comparisons with coverlet that that tool takes a very corner-cutting approach to the OpenCover coverage format, both at instrumentation time and at report generation.

In particular, this includes

  • using a random GUID rather than actually computing file hashes for module identification at instrumentation time
  • not filling in the summary records and similar derived quantities at report generation time (requiring random-access writes to the report)

@bheemvennapureddy

@SteveGilham we have some big repos where the instrumentation takes quite a long time. Would you like more logs or examples to give you more insight?

@SteveGilham (Owner)

AltCover deliberately doesn't dump much logging information (so it can't leak any end-user secrets), so there's not a lot that can be done directly.

However, being prompted by your request, and having profiled part of the code for a newly introduced problem, I have now profiled AltCover while self-instrumenting, to find that almost 60% of the time is spent dealing with branch-points - an unexpected result, but one which gives me a definite place to start working.

@SteveGilham (Owner)

Release 8.8.173 removes the hot-spot that was taking most of the instrumentation time.

@SteveGilham (Owner)

Related to the main issue - having profiled the AltCover self-tests, there are no simple hot-spots like there were in the branch cover instrumentation, however

  • default collection - eager reporting of every visit : deserialisation of the coverage data is ~85% of the processing time
  • Single visit or deferred reporting : deserialisation is in each case ~3% of the (now much reduced) processing time

While there are use cases for both eager reporting (e.g. in cases where end-of-process handling is insufficient, such as using VSTest.exe without the AltCover data collector, and possibly in some cases of multiple AppDomain processing) and for recording every visit (though profiling is better for determining hot-spots), those are probably less common.

I intend to treat the net9.0 release as an occasion to make a major 9.0 release of my own, which will make --defer and --single the default behaviours, with the equivalent breaking API changes, as well as any other housekeeping that comes to mind.

@bheemvennapureddy

bheemvennapureddy commented Jul 31, 2024

@SteveGilham here are some logs - why does coverage statistics flushing take that long?

 71,157 visits recorded in 00:00:00.5910351 (120,394 visits/sec)
  A total of 71,157 visits recorded
  Coverage statistics flushing took 31.67 seconds
  Visited Classes 2404 of 16548 (14.53)
  Visited Methods 10983 of 109718 (10.01)
  Visited Points 62551 of 598681 (10.45)
  Visited Branches 9290 of 249334 (3.73)
  Maximum CRAP score 47306

42,384 visits recorded in 00:00:00.3965540 (106,881 visits/sec)
  A total of 42,384 visits recorded
  Coverage statistics flushing took 44.48 seconds
  Visited Classes 1650 of 16293 (10.13)
  Visited Methods 6588 of 108934 (6.05)
  Visited Points 37442 of 591783 (6.33)
  Visited Branches 5231 of 248488 (2.11)
  Maximum CRAP score 56882

@SteveGilham (Owner)

Quick answer - it spends a lot of time deserialising the coverage data.

Long answer - it depends on how you configure coverage collection. The default configuration opts for robustness of data collection over speed. In particular, that means eagerly writing every visit individually, rather than waiting for a ProcessExit handler to write them - VsTest.exe having a distressing habit of overriding the indefinite time allowed for dotnet processes and assuming any slow termination to be a hung test and killing it after a few milliseconds.

If you know that the ProcessExit handler will be allowed to run to completion, you can use the --defer option to bundle the output together at the very end - for example, from my most recent GitHub build, a somewhat larger dataset takes far less time:

[Screenshot, 2024-07-31: coverage collection timings from a GitHub build]

This used the command line

Command line : '"C:\Program Files\dotnet\dotnet.EXE" test --verbosity normal D:\a\altcover\altcover\Samples\Sample24\Sample24.csproj --configuration Debug --framework net8.0 --no-build

This takes advantage of injecting a data collector into the VsTest process, which is allowed to run to completion to output the coverage data.

If --defer doesn't fit your use-case, and you do not really need every visit - just a binary "was it executed or not" - the --single option can also drastically reduce the amount of data being handled, and thus the deserialisation time.

@SteveGilham (Owner)

As a data point for the effect that --single has, a larger test set (AltCover's self-test)

[Screenshot, 2024-07-31: AltCover self-test timings with --single]

Beyond that - the total size of your system under test is about 30x the largest that I have - the AltCover self-test - and in that case, even a linear scaling of the XML processing time with number of elements would lead to expected processing times in the 10s of seconds, as observed.

The analysis of that part of the process showed that it was dominated by the needful operations - loading & saving the report, and in between, writing the updated attribute values - so a scaling by the number of records would be expected.

@SteveGilham (Owner)

On reflection - there is one useful set of logging that could be provided above and beyond the raw times of coverage processing; and that is a performance profiler run of the collection phase of the test process, or at least a flame-graph showing the portions from the AltCover.Engine assembly and below.

Amongst other things, that would answer the question about where the time is being spent.

@bheemvennapureddy

C:\hostedtoolcache\windows\dotnet\dotnet.exe test D:\a\1\s\tests\BLT.UnitTests\BLT.UnitTests.csproj --logger trx --results-directory D:\a\_temp -c Release --no-restore --no-build /p:AltCover=true /p:AltCoverSingle=true /p:AltCoverCobertura=coverage.xml --settings .runsettings 

We already use AltCoverSingle

@bheemvennapureddy

How can I use defer? I don't see an option here: https://stevegilham.com/altcover/%60dotnet-test%60-integration

@SteveGilham (Owner)

Thank you for at last providing some specifics - the actual command line - rather than talking generalities and getting generalities back.

  1. the dotnet test integration uses a data collector to automatically have the effect of --defer, which is why it's not provided as a command line option there

  2. the system instrumented is huge - ~600k sequence points, a quarter of a million branch points - this means a huge XML report, and consequently longer processing time. Simply calling Load and Save may well be dominating the whole operation.

  3. however, in the data quoted above, the actual coverage percentage is low - about 10% of the instrumented code at best - suggesting that there is a lot of dead-weight being processed. Combined with the fact that there are no filters defined in the command line, this is consistent with including all the system and other third-party libraries that your application links, but from which only a small amount of the total functionality is drawn - adding to the report processing time to no useful end.

Appropriate use of /p:AltCoverAssemblyFilter=..., and where possible - i.e. when testing on the same machine as your code was built on - of /p:AltCoverLocalSource=true, will skip this clutter and concentrate the effort on just the code you can affect, and which these tests are actually meant to exercise. That also means, when testing just one module of a larger system, excluding other modules that are not relevant to the tests being performed.
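A sketch of such a command line (the project name and the filter value are hypothetical; see the AltCover documentation for how to express multiple filter patterns):

```shell
# Exclude third-party assemblies and non-local source from instrumentation
dotnet test MyTests.csproj /p:AltCover=true /p:AltCoverSingle=true \
  /p:AltCoverLocalSource=true /p:AltCoverAssemblyFilter=xunit
```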

@bheemvennapureddy

@SteveGilham if I try to use lcov instead of the Cobertura reports, the system slows down much more.

@SteveGilham (Owner)

Duly noted, and I will add that translation to the things to profile. In general, though, use of a native format (NCover 1.x, OpenCover or, now, the coverlet-based JSON) is to be preferred over the lossy formats like Cobertura and lcov, not least because those involve a data-translation step from a native format.

@bheemvennapureddy

Makes sense. /p:AltCoverLocalSource=true and 8.8.173 improved things a lot - thanks for those updates.

@SteveGilham (Owner)

For the lcov case, profiling shows that the bottleneck is in the brute-force process of finding, for each source file, which methods it contains. I can see ways to ease that, even if it makes the code more complex and less declarative than it currently stands.

@SteveGilham (Owner)

As this issue has become hopelessly muddled, and the new 9.0 release includes the fast options by default amongst other bottleneck removals, I'll close it now.
