
Coverage collection at the end from .acv files is taking significant time #175

Closed
tilak-nanavati-opshub opened this issue Nov 25, 2022 · 28 comments


@tilak-nanavati-opshub

Hi Steve,

AltCover is working great, capturing the coverage of all the instrumented DLLs. Recently, however, we have observed that the .acv files are growing to around 300 MB each (and there are multiple such files), and collecting the coverage from them takes a significant amount of time.

Observation: a single file of ~300 MB takes more than 48 minutes to process.

Is there any way we can reduce this time?

It would be a great help.

Thanks!

@SteveGilham (Owner)

Exactly what is written to the .acv visits file(s) will depend on how you are driving AltCover. In most use cases, each visit is written eagerly, with "classic" mode and the dotnet test integration being exceptions.

The output to file is buffered and compressed to reduce the impact of the file I/O bottleneck on the system being tested, with the size on disk per visit record depending on the compressibility of the visit stream. Multiple files are a result of concurrency-handling measures; for example, separate AppContexts (or AppDomains) each have their own output stream, again to reduce the impact on the system being recorded. On collection, the visit records are combined into a Dictionary of Dictionaries, giving slightly worse than linear behaviour in the number of records; from then on the process depends on the number of points visited rather than the number of individual visits.

For a simple visit process (e.g. one pass through most code), the compression makes each {module,line,count} visit record around 12-13 bytes; but in instances where a tight loop produces a great deal of repetition in the output stream, compression ratios down to tens of visits per output byte have been observed. A 300 MB file would thus represent at least ~25 million distinct visits, but potentially several billion.
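As a rough sanity check of those bounds (a back-of-envelope calculation based on the figures above, not AltCover output):

```shell
# Back-of-envelope bounds on visits in a ~300 MB .acv file.
bytes=$((300 * 1024 * 1024))                 # ~300 MB on disk
echo "lower bound: $((bytes / 13)) visits"   # ~24 million at ~13 bytes/visit
echo "upper bound: $((bytes * 10)) visits"   # ~3 billion at ~10 visits/byte
```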

If you are using the command line, the --defer option (or running in "classic" mode) will accumulate the visits in memory while the tests are running, reducing the overhead of repeat visits (turning n instances of {module,line,1} into a single {module,line,n}), but at the cost of leaving the output to the process exit handler. In plain dotnet there is no timeout limit for this handler; in the .NET Framework there is a grace period of only 2 seconds by default, and the VSTest tool leaves only 100 ms (typically insufficient). The AltCover dotnet test integration works around this by injecting a data collector object that executes before the process exits, so it implicitly does --defer.

Alternatively, the --single option can be used to skip collecting information about repeat visits; the effect on the output volume will be similar to that of --defer (turning n instances of {module,line,1} into a single {module,line,1}).
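A minimal command-line sketch of these two options together (illustrative only - the directory names are hypothetical, and the option spellings should be checked against `altcover --help` for your version):

```shell
# Instrument, deferring output to process exit and recording each point once
altcover --inputDirectory bin/Debug/net8.0 --outputDirectory __Instrumented \
         --defer --single
# ...run the test suite against the instrumented binaries...
# then collect the accumulated visits into the report
altcover Runner --recorderDirectory __Instrumented --collect
```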

Another approach would be to identify any hot spots, and arrange tests of the hot spots to be separate from the tests of the rest of the system (run with the hot spot types/methods involved being excluded from coverage), combining them all in post-processing.
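One possible post-processing step for that partitioned approach, using the third-party ReportGenerator tool to merge per-partition reports (file names here are hypothetical; this is not something AltCover itself provides):

```shell
# Merge Cobertura-format reports from two test partitions into one output
reportgenerator "-reports:partition1/coverage.xml;partition2/coverage.xml" \
                "-targetdir:merged" "-reporttypes:Cobertura"
```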

Related issues - #57 and #111

@SteveGilham (Owner)

Finally got around to doing some performance testing, and with a program dominated by a tight loop -

let n = 1000
for i in 1..n do
  for j in 1..n do
    use s = new System.IO.MemoryStream(1024)
    use b = new System.IO.BinaryWriter(s)
    for k in 1..n do
      System.DateTime.UtcNow.Ticks
      |> b.Write

generated a > 300MB .acv file, which I processed on old, low-powered hardware and got

... C:\Users\steve\Documents\GitHub\altcover\coverage.xml.0.acv (328,069,077b)
36,435,929 visits recorded in 00:01:12.7423790 (500,890 visits/sec)
A total of 36,435,929 visits recorded
Coverage statistics flushing took 0.17 seconds

so it doesn't in fact look like the bottleneck is in the unpacking of the coverage data - indeed with a smaller test set, the throughput was lower

... C:\Users\steve\Documents\GitHub\altcover\coverage.xml.0.acv (36,796,001b)
4,070,503 visits recorded in 00:00:08.6479199 (470,692 visits/sec)

as the fraction of time spent unpacking dropped.

Bearing all that in mind, the likelihood is that the issue reported here is a result of the size of the code being covered - and the consequent scaling of the in-memory representation of the report as it is being filled in, of necessity a random-access structure. So, rather than just the size of the .acv files, of more interest are measures like the size of the raw coverage report, and how many assemblies are involved across how many kLoC.

I've not yet done any comparisons at scale between the JSON representation and the XML one, so I cannot advise whether that might be another thing to try; but in any case the points made earlier - reducing the number of data points collected, and partitioning the testing across units before aggregating - will still speed the process up.

@tilak-nanavati-opshub (Author)

tilak-nanavati-opshub commented Nov 15, 2023

Hi Steve,

Thank you for sharing the detailed insights about the time performance metrics with a quick test-set.

We appreciate the observations provided in the comment.

We also believe that a huge code repository can make a difference, i.e. introduce significant latency while collecting the code coverage. We might need to look into how we can minimize the data points collected (this needs further investigation).

Maybe we can start partitioning the testing modules, run them in parallel, and combine the coverage collected at the end; that seems a good way forward.

We will post an update on how it goes!

Thanks again!

@tilak-nanavati-opshub (Author)

Just to confirm: there's no AltCover-specific setting to speed up the coverage collection process, right? That is our understanding from the documentation.

@bheemvennapureddy

Quite a bit of a difference between the OpenCover reports from coverlet and AltCover coverage reports, with the latter taking a huge amount of time.

@bheemvennapureddy

[Two screenshots, 2024-04-05: coverage run timings]

OpenCover takes 2 m 27 s; with AltCover it takes 14 minutes.

@bheemvennapureddy

@SteveGilham Any insights on what could be done here?

@bheemvennapureddy

I see that the instrumentation takes quite a long time compared to OpenCover with coverlet.

@SteveGilham (Owner)

Up front, I don't think I can offer any quick wins with this, except that if you're getting many (hundreds or more) visits per location, there is some speed-up to be had from using the /p:AltCoverSingle="true" option.
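For the dotnet test integration, that option is applied like this (the project name is hypothetical):

```shell
# Record each coverage point at most once, reducing the visit stream volume
dotnet test MyTests.csproj /p:AltCover=true /p:AltCoverSingle=true
```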

At the moment I don't have any clear indications of hot-spots; and any ground-up rewrite would be at best speculative.

Can you share some metrics for the size of the work you're doing - how many kLoC, how many total visits, that sort of thing?

@SteveGilham

This comment was marked as outdated.

@bheemvennapureddy

@tilak-nanavati-opshub removed

@SteveGilham (Owner)

As well as the discussion here (and in #225) regarding the use of --single and --defer, it is worth noting in the comparisons with coverlet that that tool takes a very corner-cutting approach to the OpenCover coverage format, both at instrumentation time and at report generation.

In particular, this includes

  • using a random GUID rather than actually computing file hashes for module identification at instrumentation time
  • not filling in the summary records and similar derived quantities at report generation time (requiring random-access writes to the report)

@bheemvennapureddy

@SteveGilham we have some big repos where the instrumentation takes quite a long time. Would you like more logs or examples to give you more insight?

@SteveGilham (Owner)

AltCover deliberately doesn't dump much logging information (so it can't leak any end-user secrets), so there's not a lot that can be done directly.

However, being prompted by your request, and having profiled part of the code for a newly introduced problem, I have now profiled AltCover while self-instrumenting, to find that almost 60% of the time is spent dealing with branch-points - an unexpected result, but one which gives me a definite place to start working.

@SteveGilham (Owner)

Release 8.8.173 removes the hot-spot that was taking most of the instrumentation time.

@SteveGilham (Owner)

Related to the main issue - having profiled the AltCover self-tests, there are no simple hot-spots like there were in the branch cover instrumentation, however

  • default collection - eager reporting of every visit : deserialisation of the coverage data is ~85% of the processing time
  • Single visit or deferred reporting : deserialisation is in each case ~3% of the (now much reduced) processing time

While there are use cases for both eager reporting (e.g. in cases where end-of-process handling is insufficient, such as using VSTest.exe without the AltCover data collector, and possibly in some cases of multiple AppDomain processing) and for recording every visit (though profiling is better for determining hot-spots), those are probably less common.

I intend to treat the net9.0 release as an occasion to make a major 9.0 release of my own, which will make --defer and --single the default behaviours, with the equivalent breaking API changes, as well as any other housekeeping that comes to mind.

@bheemvennapureddy

bheemvennapureddy commented Jul 31, 2024

@SteveGilham here are some logs - why does coverage statistics flushing take that long?

 71,157 visits recorded in 00:00:00.5910351 (120,394 visits/sec)
  A total of 71,157 visits recorded
  Coverage statistics flushing took 31.67 seconds
  Visited Classes 2404 of 16548 (14.53)
  Visited Methods 10983 of 109718 (10.01)
  Visited Points 62551 of 598681 (10.45)
  Visited Branches 9290 of 249334 (3.73)
  Maximum CRAP score 47306

42,384 visits recorded in 00:00:00.3965540 (106,881 visits/sec)
  A total of 42,384 visits recorded
  Coverage statistics flushing took 44.48 seconds
  Visited Classes 1650 of 16293 (10.13)
  Visited Methods 6588 of 108934 (6.05)
  Visited Points 37442 of 591783 (6.33)
  Visited Branches 5231 of 248488 (2.11)
  Maximum CRAP score 56882

@SteveGilham (Owner)

Quick answer - it spends a lot of time deserialising the coverage data.

Long answer - it depends on how you configure coverage collection. The default configuration opts for robustness of data collection over speed. In particular, that means eagerly writing every visit individually, rather than waiting for a ProcessExit handler to write them - VsTest.exe having a distressing habit of overriding the indefinite time allowed for dotnet processes and assuming any slow termination to be a hung test and killing it after a few milliseconds.

If you know that the ProcessExit handler will be allowed to run to completion, you can use the --defer option to bundle the output together at the very end - for example, from my most recent GitHub build, a somewhat larger dataset takes far less time:

[Screenshot, 2024-07-31: coverage collection timings from a GitHub build]

This used the command line

Command line : '"C:\Program Files\dotnet\dotnet.EXE" test --verbosity normal D:\a\altcover\altcover\Samples\Sample24\Sample24.csproj --configuration Debug --framework net8.0 --no-build

This takes advantage of injecting a data collector into the VsTest process, which is allowed to run to completion to output the coverage data.

If --defer doesn't fit your use-case, and you do not really need every visit - just a binary "was it executed or not" - the --single option can also drastically reduce the amount of data being handled, and thus the deserialisation time.

@SteveGilham (Owner)

As a data point for the effect that --single has, a larger test set (AltCover's self-test)

[Screenshot, 2024-07-31: AltCover self-test timings with --single]

Beyond that - the total size of your system under test is about 30x the largest that I have - the AltCover self-test - and in that case, even a linear scaling of the XML processing time with number of elements would lead to expected processing times in the 10s of seconds, as observed.

The analysis of that part of the process showed that it was dominated by the needful operations - loading & saving the report, and in between, writing the updated attribute values - so a scaling by the number of records would be expected.

@SteveGilham (Owner)

On reflection - there is one useful set of logging that could be provided above and beyond the raw times of coverage processing; and that is a performance profiler run of the collection phase of the test process, or at least a flame-graph showing the portions from the AltCover.Engine assembly and below.

Amongst other things, that would answer the question about where the time is being spent.

@bheemvennapureddy

C:\hostedtoolcache\windows\dotnet\dotnet.exe test D:\a\1\s\tests\BLT.UnitTests\BLT.UnitTests.csproj --logger trx --results-directory D:\a\_temp -c Release --no-restore --no-build /p:AltCover=true /p:AltCoverSingle=true /p:AltCoverCobertura=coverage.xml --settings .runsettings 

We already use AltCoverSingle

@bheemvennapureddy

How can I use defer? I don't see an option here: https://stevegilham.com/altcover/%60dotnet-test%60-integration

@SteveGilham (Owner)

Thank you for at last providing some specifics - the actual command line - rather than talking generalities and getting generalities back.

  1. the dotnet test integration uses a data collector to automatically have the effect of --defer, which is why it's not provided as a command line option there

  2. the system instrumented is huge - ~600k sequence points, a quarter of a million branch points - this means a huge XML report, and consequently longer processing time. Simply calling Load and Save may well be dominating the whole operation.

  3. however, in the data quoted above, the actual coverage percentage is low - about 10% of the instrumented code at best - suggesting that there is a lot of dead-weight being processed. Combined with the fact that there are no filters defined in the command line, this is consistent with including all the system and other third-party libraries that your application links, but from which only a small amount of the total functionality is drawn - adding to the report processing time to no useful end.

Appropriate use of /p:AltCoverAssemblyFilter=..., and where possible - i.e. when testing on the same machine as your code was built on - of /p:AltCoverLocalSource=true, will skip this clutter and concentrate the effort on just the code you can affect, and which these tests are actually meant to exercise. That also means, when testing just one module of a larger system, excluding other modules that are not relevant to the tests being performed.
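A sketch of such a command line (the project name and the filter value are hypothetical; see the AltCover documentation for how to express multiple filter patterns):

```shell
# Exclude third-party assemblies and non-local source from instrumentation
dotnet test MyTests.csproj /p:AltCover=true /p:AltCoverSingle=true \
  /p:AltCoverLocalSource=true /p:AltCoverAssemblyFilter=xunit
```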

@bheemvennapureddy

@SteveGilham if I try to use lcov instead of the Cobertura reports, the system slows down much more.

@SteveGilham (Owner)

Duly noted, and I will add that translation to the things to profile. In general, though, use of a native format (NCover 1.x, OpenCover or, now, the coverlet-based JSON) is to be preferred over the lossy formats like Cobertura and lcov, not least because those involve a data-translation step from a native format.

@bheemvennapureddy

Makes sense. /p:AltCoverLocalSource=true and 8.8.173 improved things a lot - thanks for those updates.

@SteveGilham (Owner)

For the lcov case, profiling shows that the bottleneck is in the brute-force process of finding, for each source file, which methods it contains. I can see ways to ease that, even if it makes the code more complex and less declarative than it currently stands.

@SteveGilham (Owner)

As this issue has become hopelessly muddled, and the new 9.0 release includes the fast options by default amongst other bottleneck removals, I'll close it now.
