Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: easier manual collation of result sets #681

Closed
wants to merge 9 commits into from
Closed

WIP: easier manual collation of result sets #681

wants to merge 9 commits into from

Conversation

ticky
Copy link
Contributor

@ticky ticky commented May 25, 2018

This adds a new entry point for collating multiple .resultset.json files, when generated by parallelised, multi-machine test runs (for example when used with Knapsack Pro on Circle, Travis or Buildkite).

In these environments, the current behaviour of merging with the “last” result set is defeated by the fact that runs occur on isolated containers or machines which don’t know about the “last” run.

For these cases, you can create, for instance, a Rakefile or script file, which calls SimpleCov.collate with a list of paths to .resultset.json files from each parallel test run. In our case, we tar these and upload them as artifacts.

# coverage_report.rake
namespace :coverage do
  task :report do
    require 'simplecov'

    SimpleCov.collate Dir["simplecov-resultset-*/.resultset.json"], 'rails' do
      formatter SimpleCov::Formatter::MultiFormatter.new([
        SimpleCov::Formatter::SimpleFormatter,
        SimpleCov::Formatter::HTMLFormatter
      ])
    end
  end
end

Calling SimpleCov.collate takes the same type of configuration block as SimpleCov.start but will not collect coverage stats itself, only combine the results from the input JSON files, and configure SimpleCov to format the results when the task exits in the same way as if you called SimpleCov.start.

To Do:

  • Initial pass at feature
  • Test in-place in a project
  • Add specs
  • Add documentation

@ticky
Copy link
Contributor Author

ticky commented May 25, 2018

I have tried to set up a test but I can’t wrap my head around Cucumber. The test basically just needs to set up two faked resultset json files and pass their paths into SimpleCov.collate. I think this would be easier to do in RSpec, but given the lifecycle hooks SimpleCov relies on to generate its output, I’m not sure how to properly test that, and it looks like all the tests that rely on that live in Cucumber land.

@PragTob
Copy link
Collaborator

PragTob commented May 26, 2018

👋

First thanks a lot for the contribution! That's great and often looked for, thanks 🎉
Might take me some time to get to a proper review - sorry about that :(

Regarding testing, unit tested in rspec but integration/acceptance test in cucumber would probably be the best way to go.

I'll also do @bf4 as I remember he talked about PRing something like this/having something like this handy which I don't so he's primed for a review :)

@bf4
Copy link
Collaborator

bf4 commented May 26, 2018 via email

@ticky
Copy link
Contributor Author

ticky commented May 28, 2018

Thanks for your positivity! 😄

Regarding testing, unit tested in rspec but integration/acceptance test in cucumber would probably be the best way to go.

Perhaps, but I don’t think I understand enough about Cucumber to deliver a reasonable test in it myself. I think I’ll need some help with delivering that side of things!

Basically a switch to determine if a) it should generate a result, ie measure coverage but no report or failure and/or b) it should generate a report and exit code

For what it’s worth, this PR is perhaps lightly abusing the fact that SimpleCov’s state management will generate a result on exit if the pid and running are set. I did not want to dig into further changes as much of it would’ve had to change assumptions about how SimpleCov worked in general, or duplicating the code paths SimpleCov takes exclusively on exit.

If anything I’d prefer for this PR to be considered primarily for how it’s used by the end-user rather than by that portion of the approach, as I consider it more reasonable that someone more familiar with the intimate workings of the code could deliver a cleaner integration than I, an outsider!

I briefly considered giving SimpleCov a binary entry point, but quickly realised that because the SimpleCov instance which generates the report files is the one which needs the profile, group and exclusion configuration applied, that that wasn’t any better than exposing it in a similar vein to SimpleCov.start itself.

@bf4
Copy link
Collaborator

bf4 commented May 29, 2018

I'm looking at it and gonna want to think on what problem collate is meant to solve and also how it's doing it.

Did you take a look at my impl I pasted in #653 (and some possible issues around merging files from another machine) and my stab at making a Report class? I'd be interested in your thoughts.

I think my report task after #653 would look like the below (haven't redone it 😊 )

 #!/usr/bin/env ruby

require 'json'
class CoverageReport
  attr_reader :formatters

  def initialize
    @formatters = []
  end

  def configure_to_generate_report!
    @minimum_coverage = ENV.fetch('COVERAGE_MINIMUM') { 100.0 }.to_f.round(2)
    SimpleCov.configure do
      minimum_coverage @minimum_coverage
      # minimum_coverage_by_file 60
      # maximum_coverage_drop 1
      refuse_coverage_drop
    end

    @formatters = [SimpleCov::Formatter::HTMLFormatter]
  end

  def generate_report!
    report_dir = SimpleCov.coverage_dir
    file = File.join(report_dir, '.resultset.json')
    if File.exist?(file)
      json = JSON.parse(File.read(file))
      result = SimpleCov::Result.from_hash(json)
      results = [result]
      merged_result = SimpleCov::ResultMerger.merge_results(*results)
      merged_result.format!
      STDERR.puts "[COVERAGE] merged #{file}; processing..."
      process_result(merged_result)
    else
      abort "No files found to report: #{Dir.glob(report_dir)}"
    end
  end

  def process_result(result)
    @exit_status = SimpleCov.process_result(result, SimpleCov::ExitCodes::SUCCESS)

    # Force exit with stored status (see github issue #5)
    # unless it's nil or 0 (see github issue #281)
    Kernel.exit @exit_status if @exit_status && @exit_status > 0
  end
end

if __FILE__ == $0
  require 'simplecov'
  reporter = CoverageReport.new
  reporter.configure_to_generate_report!
  reporter.generate_report!
end

I think the generate_report! is similar to collate

@ticky
Copy link
Contributor Author

ticky commented May 29, 2018

Did you take a look at my impl I pasted in #653

I had no idea that existed ;)

possible issues around merging files from another machine

I did notice that - in our case our tests and collate call are both running inside Docker images where the paths are always the same, and valid, in both steps, but the call which checks a file exists is almost certainly not useful to us when collating!

my stab at making a Report class

I’d definitely welcome a refactor which makes SimpleCov less reliant on its own internal state (right now you can call format! on a result, which runs formatting using configuration which exists within SimpleCov, which seems sort of backwards to me) but didn’t want to go changing too much internal stuff here!

Your CoverageReport class looks okay, though it doesn’t collate several result sets, only a single result set. It also doesn’t allow for customising the output configuration, given the SimpleCov.configure call is internalised. For this reasons I don’t think it really solves the same problem!

@bf4
Copy link
Collaborator

bf4 commented May 29, 2018

@ticky

Your CoverageReport class looks okay, though it doesn’t collate several result sets, only a single result set. It also doesn’t allow for customising the output configuration, given the SimpleCov.configure call is internalised. For this reasons I don’t think it really solves the same problem!
legit. It only solved my immediate problem, but was written with intention to be more generalized.

I linked to existing 'solutions' as well

I think a cli to 'collate' multiple files could be a good idea, but right now that's more work. Let's get the simple case working first.

I think an important consideration is how SimpleCov could be used

  • Started when tests run, generates results json, merges results json, generates report, and applies SimpleCov failure conditions
  • Started when tests run, generates results json
  • Collects results json, merges results json, generates report, and applies SimpleCov failure conditions
  • Uses merged results json, applies SimpleCov failure conditions

@ticky
Copy link
Contributor Author

ticky commented May 30, 2018

I linked to existing 'solutions' as well

Sorry, I don’t think I see where that was?

I think a cli to 'collate' multiple files could be a good idea, but right now that's more work. Let's get the simple case working first.

That’s what I aimed for in this PR; the simple case is working!

Regarding your use cases, I’ll number them for convenience here:

  1. Started when tests run, generates results json, merges results json, generates report, and applies SimpleCov failure conditions
  2. Started when tests run, generates results json
  3. Collects results json, merges results json, generates report, and applies SimpleCov failure conditions
  4. Uses merged results json, applies SimpleCov failure conditions

This PR currently expects you to run SimpleCov in mode 1, followed by a separate step in mode 3. Mode 2 is not possible currently (there is always a formatter). Mode 4 is also not technically possible for the same reason as mode 2, except in that you could either use the side effect that start will merge with existing results, or that you could technically call collate to “merge” and test for failure based on single result set.

end

# Use the ResultMerger to produce a single, merged result, ready to use.
@result = SimpleCov::ResultMerger.merge_results(*results)
Copy link
Collaborator

@bf4 bf4 May 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

me typing while looking at your code and what I've written.. untested..

  #   SimpleCov.start 'app'
  #   STDERR.puts '[COVERAGE] Running'
  #   configure_to_only_generate_result!
  def configure_to_only_generate_result!
   SimpleCov.configure do
      # use_merging   true
      minimum_coverage 0.0 # disable
      maximum_coverage_drop 100.0 # disable
      Instance_eval(Proc.new) if block_given?
    end
    SimpleCov.formatters = []
    SimpleCov.at_exit do
      STDERR.puts "[COVERAGE] creating #{File.join(SimpleCov.coverage_dir, '.resultset.json')}"
      SimpleCov.result.format!
    end
  end

  # generate_report!(result_filename) do
  #    minimum_coverage ENV.fetch('COVERAGE_MINIMUM') { 100.0 }.to_f.round(2)
  #    minimum_coverage_by_file 60
  #    maximum_coverage_drop 1
  #    refuse_coverage_drop
  #    SimpleCov.formatters = [SimpleCov::Formatter::HTMLFormatter]
  # end
  def generate_report!(result_filenames)
      Proc.new.call if block_given?
      merged_result = merge_results_files(result_filenames)
      if SimpleCov.formatters.any?
        merged_result.format!
      end
      process_result(merged_result)
  end

  def merge_results_files(results_files)
    results = results_files.map {|results_file|
      next unless File.exist?(results_file)
      json = JSON.parse(File.read(results_file))
      STDERR.puts "[COVERAGE] merging #{results_file}; processing..."
      SimpleCov::Result.from_hash(json)
    }.compact
    if results.any?
      SimpleCov::ResultMerger.merge_results(*results)
    else
      abort "No files found to report: #{results_files.inspect}"
    end
  end

  def process_result(result)
    @exit_status = SimpleCov.process_result(result, SimpleCov::ExitCodes::SUCCESS)

    # Force exit with stored status (see github issue #5)
    # unless it's nil or 0 (see github issue #281)
    Kernel.exit @exit_status if @exit_status && @exit_status > 0
  end

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bf4 sorry I'm not sure what this is supposed to tell me/us? 🤔 Is there something missing from this PR?

@deivid-rodriguez
Copy link
Collaborator

@ticky I love this PR. Are you planning to keep working on this?

@ticky
Copy link
Contributor Author

ticky commented Jun 28, 2018

@deivid-rodriguez I’m unclear on whether a change like this is something the maintainers are interested in, so I’ve not really looked at it since

@deivid-rodriguez
Copy link
Collaborator

I see, hopefully they see our messages and can clarify 😃

@deivid-rodriguez
Copy link
Collaborator

@PragTob So is there any interest in merging this once some specs are added?

@deivid-rodriguez
Copy link
Collaborator

@PragTob ping!

@JasonLunn
Copy link

Bump

@bf4 bf4 changed the title WIP: Allow manual collation of result sets WIP: easier manual collation of result sets Mar 15, 2019
@bf4
Copy link
Collaborator

bf4 commented Mar 15, 2019

looking over it again, I think the problem this PR is addressing is a flow like

  1. SimpleCov was started when tests run, generates results json in various places. (no formatters used, no failure conditions set)
  2. Given the collection of results json, merges results json into one merged 'result'
  3. Use merged result to generates report
  4. Apply applies SimpleCov failure conditions

the code in the PR I think runs steps 2-4 together.

I'm not sure why the back and forth petered out.

@ticky
Copy link
Contributor Author

ticky commented Mar 15, 2019

looking over it again, I think the problem this PR is addressing is a flow like

  1. SimpleCov was started when tests run, generates results json in various places. (no formatters used, no failure conditions set)
  2. Given the collection of results json, merges results json into one merged 'result'
  3. Use merged result to generates report
  4. Apply applies SimpleCov failure conditions

the code in the PR I think runs steps 2-4 together.

That’s roughly correct, though “no formatters used” is as I understand it impossible, so SimpleFormatter is being used for those jobs.

I'm not sure why the back and forth petered out.

I didn’t get a clear response as to whether this change would be considered for merging, so aside from implementing the changes based on direct feedback, this is as far as I got with it.


For what it’s worth, this branch’s implementation remains what we’re using to do coverage on our parallel (37 individual rspec runner instances at last count) test runs at Buildkite to this day. Simplecov is configured in spec_helper.rb, used in each of those instances, like this:

if ENV['CI'] || ENV['COVERAGE']
  require 'simplecov'
  require 'simplecov-console'

  SimpleCov.start 'rails' do
    # Disambiguates individual Buildkite Jobs' test runs
    command_name "Job #{ENV["BUILDKITE_JOB_ID"]}" if ENV["BUILDKITE_JOB_ID"]

    if ENV['CI']
      # Makes the output from this test run just a bit of text
      formatter SimpleCov::Formatter::SimpleFormatter
    else
      formatter SimpleCov::Formatter::MultiFormatter.new([
        SimpleCov::Formatter::Console,
        SimpleCov::Formatter::HTMLFormatter
      ])
    end
  end
end

Each runner then uploads the .resultset.json, after tarring it:

tar -czvf "tmp/simplecov-resultset-$BUILDKITE_JOB_ID.tar.gz" -C coverage .resultset.json

Once all the parallel jobs are complete, a coverage collation job runs, which fetches and extracts all the tarballed result sets. It then runs the coverage:report rake task, which is defined like this:

# frozen_string_literal: true

namespace :coverage do
  task :report do
    require "simplecov"
    require "simplecov-console"

    SimpleCov.collate Dir["tmp/coverage/simplecov-resultset-*/.resultset.json"], "rails" do
      formatter SimpleCov::Formatter::MultiFormatter.new([
        SimpleCov::Formatter::Console,
        SimpleCov::Formatter::HTMLFormatter,
      ])
    end
  end
end

@JasonLunn
Copy link

@ticky - it's great to know that this PR has been getting regular use use elsewhere for months. Are you looking for contributions on outstanding TODO to write the specs?

@ticky
Copy link
Contributor Author

ticky commented Mar 15, 2019

Are you looking for contributions on outstanding TODO to write the specs?

Absolutely! I haven’t looked at speccing it since my comment in May 2018 but I wasn’t quite able to understand how to write a Cucumber spec suite for this.

@ticky
Copy link
Contributor Author

ticky commented Mar 15, 2019

Now I look at my old working tree, I actually have most of what should be a working Cucumber implementation. I’ll push what I had up here so someone can help with it!

@JasonLunn
Copy link

Your test works for me locally - I'm not sure what is different that it sometimes fails in Travis...

@ticky
Copy link
Contributor Author

ticky commented Mar 15, 2019

I think due to the way it’s shuffling files around that it is sensitive to the test suite having been run before it. I couldn’t figure out how to distill down the steps the normal tests take, nor could I figure out how to get error output out of the executed command within Cucumber.

@JasonLunn
Copy link

I've got it repeatedly failing locally now.
I think the issue is that the part1 task is setting up the libs differently than the test task does, and thus it later fails trying to load test_helper

@JasonLunn
Copy link

JasonLunn commented Mar 15, 2019

To reproduce:

  1. I added a tag to test_unit_collate.feature and then ran bundle exec cucumber -t @my_tag
  2. I changed When I successfully run `bundle exec rake part1` to When I run `bundle exec rake part1` interactively

Then I was able to see the text of the error:

File does not exist: test_helper

rake aborted!

@PragTob
Copy link
Collaborator

PragTob commented Jun 25, 2019

Hi @ticky @deivid-rodriguez @JasonLunn @bf4 and everyone else :)

This is definitely something we're interested in having. I took a lengthy break from maintaining simplecov due to life and such. Sorry for that and the huge delay. I'm trying to get back into it and cleaning up the backlog.

I also left this somewhat for @bf4 as he knows this stuff more than I do :) Getting the cucumber test to run would be magnificent as it'd give me & everyone else more confidence that we won't accidentally break this going forward. I know you already poured tons of work into this so I understand if you won't get to it.

Anyhow, thanks a lot I'll try to take my too warm brain to a review now :)

IMG_20180622_182841-ANIMATION

Copy link
Collaborator

@PragTob PragTob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great from my view, thanks for taking all this time and care to do this 💚

Admittedly I don't have any experience with merging results from different machines' simplecov runs, so I might be wrong 😅

Getting the cuke and maybe some specs up would be great though to assure we won't accidentally break this in the future.

Thanks heaps! 🎉

Aside: Which Ruby version are you running? I'm asking because there's #700 and if you wanted a release where this still works with old ruby versions I'd try to merge this before removing support and doing a release with this feature before it.

IMG_20190127_140756

#### Timeout for merge

Of course, your cached coverage data is likely to become invalid at some point. Thus, when automatically merging
sequential test runs, result sets that are older than `SimpleCov.merge_timeout` will not be used any more. By default,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

above we call it subsequent, here sequential. I know it's the same but I think it'd help understanding if we called it the same :)


You can deactivate this automatic merging altogether with `SimpleCov.use_merging false`.

### Between parallel test runs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this reminds me more of parallel_tests aka to run it in parallel locally. It'd be good to make it clear in the headline already that it's for collation of results from multiple machines/VMs whatever


`SimpleCov.collate` also takes a block for configuration, just the same as `SimpleCov.start` or `SimpleCov.configure`.
This means you can configure a separate formatter for the collated output. For instance, you can make the formatter in `SimpleCov.start`
the `SimpleCov::Formatter::SimpleFormatter`, and only use more complex formatters in the finall `SimpleCov.collate` run.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final :)


SimpleCov.start 'rails' do
# Disambiguates individual test runs
command_name "Job #{ENV["TEST_ENV_NUMBER"]}" if ENV["TEST_ENV_NUMBER"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be nice to explicitly state in a comment in this file or in the description above that this is the config of an individual runner (it was at least something I at first wondered about when reading this)

```

```ruby
# lib/tasks/coverage_report.rake
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then state once again specifically that this would be the task to collate them al 🎉

end
end
end
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you btw. for providing such extensive and well written documentation right from the start 💚 Really makes it not only easier for all the people using this but also for me to review 😁

I know I commented a lot above but it's all very minor things, this is amazing.

breathtaking

| lib/faked_project/framework_specific.rb | 75.0 % |
| lib/faked_project/meta_magic.rb | 100.0 % |
| test/meta_magic_test.rb | 100.0 % |
| test/some_class_test.rb | 100.0 % |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as said before, fixing the cuke would be super great and even maybe some unit specs.

end

# Use the ResultMerger to produce a single, merged result, ready to use.
@result = SimpleCov::ResultMerger.merge_results(*results)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bf4 sorry I'm not sure what this is supposed to tell me/us? 🤔 Is there something missing from this PR?

@JasonLunn
Copy link

Glad to see this getting some attention again. One observation from having used this in our codebase for many months is that we ran into trouble with the results being merged properly across machines. Specifically, we couldn't guarantee that the absolute file paths to the results were identical because the checkout paths were dynamically managed by our CI system. We patched around it like so in .simplecov:

    # In order to aggregate results across CI agents make all file paths relative when written to disk
    module ResultMergerExtension
        # Override the default implementation to make all paths project relative for portability
        def store_result result
            path = SimpleCov.root.to_s + "/"
            command_name, data = result.to_hash.first
            original_coverage = data[ 'coverage' ]
            new_coverage = {}
            original_coverage.each_key do | key |
                new_coverage[ key.gsub( path, '' ) ] = original_coverage[ key ]
            end
            data[ 'coverage' ] = new_coverage
            super command_name => data
        end
    end

    module SimpleCov
        module ResultMerger
            class << self
                prepend ResultMergerExtension
            end
        end
    end

This approach introduced other issues because not everything was really happy with relative file paths, which lead us to do things like this:

    SimpleCov.start 'rails' do
        # The BlockFilter in root_filter doesn't like relative paths, so we drop it. Manually filter gems, etc.
        filters.reject! { | filter | filter.is_a? SimpleCov::BlockFilter }
        add_filter [ '/gems/', '/stdlib/' ]
   end

@PragTob
Copy link
Collaborator

PragTob commented Jun 26, 2019

@JasonLunn thanks for the input! Just to make sure I understand, with absolute paths you mean the paths were the covered files are stored on disk so like /home/jenkins/workspace-2/my_app/app/models/person.rb correct? That makes sense and seems like we maybe should do something about that/making things work with relative paths or some other main line workaround. I'd definitely keep it out of this PR and handle it as a separate feature PR. Also thanks for sharing your experience running this it's very valuable 👌

@JasonLunn
Copy link

@PragTob - All I would do with this PR would be to mention in the README that for now there is a limitation and results computed from different checkout paths will not merge correctly even if they have the same paths relative to SimpleCov.root.

@sj26
Copy link

sj26 commented Aug 25, 2019

Can I help get this over the line? Sounds like it just needs some test love?

@ticky
Copy link
Contributor Author

ticky commented Aug 26, 2019

@sj26 yeah, I don't understand enough about Cucumber syntax to get that side of things over the line, so I'd love a hand with it!

@PragTob
Copy link
Collaborator

PragTob commented Dec 3, 2019

@ticky Hi everyone! Time competing with OSS as usual... thanks to ruby together I have quite some time to dedicate to simplecov during the next 3 months. Helping get this merged with cucumber etc. is naturally on the list (although not to the very top).

I've also never handled this concrete problem myself, so you can call me a bit of a noob but I realize it's something lots of people in the ruby eco system need and we should be able to help them there :)

@deivid-rodriguez
Copy link
Collaborator

@PragTob I was also planning to have a look at this, actually, since I would use this feature myself, and this PR looks like really close to being ready!

@PragTob
Copy link
Collaborator

PragTob commented Dec 4, 2019

@deivid-rodriguez knock yourself out :) My focus right now is more over on the branch coverage side of things with that and a minor knee surgery coming up next week I don't believe I'll have time to work on this for at least the next 2 weeks. Plus, having someone actually using this work on it is surely better than me doing my best guess work 😅

@PragTob
Copy link
Collaborator

PragTob commented Dec 9, 2019

This work is carried on in #780 - so closing in favor of that one!

@PragTob PragTob closed this Dec 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants