Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SDTEST-523] Expand test impact analysis with allocation tracing #197

Merged
merged 25 commits into from
Jul 15, 2024

Conversation

anmarchenko
Copy link
Member

@anmarchenko anmarchenko commented Jul 5, 2024

Problem statement
Using line coverage for test impact analysis has a major limitation in Ruby: consider the following examples:

# this class does not have any executable lines
class MyClass < OtherClass
end

test "instantiate MyClass" do
  assert MyClass.new != nil
end

The test "instantiate MyClass" does not cover MyClass because there are no executable lines in MyClass. If initializer was inherited from OtherClass, then this test will have other_class.rb in the list of covered files but not my_class.rb.

This leads to a major intelligent test runner bug: if we change initializer of MyClass like that:

class MyClass
  def initialize(arg)
    @arg = arg
  end
end

then the test above will start failing because MyClass.new expects argument now. But because my_class.rb is not covered by this test, intelligent test runner will skip test by default! It causes broken tests to be merged in the default branch.

If this example might seem artificial, unfortunately the same happens with ActiveRecord models or with ActiveModel classes:

# perfectly valid Rails model that is not covered by any test
class Account < ApplicationRecord
  belongs_to :user
end

Solution
We cannot overcome this limitation by using line coverage: the code coverage approach works correctly in this case and this is just how line coverage works. We need to go deeper in Ruby VM tracing using techniques that are already used by continuous profiler.

For this limitation, I've chosen to reach out for heap allocation tracepoint: it is possible to spy on every new object allocation that happens in Ruby heap. Even if no code from this class is executed during the test, it is enough for us to know that the test instantiates instances of this class to add its filename to the list of impacted files.

Notes on implementation:

  • RUBY_INTERNAL_EVENT_NEWOBJ event type is used
  • rb_tracepoint_new to register RUBY_INTERNAL_EVENT_NEWOBJ tracepoint
  • Module.const_source_location(klass_name) Ruby API is used to get the source of a constant (every class name is a constant). This API is available from Ruby 2.7 - this is exactly the oldest Ruby version supported for test visibility product. We use this API from C with rb_funcall(rb_cObject, rb_intern("const_source_location"), 1, klass_name)
  • rb_protect to ignore exceptions when getting source code location of a class (it fails for many internal classes)
  • Many libraries generate anonymous classes with class name like #<Class:0ff0eabcde> - we explicitly ignore them because getting source location for these classes always fails

Known limitations

  • This approach only tracks objects instantiated during the test itself (in before hook or during the test). If the test suite has some models cached in global state and shared between tests, and these models don't have any methods implemented on them, the test impact analysis will still miss them. We need to change docs on Intelligent test runner in Ruby reiterating that global state is harmful and can cause flakiness and incompatibility with ITR.
  • Ruby versions 3.2.0 - 3.2.2 have a bug that causes failures when using this tracepoint: allocation profiling is disabled for these versions
  • rb_tracepoint_new cannot be attached to a specific thread, so the allocation tracing is enabled for multi threaded coverage mode only (this mode is the default one and the only one that can work for Rails, so it is not a major problem)
  • Anonymous classes source locations are not supported

How to test the change?
Tested using by running test suites of the following open source projects:

  • feedbin
  • rubocop
  • jekyll
  • middleman
  • vagrant
  • devdocs

Unit tests reproducing the original problem are provided. See performance evaluation below.

Performance evaluation
Median performance overhead for some OSS projects' test suites before this change:
image

Results from benchmarks after this change:

Overall this change increases code coverage overhead on test suites by 30-40% (compared to overhead before the change) in relative numbers. In absolute numbers depending on project's size and characteristics it means from 7% up to 30% more time spent in tests (the maximum overhead is from rubocop which is particularly challenging for profiling: 21k relatively fast tests).

@codecov-commenter
Copy link

codecov-commenter commented Jul 8, 2024

Codecov Report

Attention: Patch coverage is 98.21429% with 2 lines in your changes missing coverage. Please review.

Project coverage is 98.86%. Comparing base (579867e) to head (aebb1bf).

Files Patch % Lines
lib/datadog/ci/configuration/components.rb 77.77% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #197      +/-   ##
==========================================
- Coverage   98.87%   98.86%   -0.01%     
==========================================
  Files         231      235       +4     
  Lines       10368    10477     +109     
  Branches      475      481       +6     
==========================================
+ Hits        10251    10358     +107     
- Misses        117      119       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@anmarchenko anmarchenko changed the title [SDTEST-523] Expand test impact analysis to the heap allocation [SDTEST-523] Expand test impact analysis with allocation profiler Jul 8, 2024
@anmarchenko anmarchenko force-pushed the anmarchenko/heap_allocation_tracepoint branch from 6e18534 to b78f095 Compare July 10, 2024 11:33
@anmarchenko anmarchenko changed the title [SDTEST-523] Expand test impact analysis with allocation profiler [SDTEST-523] Expand test impact analysis with allocation tracing Jul 10, 2024
@anmarchenko anmarchenko marked this pull request as ready for review July 11, 2024 08:50
@anmarchenko anmarchenko requested review from a team as code owners July 11, 2024 08:50
@anmarchenko anmarchenko requested a review from liashenko July 11, 2024 08:50
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few notes! I didn't do a pass very deep into the get_source_location part (ran out of time -- I can do it in the next pass :D) but hopefully this helps

lib/datadog/ci/configuration/components.rb Show resolved Hide resolved
ext/datadog_cov/datadog_cov.c Outdated Show resolved Hide resolved
ext/datadog_cov/datadog_cov.c Outdated Show resolved Hide resolved
ext/datadog_cov/datadog_cov.c Outdated Show resolved Hide resolved
ext/datadog_cov/datadog_cov.c Outdated Show resolved Hide resolved
@devinburnette
Copy link

just tested this branch as of the latest commit (0b1618c) and verified the segfault is gone and the test impact analysis is working as expected for models.

@anmarchenko
Copy link
Member Author

Thank you for testing and your feedback @devinburnette! I'll do a couple more passes on Monday and will release soon

@anmarchenko
Copy link
Member Author

@ivoanjo this is ready for another pass - seems to be working now

Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left another round of suggestions, but I think overall my comments fall in the "extra things" bucket, and this PR seems reasonable even as-is.

spec/ddcov/ddcov_spec.rb Outdated Show resolved Hide resolved
ext/datadog_cov/datadog_cov.c Outdated Show resolved Hide resolved
ext/datadog_cov/datadog_cov.c Show resolved Hide resolved
ext/datadog_cov/datadog_cov.c Outdated Show resolved Hide resolved
ext/datadog_cov/datadog_cov.c Outdated Show resolved Hide resolved
Comment on lines 269 to 282
enum ruby_value_type type = rb_type(new_object);
if (type != RUBY_T_OBJECT && type != RUBY_T_STRUCT)
{
return;
}

// if ignored_path is provided and the current filename is located under the ignored_path, we skip it too
// this is useful for ignoring bundled gems location
if (dd_cov_data->ignored_path_len != 0 && strncmp(dd_cov_data->ignored_path, filename_ptr, dd_cov_data->ignored_path_len) == 0)
VALUE klass = rb_class_of(new_object);
if (klass == Qnil || klass == 0)
{
return;
}
// Skip anonymous classes starting with "#<Class".
// it allows us to skip the source location lookup that will always fail
const char *name = rb_obj_classname(new_object);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow-up, on the discussion of what we can do in the new object tracepoint, in some cases rb_obj_classname will definitely cause new objects to be allocated.

Thus doing this check here may be not be safe (although arguably we've been doing it in the profiler and I've never seen issues...) and you may want to delay it perhaps.

The good news is maybe there's a few, even better options. I was looking what rb_obj_classname does, and maybe there's an alternative that is even better for our purposes because it doesn't even need to allocate the string to represent the anonymous class -- I'm thinking of rb_mod_name or some of the other ones that exist.

Copy link
Member Author

@anmarchenko anmarchenko Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your rb_mod_name suggestion - it returns Qnil if klass is anonymous, so it works for filtering out anonymous classes, and works a bit faster

ext/datadog_cov/datadog_cov.c Outdated Show resolved Hide resolved
ext/datadog_cov/datadog_cov.c Outdated Show resolved Hide resolved
Comment on lines +216 to +220
// Get source location for a given class name
static VALUE get_source_location(VALUE klass_name)
{
return rb_funcall(rb_cObject, rb_intern("const_source_location"), 1, klass_name);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So on the topic of performance, it occurs to me that if you're starting and stopping coverage on every individual test case, you may be redoing these kinds of lookups again and again.

Maybe not for this PR, but perhaps it's worth considering having some kind of a cache that would live across coverage start/stops? (Of course with proper sizing, etc)

Copy link
Member Author

@anmarchenko anmarchenko Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, interesting thing that I tried two kind of caches:

  • hashtable (st.c) that tracks source file for every class
  • hashtable that for every class stores boolean value indicating whether the source file is from inside project or from external gem

Every combination of these 2 approaches ether do not change overhead for rubocop test source or makes it on about 4% slower (on average)! I did not investigate it further so I don't know yet why intuition does not work in this case: maybe if I have time I'll do another pass.

For now, I am leaving it as is without additional caches as I believe that every optimisation must be backed by solid data, otherwise it will be just another source of bugs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable 👍

ext/datadog_cov/datadog_cov.c Outdated Show resolved Hide resolved
@anmarchenko anmarchenko merged commit 186ccf6 into main Jul 15, 2024
28 checks passed
@anmarchenko anmarchenko deleted the anmarchenko/heap_allocation_tracepoint branch July 15, 2024 17:15
@github-actions github-actions bot added this to the 1.2.0 milestone Jul 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants