Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement coverage reports for Starlark #15594

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented May 30, 2022

If the new flag --starlark_coverage_report is set to a non-empty value, the Starlark coverage recorder will be enabled for all .bzl files under the output base executed during the Blaze command. At the end of the command, the collected coverage data will be emitted to the file path given as the flag argument in LCOV format, containing function, line, and branch coverage information.

@fmeum
Copy link
Collaborator Author

fmeum commented May 30, 2022

CC @Wyverald

I generated reports for some rulesets and found the results to be relatively accurate, but this would definitely benefit from more testing (it doesn't come with test cases yet) and polish. I'm also not sure whether it works for repository rules and/or functions executed during the execution phase, e.g. map_each callbacks. I wanted to open a PR to get early feedback and will happily fix bugs and supply tests if this is generally viewed favorably.

@fmeum fmeum marked this pull request as ready for review May 30, 2022 13:36
@fmeum fmeum requested review from brandjon and tetromino as code owners May 30, 2022 13:36
@@ -131,6 +133,7 @@ private static TokenKind execFor(StarlarkThread.Frame fr, ForStatement node)
continue;
case BREAK:
// Finish loop, execute next statement after loop.
CoverageRecorder.getInstance().recordVirtualJump(node);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any data to assess whether this causes a performance degradation even if coverage is not being recorded (which means that the instance returned will have a no-op recordVirtualJump). If this should turn out to be prohibitive, I could make the instance static final and populate it e.g. via an environment variable rather than a Bazel flag.

@Wyverald
Copy link
Member

Thanks for the PR, Fabian! I'll leave it to Jon & Sasha to review this. (NB: Jon is OOO until 9 June.)

Also cc @adonovan

@sgowroji sgowroji added team-Starlark-Integration Issues involving Bazel's integration with Starlark, excluding builtin symbols awaiting-review PR is awaiting review from an assigned reviewer labels May 31, 2022
@fmeum fmeum force-pushed the starlark-coverage branch 2 times, most recently from 24f1b2f to bce3f0d Compare May 31, 2022 07:09
@fmeum fmeum marked this pull request as draft May 31, 2022 08:03
@fmeum fmeum marked this pull request as ready for review May 31, 2022 08:34
@fmeum fmeum force-pushed the starlark-coverage branch from bce3f0d to 30765f5 Compare May 31, 2022 08:34
@fmeum
Copy link
Collaborator Author

fmeum commented May 31, 2022

I fixed some bugs and refactored the branch logic, but will not make further changes.

Copy link
Contributor

@sluongng sluongng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely am not familiar with Google Java Style guide or the readability process, but here are some nits I have after reading through the PR. Please take them with a grain of salt 🙇‍♂️

Thanks for working on this 🙏 I can see how this would be very useful for my debugging process.

src/main/java/net/starlark/java/syntax/Parameter.java Outdated Show resolved Hide resolved
src/main/java/net/starlark/java/eval/Eval.java Outdated Show resolved Hide resolved
src/main/java/net/starlark/java/eval/Eval.java Outdated Show resolved Hide resolved
src/main/java/net/starlark/java/eval/Eval.java Outdated Show resolved Hide resolved
defaultValue = "",
documentationCategory = OptionDocumentationCategory.LOGGING,
effectTags = {OptionEffectTag.BAZEL_MONITORING},
help = "Writes into the specified file an lcov coverage report for all Starlark files.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a discussion on Slack: It's not immediately clear to end-users that we are using this flag to report coverage of starlark during execution of a bazel command, similar to --starlark_cpu_profile.

My initial impression was that this, similar to other coverage flags, are used with bazel coverage when testing starlark rules with Skylib's unittest.bzl. This is NOT the case.

Might worth being a bit more specific in this description so that Bazel's end users are clear on intended usages.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworded this a little to hopefully clarify the scope. I would be interested in integrating a less low-level version of this with bazel coverage, but don't really know how to accomplish this yet.

@fmeum fmeum force-pushed the starlark-coverage branch 4 times, most recently from 4e8e8ed to 0d2dcca Compare May 31, 2022 10:24
Copy link

@adonovan adonovan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sending this, but a change this big needs a design discussion before code. What problem are we trying to solve? What design do you propose? How can we make it simpler?

My initial questions are:

  • Does this make the interpreter slower? A lot of work has gone into making it as fast as it is (although there is still room for improvement). Benchmarking is time consuming because the JVM is noisy---execution time may vary by 5% or more---making it hard to find regressions down at the 1-2% level.
  • Do we need all this new machinery? In principle, a coverage tool could be implemented as a debugger that simply steps through the whole program one instruction at a time recording the current pc. We already have a primitive debugger. Can its hooks be used to record coverage, with minimal changes?
  • Do we need branch coverage? I notice the tool records control jumps such as break statements and the short-circuit in if true || true {, and information about individual parameters. Is this necessary? Statement coverage is good enough for most purposes and is less invasive.

@@ -14,6 +14,7 @@
package net.starlark.java.syntax;

import java.util.List;
import net.starlark.java.syntax.Comprehension.Clause;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't import nested classes; always refer to Comprehension.Clause outside that class.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -174,9 +174,7 @@ public void visit(@SuppressWarnings("unused") Comment node) {}
public void visit(ConditionalExpression node) {
visit(node.getCondition());
visit(node.getThenCase());
if (node.getElseCase() != null) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change safe?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConditionalExpression#getElseCase() is not annotated as @Nullable, which is correct since the lexer only accepts a ConditionalStatement if it has an else.

import java.util.stream.Collectors;
import javax.annotation.Nullable;

abstract class CoverageVisitor extends NodeVisitor {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class doesn't belong in the syntax package: it's a client of that package, and it seems like part of the interpreter (eval). It appears there is only a single actual class that implements this abstraction, so I suggest you merge them together.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it doesn't belong here conceptually, but I didn't find a way to move it elsewhere:

  1. It is actually extended twice, once by CoverageNodeVisitor and once anonymously when registering a Program. These two usages are also why I extracted the class in the first place: It is important that the coverage registration and the coverage report generation pass stay in sync.
  2. All the new classes live in syntax since they have to be used in Program#compileFile. I searched for another place to hook into Program creation, but Program#compileFile is called directly from many places.

fmeum added 4 commits June 1, 2022 15:51
A `ConditionalExpression`, differing from an `IfStatement`, always has
an `else` case.
The generated coverage report includes function, line, and branch
coverage information in LCOV format.
If the new flag `--starlark_coverage_report` is set to a non-empty
value, the Starlark coverage recorder will be enabled for all `.bzl`
files under the output base executed during the Blaze command. At the
end of the command, the collected coverage data will be emitted to the
file path given as the flag argument in LCOV format.
@fmeum fmeum force-pushed the starlark-coverage branch from 0d2dcca to b603de0 Compare June 2, 2022 08:30
@fmeum
Copy link
Collaborator Author

fmeum commented Jun 2, 2022

Thanks for sending this, but a change this big needs a design discussion before code. What problem are we trying to solve? What design do you propose? How can we make it simpler?

Sorry, I certainly didn't intend to skip proper process here. I wrote this code out of a concrete need to better understand the edge cases of rulesets I maintain and to improve my understanding of the Java Starlark implementation. I just thought about sharing it hoping that it could be useful for others. I am completely open to rewriting this from scratch or scrapping it entirely, depending on what the outcome of the design discussion is.

My initial questions are:

  • Does this make the interpreter slower? A lot of work has gone into making it as fast as it is (although there is still room for improvement). Benchmarking is time consuming because the JVM is noisy---execution time may vary by 5% or more---making it hard to find regressions down at the 1-2% level.

The way the hooks are currently implemented, I won't be able to dismiss this concern with confidence. As you said, benchmarking doesn't necessarily catch all regressions on such a low level.

I do think that there is a way to implement the coverage hooks without any overhead in the case where coverage isn't being collected:

  1. Make the static CoverageRecorder instance final and populate it based on e.g. a system property starlark.lcov_report_file being set to a non-empty value.
  2. Make --starlark_coverage_profile a startup option that sets this system property for the Bazel server JVM.
  3. Add logic that dumps the report on server shutdown.

The static final field should be inlined by the JIT - if we want to be certain, we could also make it a simple boolean that triggers a coverage slow path if true. I could then double-check that the code emitted by the JIT doesn't change with the PR merged.

Do you like this approach better?

  • Do we need all this new machinery? In principle, a coverage tool could be implemented as a debugger that simply steps through the whole program one instruction at a time recording the current pc. We already have a primitive debugger. Can its hooks be used to record coverage, with minimal changes?

After a first read of the code, I'm inclined to say this would work for statement coverage, but not for more fine-grained notions of coverage.

  • Do we need branch coverage? I notice the tool records control jumps such as break statements and the short-circuit in if true || true {, and information about individual parameters. Is this necessary? Statement coverage is good enough for most purposes and is less invasive.

I agree that statement coverage would already be quite useful. However, given the general conciseness of Starlark (especially with list/dict comprehensions and conditional expressions), a single statement can encode rather complex logic. With the rulesets I have tried this on, statement coverage has almost always been nearing 100%, but branch coverage revealed some interesting uncovered cases (e.g., handling of workspace vs. external repository, checks for missing/superfluous files). While branch coverage information for breaks may not be as useful, I found it better to cover all branches rather than just some of them.

Assuming we find a way to ensure that coverage support introduces no overhead at all (as possibly described above), would you still prefer to remove some or all of the branch coverage hooks?.

@fmeum
Copy link
Collaborator Author

fmeum commented Jul 17, 2022

@adonovan Would you like me to prototype the static final approach and take a look at the JIT output? I think that this has a fair chance of alleviating the performance worries entirely.

@brandjon
Copy link
Member

Finally taking a look at this, as I promised Fabian back at BazelCon.

It looks like there's quite a bit more code to this PR than I originally thought. I'm interested in supporting more tooling for Starlark in the new year, coverage included, but in light of

  • the added code complexity (who will maintain all this extra logic?),
  • concerns of performance regression, and
  • uncertainty about the extent of the benefit of this feature to users, and the relative benefits of expression granularity vs statement granularity,

it's clear that a quick acceptance of this PR is off the table. I'd love to have a more involved discussion centered around the benefit vs maintainability tradeoff in Q1. I'll keep the PR open for now.

@brandjon brandjon added team-Starlark-Interpreter Issues involving the Starlark interpreter used by Bazel and removed team-Starlark-Integration Issues involving Bazel's integration with Starlark, excluding builtin symbols awaiting-review PR is awaiting review from an assigned reviewer labels Dec 19, 2022
@fmeum
Copy link
Collaborator Author

fmeum commented Mar 5, 2024

@brandjon @tetromino Friendly ping on this.

I could trim this down to just line coverage and ensure that there is no runtime overhead if coverage is disabled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Starlark-Interpreter Issues involving the Starlark interpreter used by Bazel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants