cargo QoL improvements #4228

roypat · 2023-11-09T10:16:15Z

Changes

Turns example binaries in log-instrument into actual examples, and allows us to use cargo run to execute our helper tools

Reason

I couldn't do cargo run --bin cpu-template-helper before + compile time improvements.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following
Developer Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

If a specific issue led to this PR, this PR closes the issue.
The description of changes is clear and encompassing.
Any required documentation changes (code and docs) are included in this PR.
API changes follow the Runbook for Firecracker API changes.
User-facing changes are mentioned in CHANGELOG.md.
All added/changed functionality is tested.
New TODOs link to an issue.
Commits meet contribution quality standards.

This functionality cannot be added in rust-vmm.

Previously, only the `firecracker` binary could be run via cargo run. Now, we can also do `cargo run --bin cpu-template-helper` for example. I am keeping the jailer excluded from the `default-members` list, as this seems to have been an intentional choice made in 3bf285c. Signed-off-by: Patrick Roy <roypat@amazon.co.uk>

Previously, it could only compile examples from the main firecracker package. To compile examples from package "x", we need to pass `-p x`. Signed-off-by: Patrick Roy <roypat@amazon.co.uk>

This prevents them from being compiled if only `cargo build` is being run, where they really do not need to be compiled and only increase build time needlessly. Signed-off-by: Patrick Roy <roypat@amazon.co.uk>

codecov · 2023-11-09T10:23:04Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (44dfc0c) 82.21% compared to head (bb76879) 81.75%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4228      +/-   ##
==========================================
- Coverage   82.21%   81.75%   -0.47%     
==========================================
  Files         236      236              
  Lines       29193    29187       -6     
==========================================
- Hits        24001    23861     -140     
- Misses       5192     5326     +134

Flag	Coverage Δ
4.14-c7g.metal	`77.19% <ø> (-0.55%)`	⬇️
4.14-m5d.metal	`79.05% <ø> (-0.52%)`	⬇️
4.14-m6a.metal	`78.18% <ø> (-0.53%)`	⬇️
4.14-m6g.metal	`77.19% <ø> (-0.55%)`	⬇️
4.14-m6i.metal	`79.05% <ø> (-0.51%)`	⬇️
5.10-c7g.metal	`80.08% <ø> (-0.55%)`	⬇️
5.10-m5d.metal	`81.71% <ø> (-0.51%)`	⬇️
5.10-m6a.metal	`80.93% <ø> (-0.53%)`	⬇️
5.10-m6g.metal	`80.08% <ø> (-0.55%)`	⬇️
5.10-m6i.metal	`81.71% <ø> (-0.51%)`	⬇️
6.1-c7g.metal	`80.08% <ø> (-0.55%)`	⬇️
6.1-m5d.metal	`81.72% <ø> (-0.50%)`	⬇️
6.1-m6a.metal	`80.93% <ø> (-0.53%)`	⬇️
6.1-m6g.metal	`80.08% <ø> (-0.55%)`	⬇️
6.1-m6i.metal	`81.71% <ø> (-0.51%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
src/log-instrument/examples/five.rs	`0.00% <ø> (ø)`
src/log-instrument/examples/four.rs	`0.00% <ø> (ø)`
src/log-instrument/examples/one.rs	`0.00% <ø> (ø)`
src/log-instrument/examples/six.rs	`0.00% <ø> (ø)`
src/log-instrument/examples/three.rs	`0.00% <ø> (ø)`
src/log-instrument/examples/two.rs	`0.00% <ø> (ø)`

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

JonathanWoollett-Light · 2023-11-09T10:45:59Z

This has -96.88% coverage on src/log-instrument/src/lib.rs. Should we care about coverage here?

Cargo.toml

roypat · 2023-11-09T10:51:08Z

This has -96.88% coverage on src/log-instrument/src/lib.rs. Should we care about coverage here?

I dont think this matters much, the tests are still there after all

JonathanWoollett-Light · 2023-11-09T10:54:54Z

This has -96.88% coverage on src/log-instrument/src/lib.rs. Should we care about coverage here?

I dont think this matters much, the tests are still there after all

It seems this is always how we approach this. If we functionally ignore it by never taking action as a result of the metric. Why do we track it? Maybe we should remove it if it is just noise?

This PR can act as a good example, we should clarify the story here.

roypat · 2023-11-09T11:08:21Z

This has -96.88% coverage on src/log-instrument/src/lib.rs. Should we care about coverage here?

I dont think this matters much, the tests are still there after all

It seems this is always how we approach this. If we functionally ignore it by never taking action as a result of the metric. Why do we track it? Maybe we should remove it if it is just noise?

This PR can act as a good example, we should clarify the story here.

To me, we track code coverage to identify gaps in our testing. E.g. a low coverage percentage can be a hint that some PR is missing some unit tests, or the annotations that codecov adds to the PR diff shows some uncovered lines which then leads us to add more testing (this is how I discovered the initrd issues in #4073).

I agree that looking at the metric in isolation is meaningless. To me its a case-by-case decision on every PR. In this specific case, the coverage drops, but there is no gap in testing, so I ignore it.

wearyzen

I think the files were designed to be tests so changing them to examples doesn't add much value to it.
If improving the build time is the only reason why the tests were turned to examples then, could we add the details on by how much did the build improve? I don't mind an additional build time of few minutes if codecov can show me a 100%coverage.

roypat · 2023-11-09T16:55:35Z

I think the files were designed to be tests so changing them to examples doesn't add much value to it. If improving the build time is the only reason why the tests were turned to examples then, could we add the details on by how much did the build improve? I don't mind an additional build time of few minutes if codecov can show me a 100%coverage.

The compile time impact to tools/devtool build --release is 1m 45s for main vs 1m 27s on this PR branch (so we see an improvement of ~14%).

But independently of that, I feel fairly strongly about not exposing testing details as binary targets. Currently for example ./tools/devtool build --release will point the user to the release directory, and if they look there, they see six very confusing binaries just named "one" through "six" (in addition to the expected firecracker binaries). This is just a single bug away from us accidentally uploading them as official release artifacts, too. They probably also show up in various other cargo outputs (for example, if we go with Jonathan's suggestions to remove default-members altogether from Cargo.toml, they will show up in the output of cargo run --bin when it lists possible binaries to run). This really shouldnt be the case.

wearyzen · 2023-11-09T18:25:35Z

I think the files were designed to be tests so changing them to examples doesn't add much value to it. If improving the build time is the only reason why the tests were turned to examples then, could we add the details on by how much did the build improve? I don't mind an additional build time of few minutes if codecov can show me a 100%coverage.

The compile time impact to tools/devtool build --release is 1m 45s for main vs 1m 27s on this PR branch (so we see an improvement of ~14%).

But independently of that, I feel fairly strongly about not exposing testing details as binary targets. Currently for example ./tools/devtool build --release will point the user to the release directory, and if they look there, they see six very confusing binaries just named "one" through "six" (in addition to the expected firecracker binaries). This is just a single bug away from us accidentally uploading them as official release artifacts, too. They probably also show up in various other cargo outputs (for example, if we go with Jonathan's suggestions to remove default-members altogether from Cargo.toml, they will show up in the output of cargo run --bin when it lists possible binaries to run). This really shouldnt be the case.

The build improvement is ~20secs which is too small to justify removing the tests however, I understand why having those weird binaries could cause a confusion and this concern was also raised when they were first added.
If we want to remove the test bins I would suggest moving them to better place instead of completely removing them.
There is an alternative solution proposed here.

I agree that the bins could cause confusion where they are right now but, replacing them with a python integration test goes against our code coverage policy. I suggest we create an issue to do the right thing (moving to correct place while maintaining coverage) but until then I don't think we should remove them.

roypat · 2023-11-10T09:49:17Z

The build improvement is ~20secs which is too small to justify removing the tests however, I understand why having those weird binaries could cause a confusion and this concern was also raised when they were first added. If we want to remove the test bins I would suggest moving them to better place instead of completely removing them. There is an alternative solution proposed here.

I agree that the bins could cause confusion where they are right now but, replacing them with a python integration test goes against our code coverage policy. I suggest we create an issue to do the right thing (moving to correct place while maintaining coverage) but until then I don't think we should remove them.

I want to stress that no tests are being removed in this PR. We are still exercising exactly the same code paths and checking exactly the same assertions. Here, we have a case of code being very hard to unit tests (we could mock the logger, which would also solve this problem, but even then we cannot properly check any assertions because the log messages contain thread ids, and if the logs arent generated by a separate process, these become non-deterministic). Whenever code is very hard to unit tests, and workarounds introduce drawbacks (such as accidentally vending dummy binaries to users, or adding significant complexity to the testing code, as Jonathan mentions his alternative would do), we should use integration testing for it. It is also a matter of consistency with the rest of our testing framework: UFFD and seccomp tests are cargo example that are driven by python integration tests (and when we did this, even promoting them to examples from crates hidden inside of our testing framework went through a lot of discussion because of the same point: It puts testing code into a more prominent place alongside production code). And we have also in the past decided that we value sensibility of our testing over the code coverage number, when we decided to remove "coverage generating" tests for debug implementations, because they added no functional value. Because the code coverage number is a proxy for the property that we actually care about: Code being tested. And if we only focus on the proxy and try to game the number, we lose track of that goal, and that is when it becomes useless.

[...] goes against our code coverage policy.

I am not aware of us having any policy regarding code coverage (I know we decided against introducing static checks in our CI for it, because it was impossible to meaningfully phrase failure conditions that do not rule our valid usecases). We only have a team-set goal to reach 85% coverage, but even here I could argue that the 81.77% reported on the HEAD of this PR is a more accurate representation of our progress towards this goal: We currently include the source code of one.rs, two.rs,... six.rs and so on as covered lines (which add up to ~100 lines, or 0.34% of tracked lines), but they are non-production code. By including them in our progress towards the goal, we are "cheating", in a way.

kalyazin · 2023-11-10T10:44:10Z

I would agree that tests that run the entire binary tool with files as inputs are integration tests by nature and should not be contributing into our unit test coverage metric. In some cases (like this one) unit testing is hard and it is fine to resort to integration tests only while having clear visibility of gaps in unit test coverage (which we are fine with).

wearyzen · 2023-11-10T11:33:06Z

We might not have a written policy but from what I see we concluded with using unit test instead of integration test for code coverage in multiple decision discussions we had internally.

The problem was not with this particular unit test being less efficient and more complex vs the integration test, it was which approach provides better visibility. I didn't see codecov as a proxy but as a tool which provides us a visibility and metric on coverage. Even if integration test in this case might provide us better testing we don't have a metric to know if the testing or coverage was reduced if someone modifies the test. I imagine it would be hard for someone to review all the corner cases if the interaction test was modified.
I am not saying we should let the binaries get into production or our release process, but with moving it to integration test we lose visibility if all cases are covered.

Having said that, I also agree that we can never achieve 100% code coverage and with a good enough explanation as you provided it seems logical to go ahead with this change.

BTW, thanks for being patient with me and explaining this in this detail. I really appreciate that you didn't take the easy path to just remove the commit and fought for the right thing.
And I hope if in future if some PR reduces unit test coverage they put same amount of effort in providing a valid argument for it such as this.

roypat added 3 commits November 9, 2023 08:59

test: Allow get_binary to compile example from different package

0c0c014

Previously, it could only compile examples from the main firecracker package. To compile examples from package "x", we need to pass `-p x`. Signed-off-by: Patrick Roy <roypat@amazon.co.uk>

Turn log-instrument binaries into examples

371de0e

This prevents them from being compiled if only `cargo build` is being run, where they really do not need to be compiled and only increase build time needlessly. Signed-off-by: Patrick Roy <roypat@amazon.co.uk>

roypat added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Nov 9, 2023

JonathanWoollett-Light reviewed Nov 9, 2023

View reviewed changes

Cargo.toml Show resolved Hide resolved

pb8o approved these changes Nov 9, 2023

View reviewed changes

wearyzen suggested changes Nov 9, 2023

View reviewed changes

Merge branch 'main' into more-cargo-run

9169fc1

wearyzen approved these changes Nov 10, 2023

View reviewed changes

Merge branch 'main' into more-cargo-run

bb76879

roypat merged commit abd0b49 into firecracker-microvm:main Nov 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

cargo QoL improvements #4228

cargo QoL improvements #4228

Uh oh!

roypat commented Nov 9, 2023

Uh oh!

codecov bot commented Nov 9, 2023 •

edited

Loading

Uh oh!

JonathanWoollett-Light commented Nov 9, 2023 •

edited

Loading

Uh oh!

Uh oh!

roypat commented Nov 9, 2023

Uh oh!

JonathanWoollett-Light commented Nov 9, 2023 •

edited

Loading

Uh oh!

roypat commented Nov 9, 2023

Uh oh!

wearyzen left a comment

Uh oh!

roypat commented Nov 9, 2023 •

edited

Loading

Uh oh!

wearyzen commented Nov 9, 2023

Uh oh!

roypat commented Nov 10, 2023

Uh oh!

kalyazin commented Nov 10, 2023

Uh oh!

wearyzen commented Nov 10, 2023

Uh oh!

Uh oh!

cargo QoL improvements #4228

cargo QoL improvements #4228

Uh oh!

Conversation

roypat commented Nov 9, 2023

Changes

Reason

License Acceptance

PR Checklist

Uh oh!

codecov bot commented Nov 9, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

JonathanWoollett-Light commented Nov 9, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

roypat commented Nov 9, 2023

Uh oh!

JonathanWoollett-Light commented Nov 9, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

roypat commented Nov 9, 2023

Uh oh!

wearyzen left a comment

Choose a reason for hiding this comment

Uh oh!

roypat commented Nov 9, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wearyzen commented Nov 9, 2023

Uh oh!

roypat commented Nov 10, 2023

Uh oh!

kalyazin commented Nov 10, 2023

Uh oh!

wearyzen commented Nov 10, 2023

Uh oh!

Uh oh!

codecov bot commented Nov 9, 2023 •

edited

Loading

JonathanWoollett-Light commented Nov 9, 2023 •

edited

Loading

JonathanWoollett-Light commented Nov 9, 2023 •

edited

Loading

roypat commented Nov 9, 2023 •

edited

Loading