
cargo QoL improvements #4228

Merged: 5 commits into firecracker-microvm:main on Nov 10, 2023

Conversation

@roypat (Contributor) commented Nov 9, 2023

Changes

Turns the example binaries in `log-instrument` into actual cargo examples, and allows us to use `cargo run` to execute our helper tools.

Reason

I couldn't do `cargo run --bin cpu-template-helper` before; this also brings compile-time improvements.
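
For illustration, this is the kind of invocation that becomes possible from the workspace root (the trailing `--help` is just a placeholder for whatever arguments the helper takes):

```console
# Run the CPU template helper straight through cargo; before this PR the
# package was not part of the workspace's default members, so this failed.
cargo run --bin cpu-template-helper -- --help
```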

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following
Developer Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet contribution quality standards.

  • This functionality cannot be added in rust-vmm.

Previously, only the `firecracker` binary could be run via cargo run.
Now, we can also do `cargo run --bin cpu-template-helper` for example. I
am keeping the jailer excluded from the `default-members` list, as this
seems to have been an intentional choice made in
3bf285c.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
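
A minimal sketch of the kind of `default-members` setup this describes (member names and paths are illustrative, not the exact Cargo.toml change from this PR):

```toml
[workspace]
members = [
    "src/firecracker",
    "src/jailer",
    "src/cpu-template-helper",
    "src/log-instrument",
]
# The jailer stays out of default-members (see 3bf285c), but the helper
# tools are included, so `cargo run --bin cpu-template-helper` works.
default-members = [
    "src/firecracker",
    "src/cpu-template-helper",
    "src/log-instrument",
]
```
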
Previously, it could only compile examples from the main firecracker
package. To compile examples from package "x", we need to pass `-p x`.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
This prevents them from being compiled when only `cargo build` is run, where they are not needed and only increase build time.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
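
As a rough illustration of the effect on local builds (plain cargo commands, nothing project-specific):

```console
# A plain build no longer compiles the log-instrument examples ...
cargo build --release

# ... but they can still be built explicitly when needed, e.g. for the tests:
cargo build --examples -p log-instrument
```
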
codecov bot commented Nov 9, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (44dfc0c) 82.21% compared to head (bb76879) 81.75%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4228      +/-   ##
==========================================
- Coverage   82.21%   81.75%   -0.47%     
==========================================
  Files         236      236              
  Lines       29193    29187       -6     
==========================================
- Hits        24001    23861     -140     
- Misses       5192     5326     +134     
| Flag | Coverage Δ |
|------|------------|
| 4.14-c7g.metal | 77.19% <ø> (-0.55%) ⬇️ |
| 4.14-m5d.metal | 79.05% <ø> (-0.52%) ⬇️ |
| 4.14-m6a.metal | 78.18% <ø> (-0.53%) ⬇️ |
| 4.14-m6g.metal | 77.19% <ø> (-0.55%) ⬇️ |
| 4.14-m6i.metal | 79.05% <ø> (-0.51%) ⬇️ |
| 5.10-c7g.metal | 80.08% <ø> (-0.55%) ⬇️ |
| 5.10-m5d.metal | 81.71% <ø> (-0.51%) ⬇️ |
| 5.10-m6a.metal | 80.93% <ø> (-0.53%) ⬇️ |
| 5.10-m6g.metal | 80.08% <ø> (-0.55%) ⬇️ |
| 5.10-m6i.metal | 81.71% <ø> (-0.51%) ⬇️ |
| 6.1-c7g.metal | 80.08% <ø> (-0.55%) ⬇️ |
| 6.1-m5d.metal | 81.72% <ø> (-0.50%) ⬇️ |
| 6.1-m6a.metal | 80.93% <ø> (-0.53%) ⬇️ |
| 6.1-m6g.metal | 80.08% <ø> (-0.55%) ⬇️ |
| 6.1-m6i.metal | 81.71% <ø> (-0.51%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files | Coverage Δ |
|-------|------------|
| src/log-instrument/examples/five.rs | 0.00% <ø> (ø) |
| src/log-instrument/examples/four.rs | 0.00% <ø> (ø) |
| src/log-instrument/examples/one.rs | 0.00% <ø> (ø) |
| src/log-instrument/examples/six.rs | 0.00% <ø> (ø) |
| src/log-instrument/examples/three.rs | 0.00% <ø> (ø) |
| src/log-instrument/examples/two.rs | 0.00% <ø> (ø) |

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.

@roypat added the Status: Awaiting review label on Nov 9, 2023
@JonathanWoollett-Light (Contributor) commented Nov 9, 2023

This has -96.88% coverage on src/log-instrument/src/lib.rs. Should we care about coverage here?

@roypat (Contributor, Author) commented Nov 9, 2023

> This has -96.88% coverage on src/log-instrument/src/lib.rs. Should we care about coverage here?

I don't think this matters much; the tests are still there, after all.

@JonathanWoollett-Light (Contributor) commented Nov 9, 2023

> This has -96.88% coverage on src/log-instrument/src/lib.rs. Should we care about coverage here?

> I don't think this matters much; the tests are still there, after all.

It seems this is always how we approach this. If we functionally ignore it by never taking action as a result of the metric, why do we track it? Maybe we should remove it if it is just noise?

This PR can act as a good example; we should clarify the story here.

@roypat (Contributor, Author) commented Nov 9, 2023

> This has -96.88% coverage on src/log-instrument/src/lib.rs. Should we care about coverage here?

> I don't think this matters much; the tests are still there, after all.

> It seems this is always how we approach this. If we functionally ignore it by never taking action as a result of the metric, why do we track it? Maybe we should remove it if it is just noise?

> This PR can act as a good example; we should clarify the story here.

To me, we track code coverage to identify gaps in our testing. E.g. a low coverage percentage can be a hint that some PR is missing unit tests, or the annotations that codecov adds to the PR diff show some uncovered lines, which then leads us to add more testing (this is how I discovered the initrd issues in #4073).

I agree that looking at the metric in isolation is meaningless. To me it's a case-by-case decision on every PR. In this specific case, the coverage drops, but there is no gap in testing, so I ignore it.

@wearyzen (Contributor) left a comment

I think the files were designed to be tests, so changing them to examples doesn't add much value.
If improving the build time is the only reason the tests were turned into examples, could we add details on how much the build improved? I don't mind an additional build time of a few minutes if codecov can show me 100% coverage.

@roypat (Contributor, Author) commented Nov 9, 2023

> I think the files were designed to be tests, so changing them to examples doesn't add much value. If improving the build time is the only reason the tests were turned into examples, could we add details on how much the build improved? I don't mind an additional build time of a few minutes if codecov can show me 100% coverage.

The compile-time impact on `tools/devtool build --release` is 1m 45s for main vs 1m 27s on this PR branch (so we see an improvement of ~14%).

But independently of that, I feel fairly strongly about not exposing testing details as binary targets. Currently, for example, `./tools/devtool build --release` will point the user to the release directory, and if they look there, they see six very confusing binaries just named "one" through "six" (in addition to the expected firecracker binaries). This is just a single bug away from us accidentally uploading them as official release artifacts, too. They probably also show up in various other cargo outputs (for example, if we go with Jonathan's suggestion to remove default-members altogether from Cargo.toml, they will show up in the output of `cargo run --bin` when it lists possible binaries to run). This really shouldn't be the case.
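
To illustrate the last point, this is roughly what an ambiguous `cargo run` looks like in a workspace without `default-members` (output paraphrased, and the binary list is hypothetical):

```console
$ cargo run
error: `cargo run` could not determine which binary to run.
Use the `--bin` option to specify a binary, or the `default-run` manifest key.
available binaries: cpu-template-helper, firecracker, five, four, jailer, one, six, three, two, ...
```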

@wearyzen (Contributor) commented Nov 9, 2023

> I think the files were designed to be tests, so changing them to examples doesn't add much value. If improving the build time is the only reason the tests were turned into examples, could we add details on how much the build improved? I don't mind an additional build time of a few minutes if codecov can show me 100% coverage.

> The compile-time impact on `tools/devtool build --release` is 1m 45s for main vs 1m 27s on this PR branch (so we see an improvement of ~14%).

> But independently of that, I feel fairly strongly about not exposing testing details as binary targets. Currently, for example, `./tools/devtool build --release` will point the user to the release directory, and if they look there, they see six very confusing binaries just named "one" through "six" (in addition to the expected firecracker binaries). This is just a single bug away from us accidentally uploading them as official release artifacts, too. They probably also show up in various other cargo outputs (for example, if we go with Jonathan's suggestion to remove default-members altogether from Cargo.toml, they will show up in the output of `cargo run --bin` when it lists possible binaries to run). This really shouldn't be the case.

The build improvement is ~20 secs, which is too small to justify removing the tests. However, I understand why having those weird binaries could cause confusion, and this concern was also raised when they were first added.
If we want to remove the test bins, I would suggest moving them to a better place instead of completely removing them.
There is an alternative solution proposed here.

I agree that the bins could cause confusion where they are right now, but replacing them with a python integration test goes against our code coverage policy. I suggest we create an issue to do the right thing (moving them to the correct place while maintaining coverage), but until then I don't think we should remove them.

@roypat (Contributor, Author) commented Nov 10, 2023

> The build improvement is ~20 secs, which is too small to justify removing the tests. However, I understand why having those weird binaries could cause confusion, and this concern was also raised when they were first added. If we want to remove the test bins, I would suggest moving them to a better place instead of completely removing them. There is an alternative solution proposed here.

> I agree that the bins could cause confusion where they are right now, but replacing them with a python integration test goes against our code coverage policy. I suggest we create an issue to do the right thing (moving them to the correct place while maintaining coverage), but until then I don't think we should remove them.

I want to stress that no tests are being removed in this PR. We are still exercising exactly the same code paths and checking exactly the same assertions. Here, we have a case of code being very hard to unit test (we could mock the logger, which would also solve this problem, but even then we cannot properly check any assertions, because the log messages contain thread ids, and if the logs aren't generated by a separate process, these become non-deterministic). Whenever code is very hard to unit test, and workarounds introduce drawbacks (such as accidentally vending dummy binaries to users, or adding significant complexity to the testing code, as Jonathan mentions his alternative would do), we should use integration testing for it.

It is also a matter of consistency with the rest of our testing framework: the UFFD and seccomp tests are cargo examples driven by python integration tests (and when we did this, even promoting them to examples from crates hidden inside of our testing framework went through a lot of discussion because of the same point: it puts testing code into a more prominent place alongside production code). We have also decided in the past that we value the sensibility of our testing over the code coverage number, when we removed "coverage generating" tests for debug implementations because they added no functional value.

The code coverage number is a proxy for the property that we actually care about: code being tested. If we only focus on the proxy and try to game the number, we lose track of that goal, and that is when it becomes useless.
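
For context, an example like this is exercised end to end from the test side in roughly this way (a sketch only; the real assertions live in the Python integration tests and are not reproduced here):

```console
# Run one of the log-instrument examples as a separate process and capture
# its output; the integration test then asserts on the captured log lines,
# whose thread ids are predictable because the example runs as its own process.
cargo run -p log-instrument --example one > example-one.log 2>&1
```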

> [...] goes against our code coverage policy.

I am not aware of us having any policy regarding code coverage (I know we decided against introducing static checks for it in our CI, because it was impossible to meaningfully phrase failure conditions that do not rule out valid use cases). We only have a team-set goal to reach 85% coverage, but even here I could argue that the 81.77% reported on the HEAD of this PR is a more accurate representation of our progress towards this goal: we currently include the source code of one.rs, two.rs, ... six.rs as covered lines (which add up to ~100 lines, or 0.34% of tracked lines), but they are non-production code. By including them in our progress towards the goal, we are "cheating", in a way.

@kalyazin (Contributor)

I would agree that tests that run the entire binary tool with files as inputs are integration tests by nature and should not be contributing to our unit test coverage metric. In some cases (like this one) unit testing is hard, and it is fine to resort to integration tests only, while having clear visibility of the gaps in unit test coverage (which we are fine with).

@wearyzen (Contributor)

We might not have a written policy, but from what I see, we concluded in multiple internal decision discussions that we would use unit tests instead of integration tests for code coverage.

The problem was not with this particular unit test being less efficient and more complex than the integration test; it was which approach provides better visibility. I didn't see codecov as a proxy but as a tool that gives us visibility and a metric on coverage. Even if an integration test in this case might provide better testing, we don't have a metric to know whether the testing or coverage was reduced if someone modifies the test. I imagine it would be hard for someone to review all the corner cases if the integration test were modified.
I am not saying we should let the binaries get into production or our release process, but by moving them to an integration test we lose visibility into whether all cases are covered.

Having said that, I also agree that we can never achieve 100% code coverage, and with a good explanation such as the one you provided, it seems logical to go ahead with this change.

BTW, thanks for being patient with me and explaining this in such detail. I really appreciate that you didn't take the easy path of just removing the commit and instead fought for the right thing.
And I hope that in the future, if some PR reduces unit test coverage, the author puts the same amount of effort into providing a valid argument for it, as was done here.

@roypat merged commit abd0b49 into firecracker-microvm:main on Nov 10, 2023
6 of 7 checks passed