-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix TOCTOU when adding network interfaces to an instance #1222
Conversation
- Fixes a TOCTOU bug, where we checked if an instance is stopped and then issue a separate query to actually create/attach a network interface to it. Done by adding an additional subquery which verifies the instance state in an additional CTE, at both create and delete time. - Adds additional tests for checking the new instance verification - Improves instance integration test coverage by using the endpoint to list interfaces both when the expected list is empty and non-empty
Resolves #1134 |
// NOTE: We need to lookup the VPC and VPC Subnet, since we need both | ||
// IDs for creating the network interface. | ||
// | ||
// TODO-correctness: There are additional races here. The VPC and VPC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it isn't one, it's another, eh?
(Semi-related to this PR, I really want better testing / analysis to help us suss these issues out. DB-related raciness is an area with a lot of potential for unsafety)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it isn't one, it's another, eh?
Indeed :)
I really want better testing / analysis to help us suss these issues out
Yeah, that'd be nice. I'm not sure how to force ourselves to hit this, since the flow would be: fetch the VPC Subnet successfully, delete the VPC Subnet, then try to run this query. How would we do that? Maybe part of a set of stress tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. I can think of a view options:
- Random-walk stress testing: ideally, having a "set of operations" and "set of objects/names" that can all be acted upon, and doing some random-walk style operations would be neat.
- Permutation testing: we could try to integrate something like https://github.com/tokio-rs/loom to invasively test the DB-related ops. There would still be reliance on CRDB, however, so these tests would probably not run as a part of the CI check - they might be delegated to "once daily" jobs.
- Model checking: we could attempt to model the DB interactions externally (e.g., through TLA+) and try to prove safety properties about them. I'm skeptical about the cost/benefit of this approach; it seems like a more expensive option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the second approach. The first is probably easiest, but I'd be worried about baking in our assumptions about how things should be used. I agree the formal verification approach is likely to be too heavyweight.
- Reduce const visibility - SQL formatting - Refactor tests for specific behavior
Propolis changes: Add `IntrPin::import_state` and migrate LPC UART pin states (#669) Attempt to set WCE for raw file backends Fix clippy/lint nits for rust 1.77.0 Crucible changes: Correctly (and robustly) count bytes (#1237) test-replay.sh fix name of DTrace script (#1235) BlockReq -> BlockOp (#1234) Simplify `BlockReq` (#1218) DTrace, cmon, cleanup, retry downstairs connections at 10 seconds. (#1231) Remove `MAX_ACTIVE_COUNT` flow control system (#1217) Crucible changes that were in Omicron but not in Propolis before this commit. Return *410 Gone* if volume is inactive (#1232) Update Rust crate opentelemetry to 0.22.0 (#1224) Update Rust crate base64 to 0.22.0 (#1222) Update Rust crate async-recursion to 1.1.0 (#1221) Minor cleanups to extent implementations (#1230) Update Rust crate http to 0.2.12 (#1220) Update Rust crate reedline to 0.30.0 (#1227) Update Rust crate rayon to 1.9.0 (#1226) Update Rust crate nix to 0.28 (#1223) Update Rust crate async-trait to 0.1.78 (#1219) Various buffer optimizations (#1211) Add low-level test for message encoding (#1214) Don't let df failures ruin the buildomat tests (#1213) Activate the NBD server's psuedo file (#1209) --------- Co-authored-by: Alan Hanson <alan@oxide.computer>
then issue a separate query to actually create/attach a network
interface to it. Done by adding an additional subquery which verifies
the instance state in an additional CTE, at both create and delete
time.
list interfaces both when the expected list is empty and non-empty