Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

endpoints for firewall rules #623

Open
jessfraz opened this issue Jan 23, 2022 · 9 comments
Open

endpoints for firewall rules #623

jessfraz opened this issue Jan 23, 2022 · 9 comments
Labels
api Related to the API.
Milestone

Comments

@jessfraz
Copy link
Contributor

  • there is a put but it seems to imply it updates all the rules not one?
  • there is no post
  • there is no delete
@jessfraz jessfraz added the api Related to the API. label Jan 23, 2022
@rmustacc
Copy link

Hey @jessfraz, at least from an API perspective our initial view of firewall rules was as one larger object. This was done mostly from the perspective of allowing atomic updates on the atomic rule sets, that's why there isn't a common POST/DELETE. The goal here was to allow the use of etgas in a way that you couldn't add/delete a rule without context of the surrounding ones. Phrased differently the current firewall as a whole is a single object.

If useful, I can dig up a bunch of the past discussions on this for context.

@jessfraz
Copy link
Contributor Author

jessfraz commented Jan 24, 2022 via email

@jessfraz
Copy link
Contributor Author

so for example someone using the api that just wants to add one rule:

they need to get them all first then add their new one, so if I make a cli command to add a rule I need to get them all first then add it, why wouldn't we just make the endpoints so its not the worst

@rmustacc
Copy link

Just for context, the array of rules in one endpoint is how AWS and GCP work. We had a discussion around this for routes and firewalls in this control plane huddle. The tradeoff with the approach you're describing is the inability to perform an atomic update of the rule set. I get that from some ergonomics in the client, that makes the above worse. But aren't most updates going to be doing conditional PUTs to make sure resources don't change out from under them in general, which requires fetching the resource to begin with?

Nothing is set in stone, but the goal was to try and address a class of issues some of us have encountered in the past -- not to create the worst API ever. Please give us the benefit of the doubt. Obviously without having end-to-end working consumers nothing we've done is perfect and we're going to learn a lot more from this. Thanks for raising this.

@jessfraz
Copy link
Contributor Author

Seems like you can still add a single rule tho (at least on GCP):

https://cloud.google.com/compute/docs/reference/rest/v1/firewallPolicies/addRule

@jessfraz jessfraz added the mvp label Jan 24, 2022
@davepacheco
Copy link
Collaborator

they need to get them all first then add their new one, so if I make a cli command to add a rule I need to get them all first then add it, why wouldn't we just make the endpoints so its not the worst

If the CLI doesn't get them all first, is it possible to avoid the dueling administrator problem? That seems somewhat high stakes here, since it would seem pretty easy for an unexpected combination of rules to disrupt connectivity and trigger a major incident for the customer.

Is there some scalability problem you're worried about or is it just the code required to fetch the rules first?

@jessfraz
Copy link
Contributor Author

jessfraz commented Jan 24, 2022

Its annoying to have to fetch them all first, as someone who usually just adds a rule or two, I like that gcloud has an API for adding one, which then allows them to have a cli command for adding one. But if the cli is just fetching them all and adding one, why wouldn't we just make an endpoint to do that exact same thing. Also you could say the same thing you are saying about API endpoint couldn't you? Like deploying a VM, how do you prevent two from being created at the same time, I just don't really understand the problem you are solving with this but maybe I am obtuse

@davepacheco
Copy link
Collaborator

Its annoying to have to fetch them all first, as someone who usually just adds a rule or two, I like that gcloud has an API for adding one, which then allows them to have a cli command for adding one.

I get that there's a potential cost to adoption if we're expecting customers to build directly atop our API and they have to write extra code for what should be an easy operation. I don't want to minimize that. But I do want to be clear about the problem: I think we're just talking about the extra code required to build a CLI add-rule command without an API that just does it for you. This doesn't affect the user of the CLI as far as I understand.

Also you could say the same thing you are saying about API endpoint couldn't you? Like deploying a VM, how do you prevent two from being created at the same time, I just don't really understand the problem you are solving with this but maybe I am obtuse

Yes, the same concern applies to most non-create endpoints. But a user can avoid clobbering somebody else's changes by using HTTP conditional requests with an ETag (once that's implemented). That doesn't work here because if you have the ETag, then you already fetched the rules, so you wouldn't need the "addRule" API. (You specifically asked about "create", which is different. The way we're doing creates with POST, it's hard to have a conditional request. That said, I don't think it's common for two create operations to interfere with each other in quite the same way, where both requests succeed but the resulting system is broken in a way that neither administrator could have anticipated.)

I'm not arguing that we shouldn't have an add-rule API because of the dueling administrators problem. Rather, I'd argue that if we're only going to build one API, it should be one that allows users to deal with this problem if they want to because my experience is that it's essential in order to build robust automation atop an API. Put differently: a customer can always build add-rule on top of what we're doing, even if it's a bit of work. They cannot build a firewall modification mechanism atop add-rule/remove-rule that's robust to concurrent changes, which means we're forcing those users to build some lock outside of our system.

If we accept that we need the endpoint that lets users address this problem, then I'd also argue for API minimalism: if something can be done on the client with no cost in terms of correctness, flexibility, scalability, or security, let's do that rather than add more API surface area. This might be at odds with adoption considerations and maybe that's a reason to build convenience APIs.

@jessfraz
Copy link
Contributor Author

so then why not do both, keep the endpoints you already have and add the endpoints for single rule adding for the lazy users. not everyone has n-teen hours in a day for just writing bits of code to account for our API not having said features, we should make our API usable by anyone out the box

@smklein smklein added this to the MVP milestone Jan 20, 2023
@askfongjojo askfongjojo modified the milestones: MVP, Unscheduled Apr 6, 2023
@askfongjojo askfongjojo removed the mvp label Apr 11, 2023
leftwo pushed a commit that referenced this issue Feb 9, 2024
Crucible changes:
Remove unused fields in IOop (#1149)
New downstairs clone subcommand. (#1129)
Simplify the do_work_task loop (#1150)
Move `Guest` stuff into a module (#1125)
Bump nix to 0.27.1 and use new safer Fd APIs (#1110)
Move `FramedWrite` work to a separate task (#1145)
Use fewer borrows in ExtentInner API (#1147)
Update Rust crate reedline to 0.28.0 (#1141)
Update Rust crate tokio to 1.36 (#1143)
Update Rust crate slog-bunyan to 2.5.0 (#1139)
Update Rust crate rayon to 1.8.1 (#1138)
Update Rust crate itertools to 0.12.1 (#1137)
Update Rust crate byte-unit to 5.1.4 (#1136)
Update Rust crate base64 to 0.21.7 (#1135)
Update Rust crate async-trait to 0.1.77 (#1134)
Discard deferred msgs (#1131)
Minor Downstairs cleanup (#1127)
Update test_fail_live_repair to support pstop (#1128)
Ignore client messages after stopping the IO task (#1126)
Move client IO task into a struct (#1124)
Bump Rust to 1.75 and fix new Clippy lints (#1123)

Propolis changes:
PHD: convert to async (#633)
PHD: assume specialized Windows images (#636)
propolis-standalone-config needn't be a crate
standalone: Use tar for snapshot/restore
phd: use latest "lab-2.0-opte" target, not a specific version (#637)
PHD: add tests for migration of running processes (#623)
PHD: fix `cargo xtask phd` tidy not doing anything (#630)
PHD: add documentation for `cargo xtask phd` (#629)
standalone: improve virtual device creation errors (#632)
phd: add Windows Server 2019 guest adapter (#627)
PHD: add `cargo xtask phd` to make using PHD nicer (#619)
leftwo added a commit that referenced this issue Feb 9, 2024
Crucible changes:
Remove unused fields in IOop (#1149)
New downstairs clone subcommand. (#1129)
Simplify the do_work_task loop (#1150)
Move `Guest` stuff into a module (#1125)
Bump nix to 0.27.1 and use new safer Fd APIs (#1110) Move `FramedWrite`
work to a separate task (#1145) Use fewer borrows in ExtentInner API
(#1147)
Update Rust crate reedline to 0.28.0 (#1141)
Update Rust crate tokio to 1.36 (#1143)
Update Rust crate slog-bunyan to 2.5.0 (#1139)
Update Rust crate rayon to 1.8.1 (#1138)
Update Rust crate itertools to 0.12.1 (#1137)
Update Rust crate byte-unit to 5.1.4 (#1136)
Update Rust crate base64 to 0.21.7 (#1135)
Update Rust crate async-trait to 0.1.77 (#1134)
Discard deferred msgs (#1131)
Minor Downstairs cleanup (#1127)
Update test_fail_live_repair to support pstop (#1128) Ignore client
messages after stopping the IO task (#1126) Move client IO task into a
struct (#1124)
Bump Rust to 1.75 and fix new Clippy lints (#1123)

Propolis changes:
PHD: convert to async (#633)
PHD: assume specialized Windows images (#636)
propolis-standalone-config needn't be a crate
standalone: Use tar for snapshot/restore
phd: use latest "lab-2.0-opte" target, not a specific version (#637)
PHD: add tests for migration of running processes (#623) PHD: fix `cargo
xtask phd` tidy not doing anything (#630) PHD: add documentation for
`cargo xtask phd` (#629) standalone: improve virtual device creation
errors (#632) phd: add Windows Server 2019 guest adapter (#627)
PHD: add `cargo xtask phd` to make using PHD nicer (#619)

Co-authored-by: Alan Hanson <alan@oxide.computer>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Related to the API.
Projects
None yet
Development

No branches or pull requests

5 participants