✨ proposal: new API for managed Security Groups #1756
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: EmilienM. Needs approval from an approver in each of the affected files. Approvers can indicate their approval by writing `/approve` in a comment.
/cc mdbooth

/cc jichenjc
Force-pushed from f1341cb to e2229ff.
```yaml
namespace: default
spec:
  securityGroupsSpec:
    controlPlaneSecurityGroupRules:
```
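The diff is truncated here; a complete entry under `controlPlaneSecurityGroupRules` would presumably follow the rule shape used elsewhere in this thread, e.g. (values are arbitrary examples):

```yaml
controlPlaneSecurityGroupRules:
- direction: ingress
  ethertype: IPv4
  portRangeMin: 6443
  portRangeMax: 6443
  protocol: tcp
```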
Previously we talked about whether we should create a CRD that's reusable in other clusters, e.g. you create multiple clusters, and if you modify one JSON or YAML file it can be used in all of them. I know it's complicated, so I think we can defer such an extension to the next stage?
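Purely as an illustration of that idea, a reusable object might look something like this; the kind name `OpenStackSecurityGroupTemplate` and its fields are hypothetical, not part of this proposal:

```yaml
# Hypothetical, for illustration only: a cluster-independent rule set that
# several OpenStackClusters could reference instead of inlining the same rules.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: OpenStackSecurityGroupTemplate
metadata:
  name: shared-rules
spec:
  rules:
  - direction: ingress
    ethertype: IPv4
    portRangeMin: 1234
    portRangeMax: 1234
    protocol: tcp
```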
For now, let's try without a new CRD and see how it goes. I'll keep this comment open as it's still relevant in the latest iteration of the KEP.
I like the overall design of this
2. Avoid causing any breaking changes for previous users; they'll still be able to use the legacy rules
3. Successfully be able to create and manage security groups for both the Control Plane and the Worker Nodes
4. Provide a migration path (documentation or code) for users who are currently using the legacy rules provided by `OpenStackCluster.spec.managedSecurityGroups`
5. Deprecate `OpenStackCluster.spec.apiServerLoadBalancer.additionalPorts`, as we'll have an API to create Control Plane Security Group rules (with more flexibility than we have now). Conversion will happen between this parameter and the one for Control Plane Security Group rules in order to maintain backward compatibility (see the sketch below)
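A rough sketch of the conversion goal 5 envisages; the rule shape follows the examples later in this thread, and the port number is an arbitrary stand-in:

```yaml
# Legacy field, slated for deprecation:
apiServerLoadBalancer:
  additionalPorts:
  - 9345
# Hypothetical converted equivalent as a Control Plane rule:
controlPlaneSecurityGroupRules:
- direction: ingress
  ethertype: IPv4
  portRangeMin: 9345
  portRangeMax: 9345
  protocol: tcp
```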
We can't do this. This parameter has more uses than just security group rules, and it would break existing use cases. See #1749 (comment)
However, I would like to separate responsibility for these rules into:
- A separate security group
- Managed by a separate Octavia control plane endpoint provider
i.e. create a new controller for managing an Octavia API load balancer. That controller is responsible for creating a security group suitable for its needs and attaching that security group to control plane machines.
In this case, it would simply be the responsibility of the cluster controller to remove the legacy rules from the default security group.
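A rough illustration of that split (all names and values here are hypothetical):

```yaml
# Hypothetical: a security group owned by a separate Octavia control plane
# endpoint provider rather than by the cluster controller. It carries only
# the rules the API load balancer needs, and the provider attaches it to
# control plane machines itself.
name: k8s-cluster-example-lb       # hypothetical name
rules:
- direction: ingress
  ethertype: IPv4
  protocol: tcp
  portRangeMin: 6443               # Kubernetes API port, as an example
  portRangeMax: 6443
  remoteIPPrefix: 192.0.2.0/24     # placeholder: subnet the LB lives on
```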
Force-pushed from e2229ff to 68df77e.
@mkjpryor I'd like your feedback on this one when time permits. Thanks!
There is an important design decision to be made here: how do we deal with users messing with SGs managed by CAPO through the OpenStack API?

My idea would be to adopt the K8s model here. The controller should periodically monitor the SG rules and SG associations of the SGs it manages, and reconcile them if a user made any changes. I think this is the only sane way to do it; otherwise we allow users to modify SGs on their own, and we never know whether an attached SG should be kept or removed during e.g. an upgrade.
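A minimal sketch of that reconciliation model, assuming a controller that diffs desired rules against what Neutron reports; the types and names here are illustrative stand-ins, not CAPO's actual API:

```go
package main

import "fmt"

// rule is a minimal stand-in for a security group rule. The real CAPO types
// and the Neutron API are richer; this only shows the reconciliation shape:
// diff desired vs. observed, then converge.
type rule struct {
	direction, ethertype, protocol string
	portMin, portMax               int
}

// key builds a comparable identity for a rule.
func key(r rule) string {
	return fmt.Sprintf("%s/%s/%s/%d-%d", r.direction, r.ethertype, r.protocol, r.portMin, r.portMax)
}

// reconcileRules returns the rules to create (managed rules a user deleted)
// and the rules to delete (rules a user added out of band).
func reconcileRules(desired, observed []rule) (toCreate, toDelete []rule) {
	want := make(map[string]rule, len(desired))
	for _, r := range desired {
		want[key(r)] = r
	}
	have := make(map[string]rule, len(observed))
	for _, r := range observed {
		have[key(r)] = r
	}
	for k, r := range want {
		if _, ok := have[k]; !ok {
			toCreate = append(toCreate, r)
		}
	}
	for k, r := range have {
		if _, ok := want[k]; !ok {
			toDelete = append(toDelete, r)
		}
	}
	return toCreate, toDelete
}

func main() {
	desired := []rule{{"ingress", "IPv4", "tcp", 179, 179}}
	observed := []rule{{"ingress", "IPv4", "tcp", 22, 22}} // user-added drift
	create, del := reconcileRules(desired, observed)
	fmt.Println("create:", create, "delete:", del)
}
```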
```yaml
# Allow to provide rules for the CNI that will be applied to all nodes
cniSecurityGroupRules:
- direction: ingress
  ethertype: IPv4
  portRangeMin: 1234
  portRangeMax: 1234
  protocol: tcp
```
Why do we call this one CNI, when it's basically a thing gathering rules added to each of the nodes? I'd abstain from designing this against a CNI. From a technical standpoint, I think it's possible to use different CNIs on different nodes. It's totally unsupported at the moment, but it might be one day.
I can name it `allNodesSecurityGroupRules`, I don't mind.
Do you think that this should be consistent with whatever we do for server groups? Personally, I favour new CRDs for both; I worry about us cluttering up the spec of the `OpenStackCluster`. For example, a port is something that I don't mind having in the `OpenStackMachine` spec.
To me there is a relation between Security Groups and Machines, as we want to update a Machine if a security group is added to the control plane machines.
I think we all agree; the question remains whether we have a new CRD (+ controller) dedicated to the Security Groups lifecycle or continue to use OpenStackCluster.
It just feels odd to me that all the security groups are defined in the `OpenStackCluster`.

I do agree that it probably makes the conversion more complicated. Happy to be overruled though.
That's exactly what I asked above, and I think maybe we can check the comments from @EmilienM, so maybe we can check https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/1765/files first.
As discussed at the last office hours, I think this is the biggest single API stabilisation task we have for v1beta1. Specifically, we need to remove the hard-coded Calico rules while still allowing users to specify equivalent rules themselves.

I would very much like to get to v1beta1 asap, so to make that simpler I'd like to propose reducing the scope of the change here as much as possible.

```yaml
managedSecurityGroups:
  # Enable the management of security groups with the default rules (kubelet, etcd, etc.)
  # The default stays false
  enabled: true
  # Allow to extend the default security group rules for the Bastion,
  # the Control Plane and the Worker security groups
  additionalBastionSecurityGroupRules:
  - direction: ingress
    ethertype: IPv4
    portRangeMin: 1234
    portRangeMax: 1234
    protocol: tcp
  additionalControlPlaneSecurityGroupRules:
  - direction: ingress
    ethertype: IPv4
    portRangeMin: 1234
    portRangeMax: 1234
    protocol: tcp
  additionalWorkerSecurityGroupRules:
  - direction: ingress
    ethertype: IPv4
    portRangeMin: 1234
    portRangeMax: 1234
    protocol: tcp
  # Allow to provide rules that will be applied to all nodes
  allNodesSecurityGroupRules:
  - direction: ingress
    ethertype: IPv4
    portRangeMin: 1234
    portRangeMax: 1234
    protocol: tcp
```

I would remove `enabled`:

```yaml
managedSecurityGroups:
  enabled: false
  additionalSecurityGroups:
  - ...
```

Does the user expect these to be created when managed security groups is disabled?

Secondly, adding a new […]. We discussed having a […].

In the first instance, we can also remove additional controller complexity by not changing the set of security groups attached to nodes. This means we don't have an upgrade problem, and don't need to be able to dynamically attach and detach security groups. Instead, at least in the first instance, we can simply replicate rules in the existing managed security groups.

Lastly, we make extensive use of `remote_group_id`:

```yaml
managedSecurityGroups:
  enabled: true
  allNodesSecurityGroupRules:
  - description: BGP (calico)
    direction: ingress
    ethertype: IPv4
    portRangeMin: 179
    portRangeMax: 179
    protocol: tcp
    remoteManagedGroups:
    - control-plane
    - worker
  - description: IP-in-IP (calico)
    direction: ingress
    ethertype: IPv4
    protocol: 4
    remoteManagedGroups:
    - control-plane
    - worker
```

These 2 rules would become 4 rules, as we would create a duplicate for each of the `remoteManagedGroups`. I suggest that this change alone could also be the minimum change, and we could add […] later.
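To make that expansion concrete, here's a sketch of the four rules the two entries above would generate within a managed group, assuming each `remoteManagedGroups` entry becomes a separate Neutron rule pointing at the named managed group (the `remoteGroupID` values are placeholders, not a real field rendering):

```yaml
- description: BGP (calico)
  direction: ingress
  ethertype: IPv4
  portRangeMin: 179
  portRangeMax: 179
  protocol: tcp
  remoteGroupID: <control-plane security group ID>   # placeholder
- description: BGP (calico)
  direction: ingress
  ethertype: IPv4
  portRangeMin: 179
  portRangeMax: 179
  protocol: tcp
  remoteGroupID: <worker security group ID>          # placeholder
- description: IP-in-IP (calico)
  direction: ingress
  ethertype: IPv4
  protocol: 4
  remoteGroupID: <control-plane security group ID>   # placeholder
- description: IP-in-IP (calico)
  direction: ingress
  ethertype: IPv4
  protocol: 4
  remoteGroupID: <worker security group ID>          # placeholder
```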
Thank you for that well-thought-through comment! I completely agree!
Cool, I'm working on a change here: #1826, to address what Matt has asked for.
I'll just note one thing here: in complicated environments Neutron has performance issues with `remote_group_id`, as it generates a lot of flows. It's advised to use CIDRs directly if possible. This probably isn't a problem at CAPO's scale, but I'm mentioning it here for the record.
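For illustration, a CIDR-based variant of the BGP rule above would replace `remoteManagedGroups` with an explicit prefix; the `remoteIPPrefix` spelling (Neutron's `remote_ip_prefix`) and the subnet value are assumptions for this sketch:

```yaml
allNodesSecurityGroupRules:
- description: BGP (calico), CIDR-based variant
  direction: ingress
  ethertype: IPv4
  portRangeMin: 179
  portRangeMax: 179
  protocol: tcp
  remoteIPPrefix: 10.6.0.0/24   # hypothetical cluster node subnet
```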
Note: this one is not stale; I'll come back to it at some point this year, now that we have a new API for Security Groups.
Force-pushed from 41d77ff to 5722056.
This is still highly WIP; I need to update it based on our recent work. Please don't review it yet.
Force-pushed from 5722056 to 3992c24.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale

/close
@EmilienM: Closed this PR.
What this PR does / why we need it:

KEP to enhance the Security Group API.

Which issue(s) this PR fixes (optional):

Related #1752