Port/Trunk Re-use Logic Bugs #834
@iamemilio Just out of curiosity, what other services could take the ports created by CAPO? What would be the correct way to implement this to avoid all these problems?
It might be the case that the user is deploying on a shared OpenStack tenant, and another service takes a port that CAPO was planning to attach to an instance. I think this is a relatively unlikely case, though, and might be resolvable with a simple reconcile loop. It's much more likely that CAPO would mistakenly take a port that was set up for infrastructure purposes in a given network. For example, we create ports in the same subnet as our master and worker nodes that are assigned VIPs in the OpenShift platform. While OpenStack views these ports as being in the "DOWN" state, they are actually in use. This is a very tricky situation to resolve. In my opinion, there are two ways to go about this:

The Easy Way: CAPO always creates and destroys the ports it consumes for interfaces (see the sketch after this comment).
Cost:
Example implementation in OpenShift's v1alpha1 fork: openshift#175

The Hard Way: we figure out all the nuances of how to safely re-use ports in a way that supports all users.
Costs:
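To make the "easy way" concrete, here is a minimal sketch using gophercloud (v1-style calls). The helper names and the idea of recording the created port ID (e.g. in the machine's status) are my assumptions, not CAPO's actual implementation:

```go
package sketch

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/networking/v2/ports"
)

// createInstancePort always creates a fresh port for the instance instead
// of adopting an existing one, so ownership is never ambiguous.
func createInstancePort(client *gophercloud.ServiceClient, portName, networkID string) (*ports.Port, error) {
	return ports.Create(client, ports.CreateOpts{
		Name:      portName,
		NetworkID: networkID,
	}).Extract()
}

// deleteInstancePort tears down a port created above. The caller records
// the port ID at creation time (e.g. in the machine's status), so the
// controller never has to guess which ports it owns.
func deleteInstancePort(client *gophercloud.ServiceClient, portID string) error {
	return ports.Delete(client, portID).ExtractErr()
}
```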
Okay, I think I mostly got it. And those ports are created with the same name as the server?
If I understand our code correctly, we are creating / listing the ports with the same name as the server instances. I'm just asking because, if we cannot even be reasonably sure that ports with the server names were created by us, I'm not sure we can 100% avoid leaking ports without storing all resources we create somewhere (similar to how Terraform does it with its state). Even then we're not 100% safe, as the controller can theoretically be killed between OpenStack resource creation and storing the state (e.g. in a Kubernetes CRD). So right now the only way to connect OpenStack resources to their counterparts in Kubernetes/CAPO is their names. I think we have a huge problem if we cannot rely on that.
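For context, the name-based lookup being discussed amounts to something like the following gophercloud sketch (my illustration, not CAPO's actual code):

```go
package sketch

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/networking/v2/ports"
)

// findPortsByName looks ports up by the instance name. Nothing in the
// result proves that this controller created them; the name is the only
// link between the OpenStack resource and its Kubernetes/CAPO counterpart.
func findPortsByName(client *gophercloud.ServiceClient, instanceName string) ([]ports.Port, error) {
	pages, err := ports.List(client, ports.ListOpts{Name: instanceName}).AllPages()
	if err != nil {
		return nil, err
	}
	return ports.ExtractPorts(pages)
}
```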
hmm, yeah I overlooked that to be honest 😃. I see where you are coming from, and I think it's probably OK to continue listing the ports the way that we have been. That being said, I think we should be defensive about which ports we select to use and check:

Then we could check or apply an update to that port to make sure that it meets the user's requirements (see the sketch below).
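A sketch of what those defensive checks might look like; the exact condition set here is my guess, not an agreed list:

```go
package sketch

import (
	"fmt"

	"github.com/gophercloud/gophercloud/openstack/networking/v2/ports"
)

// isPortSafeToAdopt applies defensive checks before re-using a port that
// merely matched by name. The conditions are illustrative, not a policy.
func isPortSafeToAdopt(p *ports.Port, wantNetworkID string) error {
	if p.DeviceID != "" || p.DeviceOwner != "" {
		return fmt.Errorf("port %s is bound to device %q (owner %q)", p.ID, p.DeviceID, p.DeviceOwner)
	}
	if p.Status != "DOWN" {
		return fmt.Errorf("port %s is in state %s, expected DOWN", p.ID, p.Status)
	}
	if p.NetworkID != wantNetworkID {
		return fmt.Errorf("port %s is on network %s, expected %s", p.ID, p.NetworkID, wantNetworkID)
	}
	// Caveat from the discussion above: a DOWN, unbound port can still be
	// reserved for something else (e.g. a VIP), so these checks reduce but
	// do not eliminate the risk of taking someone else's port.
	return nil
}
```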
@iamemilio Sounds good to me :)

What is the value of having this port re-use logic, out of curiosity?

Compared to deleting and re-creating the port? I'm happy as long as we're not leaking ports.
@sbueringer @iamemilio I'd be interested in seeing what could be done to both tighten up the port re-use logic, and potentially also make it more useful. A few comments on the discussion so far:
This is fixed with #876. Multiple NICs on the same network now get different indices, hence different port names, and no longer collide. The outstanding issues are:

We could check these conditions in code where the port name is matched.
Could we copy the 'filter' pattern used for networks and subnets, and ensure that if we re-use a port, it matches at least what is specified in the filter? (See the first sketch after this comment.)
This is interesting. I think port creation and port attachment will never be atomic in OpenStack, so, as @iamemilio suggests, a retry in the reconcile loop would make sense. (See the second sketch after this comment.)
I'd actually like to have the option to use a port that already exists for my CAPO instance, so I'm glad we have … The biggest concern here seems to be with leaking ports. One idea is a … Let me know if you'd want a more thorough design document detailing these ideas.
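To illustrate the filter idea above, a minimal sketch mirroring the network/subnet filter pattern; the PortFilter type and its field set are hypothetical:

```go
package sketch

import "github.com/gophercloud/gophercloud/openstack/networking/v2/ports"

// PortFilter is a hypothetical spec-side filter, analogous to the existing
// network and subnet filters: a re-used port must match every field set.
type PortFilter struct {
	Name        string
	NetworkID   string
	DeviceOwner string
}

// matchesFilter reports whether an existing port satisfies the filter.
// Empty filter fields mean "don't care".
func matchesFilter(p *ports.Port, f PortFilter) bool {
	if f.Name != "" && p.Name != f.Name {
		return false
	}
	if f.NetworkID != "" && p.NetworkID != f.NetworkID {
		return false
	}
	if f.DeviceOwner != "" && p.DeviceOwner != f.DeviceOwner {
		return false
	}
	return true
}
```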
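And for the non-atomic create-then-attach problem, a sketch of the retry idea: re-check the port right before attaching and requeue the reconcile if it was taken in the meantime. The helper name, the requeue interval, and the use of controller-runtime's Result are illustrative assumptions:

```go
package sketch

import (
	"fmt"
	"time"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/compute/v2/extensions/attachinterfaces"
	"github.com/gophercloud/gophercloud/openstack/networking/v2/ports"
	ctrl "sigs.k8s.io/controller-runtime"
)

// ensurePortAttached re-reads the port immediately before attaching it. If
// another consumer bound the port in the window since it was created or
// selected, it returns a requeue so the reconcile loop retries from scratch.
func ensurePortAttached(network, compute *gophercloud.ServiceClient, portID, serverID string) (ctrl.Result, error) {
	p, err := ports.Get(network, portID).Extract()
	if err != nil {
		return ctrl.Result{}, err
	}
	if p.DeviceID != "" && p.DeviceID != serverID {
		// Lost the race: requeue and pick (or create) another port next time.
		return ctrl.Result{RequeueAfter: 15 * time.Second}, nil
	}
	// Attachment can still fail with a conflict, since the re-check above is
	// not atomic with the attach; the error surfaces to the reconciler's
	// normal retry/backoff.
	if _, err := attachinterfaces.Create(compute, serverID, attachinterfaces.CreateOpts{PortID: portID}).Extract(); err != nil {
		return ctrl.Result{}, fmt.Errorf("attaching port %s to server %s: %w", portID, serverID, err)
	}
	return ctrl.Result{}, nil
}
```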
@macaptain Sounds good to me, although I don't have these use cases, so how exactly we implement this isn't that relevant to me :) I think it's important to reach consensus with the OpenShift folks. Let's see what @iamemilio thinks about it.

Hey @iamemilio, what do you think about the port re-use suggestions above? (#834 (comment))

Sorry for the long delay in getting back to you. I like the idea and would be on board with it.

I think it would be valuable to draft a design for this, since it would be a fairly large set of features that could impact the user experience. I would be happy to help out if you want.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to its standard lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to its standard lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
I am still thinking over this ticket and what we can do about port re-use.
/assign
/lifecycle stale
I think this is still a valid issue, although a lot of work on ports has been done. I'd support a simpler solution than I proposed above. We could better handle the error cases outlined in this issue when we get an existing port, but I haven't made progress on this and won't in the near future.
/unassign
/lifecycle rotten
/close
@k8s-triage-robot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/kind bug
The port and trunk re-use logic in the instance create function of CAPO is buggy in all versions, and even if it were not, it would still be subject to a race condition.
Example 1: ports created for instances
Bugs:
Race:
The same points apply to trunk ports and the upcoming port logic; the sketch below illustrates the race.
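To make the race concrete, here is an illustrative sketch (not CAPO's actual code) of the check-then-act gap in name-based port re-use:

```go
package sketch

import (
	"fmt"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/networking/v2/ports"
)

// getOrCreatePort shows the adopt-by-name pattern under discussion. Between
// the List call here and the later attach of the port to a server, any
// other tenant workload can bind the port, so the check below is inherently
// a time-of-check/time-of-use race.
func getOrCreatePort(client *gophercloud.ServiceClient, name, networkID string) (*ports.Port, error) {
	pages, err := ports.List(client, ports.ListOpts{Name: name, NetworkID: networkID}).AllPages()
	if err != nil {
		return nil, err
	}
	existing, err := ports.ExtractPorts(pages)
	if err != nil {
		return nil, err
	}
	if len(existing) > 0 {
		if existing[0].DeviceID != "" {
			// The port matched by name but is already in use; it may also be
			// a "DOWN" infrastructure port (e.g. a VIP) that only looks free.
			return nil, fmt.Errorf("port %s already bound to device %s", existing[0].ID, existing[0].DeviceID)
		}
		return &existing[0], nil
	}
	return ports.Create(client, ports.CreateOpts{Name: name, NetworkID: networkID}).Extract()
}
```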