-
Notifications
You must be signed in to change notification settings - Fork 73
OCPBUGS-18641: Set dual-stack IPFamilyPriority for vSphere #279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-18641: Set dual-stack IPFamilyPriority for vSphere #279
Conversation
|
Skipping CI for Draft Pull Request. |
|
@mkowalski: This pull request references Jira Issue OCPBUGS-18641, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/test all |
72ffc4b to
013459b
Compare
|
@mkowalski: This pull request references Jira Issue OCPBUGS-18641, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/test all |
013459b to
ec74c8c
Compare
|
/test all |
ec74c8c to
fd8ee8c
Compare
|
/test e2e-vsphere-ovn It's sig-storage that fails, let's just retest |
| - name: ENABLE_ALPHA_DUAL_STACK | ||
| value: "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this forcing on an alpha feature gate for all vSphere clusters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not really a feature gate, at least not in the sense how we implemented CloudDualStackNodeIPs in https://kubernetes.io/docs/concepts/services-networking/dual-stack/
This is some VMware internal(?) code that requires an existence of env variable (without even looking at its value), look at https://github.com/openshift/cloud-provider-vsphere/blob/master/pkg/cloudprovider/vsphere/cloud.go#L253-L255 to see how it's handled. Not really the way how every other component handles CloudDualStackNodeIPs (even the name does not match).
From the moment I've seen it I got confused about how we should handle this. The only thing that made me feel safe is that this validator I have linked is the only place in the whole codebase where this is used.
Any suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code, it seems relatively minor, I guess we can leave it as is
| if net.IsIPv4CIDRString(network.Spec.ServiceNetwork[0]) { | ||
| cfg.Global.IPFamilyPriority = []string{"ipv4", "ipv6"} | ||
| } else { | ||
| cfg.Global.IPFamilyPriority = []string{"ipv6", "ipv4"} | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a possibility that both entries in the list are ipv4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, having ["ipv4,"ipv4"] would behave exactly the same as ["ipv4"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was more, ServiceNetwork could be two IPv4 addresses right? Why aren't we checking both entries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that's not allowed. If you have 2 service networks, then one has to be v4 and the other one v6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact well, let me confirm this with o/installer... It may have changed and the validations are way more complex now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/openshift/installer/blob/master/pkg/types/validation/installconfig.go#L310-L313 <-- so if you are installing single-stack v4 cluster, you may have only 1 service network entry
https://github.com/openshift/installer/blob/master/pkg/types/validation/installconfig.go#L252-L254 <-- if you are installing dual-stack cluster, you must have exactly 2 entries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, can you just add a comment inline to explain that if there are two service networks, the installer already validated that one is ipv4 and the other is ipv6, so we don't lose this context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, done
|
/hold CI failures indicate that the CSI driver is probably not ok with this change, @mkowalski can you please take a look |
|
As discussed, seems like wider failure what makes me think it's not my code what introduced it (luckily this time) Ref.: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1695228004392599 |
|
/test e2e-vsphere-ovn |
|
@mkowalski: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
| } | ||
| } | ||
|
|
||
| nodeNetworking.External.ExcludeNetworkSubnetCIDR = append(nodeNetworking.External.ExcludeNetworkSubnetCIDR, "fd69::2/128") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a named constant for fd69::2/128 so we can crossref with ovn-kubernetes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd totally agree if we had a common/shared library that ovn-k8s would consume. In the state like it is now, I'd say address is an address and it's not a big diff if there is variable or not (especially that I use it only in a single place)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we avoid magic numbers? and it is used twice. but maybe we add a comment in OVN-k8s as well.
can customer override V6HostMasqueradeIP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it's configurable ovn-kubernetes/ovn-kubernetes#3594
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err, we can pull from the API? openshift/api#1410
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking into this. It's not nice because they didn't add it as Network CRD but only in the Operator Config CRD so I'm not sure if I can access it easily from inside the 3cmo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify - until openshift/cluster-network-operator#1807 merges, this value is not configurable. I am now discussing with OVN-K8s folks the rationale, but at this moment this value is the only one that we will see in the wild.
Having said this, I wonder if we can push this change as it is now and then when OVN-K8s folks are ready with openshift/cluster-network-operator#1807 (hopefully with revisited API that stores the custom subnet in the Network CR) improve it for 4.15
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I agree we should merge this to unblock dual-stack.
|
openshift/kubernetes#1714 is already in the merge pool. Hopefully we get a nightly today and by tomorrow this one here is green |
|
failed, due to lack of nodes retrying. |
|
We had a successful cluster from this PR back in September 19th, https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-vsphere-modern/1704172710533271552 |
|
I had successful cluster yesterday, so maybe quota or something. |
|
/test e2e-vsphere-ovn |
|
/hold cancel |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JoelSpeed The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@mkowalski: Jira Issue OCPBUGS-18641: Some pull requests linked via external trackers have merged: The following pull requests linked via external trackers have not merged:
These pull request must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-18641 has not been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/jira refresh |
|
@mkowalski: Jira Issue OCPBUGS-18641: Some pull requests linked via external trackers have merged: The following pull requests linked via external trackers have not merged:
These pull request must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-18641 has not been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/jira refresh |
|
@mkowalski: Jira Issue OCPBUGS-18641: Some pull requests linked via external trackers have merged: The following pull requests linked via external trackers have not merged:
These pull request must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-18641 has not been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/jira refresh |
|
@mkowalski: Jira Issue OCPBUGS-18641: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-18641 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/cherry-pick release-4.14 |
|
@mkowalski: new pull request created: #283 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
We have discovered that in dual-stack setups NodeAddresses field of
the instance metadata contains only IPv4 addresses for VMs that do have
both IPv4 and IPv6 addresses assigned (and detected by the VM agent).
It has been traced back to the function responsible for populating this
metadata field. We found out that for our configuration we always filter
only IPv4 addresses even if running in dual-stack. Reason for that is
that
IPFamilyPriorityhas a value ofipv4even when running in adual-stack setup.
This causes an issue because this instance metadata is cross-checked
with addresses provided by the kubelet as part of the
alpha.kubernetes.io/provided-node-ipannotation. Without correct valueof
IPFamilyPrioritywe are thus removing all the IPv6 addresses.This PR takes advantage of the Service Networks configured by the user in
install-config and the fact that we only allow 2 networks to be configured
if the setup is dual-stack.
Fixes: OCPBUGS-18641