-
Notifications
You must be signed in to change notification settings - Fork 557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: prefer longer prefixes in IPv6 only environments as default Nod… #9749
base: main
Are you sure you want to change the base?
Conversation
…eAddress Fixes behavior with DHCPv6 available and prefers the assigned address. Signed-off-by: Niklas Voss <niklas.voss@gmail.com>
This is my first PR for Talos and I am not overly familiar with the codebase, but my assumption is changing the default NodeAddress should fix #9725. |
@@ -127,9 +127,17 @@ func (ctrl *NodeAddressController) Run(ctx context.Context, r controller.Runtime | |||
continue | |||
} | |||
|
|||
// in IPv6 only environments we want to prefer longer prefixes, in other environments we keep old behavior |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the proper fix is to re-sort the addresses
list in the first place. Today the order comes from the alphabetical order of address resource IDs which look like <link>/address/bits
so that the IPv6 addresses for the same link are sorted with longest prefix first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way it will not only affect defaultAddress
, but all other list of addresses which are used to pick up etcd, and other listening addresses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not overly familiar with the Talos codebase, is the defaultAddress
not used by etcd as well? Quick glance through the codebase this is also true for kubelet.
Sorting the addresses was my initial thought, but I shied away from that approach as I was uncertain how to handle Dual-Stack environments properly. As a first step I could however change the sorting for IPv6 Single-Stack environments or sort only IPv6 addresses and keep the preference of IPv4 addresses due to sorting by bits.
Would this be the preferred approach?
Another third option, but larger change might be to introduce CEL or expr to allow users to customize filtering and IP selection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me think about this more, but my gut feeling is that sorting by string value (as it is today) is not quite good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will have another look this evening and/or tomorrow as I also believe sorting might be the better approach. Grepping the codebase once more I am not certain I even understand what the purpose of defaultAddress is as it is only used in the hostname and etcfile controller (today, might be a relic of the past).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about https://datatracker.ietf.org/doc/html/rfc6724 and the algorithm here, as RFC doesn't directly map (as we try to sort source addresses without ever knowning destination address), so given two addresses, A and B, if one is a prefix of another, then we prefer the longer prefix, but if they don't have a common prefix, I'm not sure if this matters (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so what I'm thinking about - introduce a feature gate (and enable it by default for 1.9) to introduce new sorting algorithm for addresses in general, implement the required changes; simply changing the algorithm might have consequences of Talos choosing different addresses by default which doesn't sound like a good approach.
…y prefix length if one contains the other A new feature flag, enabled by default in 1.9, controls sorting behavior. If a prefix contains the other the longest is preferred. Signed-off-by: Niklas Voss <niklas.voss@gmail.com>
Signed-off-by: Nik Voss <niklas.voss@gmail.com> Signed-off-by: Niklas Voss <niklas.voss@gmail.com>
f142424
to
7dc454c
Compare
Finally found some time to work on resorting IPs. I am planning to improve the test coverage, but before doing that, it would be great to some feedback, whether I am on the right track. (Also feel free to suggest alternative names for the feature.) |
Pull Request
What? (description)
Prefers longer prefixes in IPv6 only environments as default address, but keeps current behaviour everywhere else.
Why? (reasoning)
Talos IP preference strategy does not well in IPv6 Single-Stack environments as RA SLAAC addresses will be preferred over DHCPv6. In environments such as KubeVirt with Passt the SLAAC address is not routable by other Pods. If a DHCPv6 address is available it should be preferred.
The current filtering methods do not work as SLAAC and DHCPv6 address can be from the same prefix or in the case of KubeVirt a non-predictable pod IP.
This change only changes the preference in IPv6 only environments by adapting the default NodeAddress accordingly.
In the future it might be desirable to do a larger rewrite or even introduce CEL or expr to let the user define filtering and expression functions. As this is my first PR to Talos I kept the changes as minimal as possible.
Fixes #9725.
Acceptance
Please use the following checklist:
make conformance
)make fmt
)make lint
)make docs
)make unit-tests
)Unfortunately
make fmt
changed lots of*.proto
files, so I did not include the changes.make unit-tests
did not run (might be a NixOS issue), but running relevant tests withgo test
passed.make docs
did not include relevant updates to the documentation.