Skip to content

Prefix on nicv6 support #3658

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

NihaNallappagari
Copy link

@NihaNallappagari NihaNallappagari commented May 13, 2025

Reason for Change:
Current Problem:
VNET currently limits scaling to 64K IP addresses per VNET due to per-IP route mappings. To support larger AKS clusters, the platform is introducing per-prefix (CIDR) route mapping, allowing a single mapping to represent multiple IPs (e.g., /24 enables 256 IPs per mapping). This enables scaling up to 16 million IPs per VNET without impacting the underlying platform.

Change needed:
Prefix on NIC v4 is supported in Swiftv1, but Swiftv1 does not support IPv6. This change upgrades Prefix on NIC v4 functionality to Swiftv2 and introduces IPv6 support in Swiftv2.

Changes
This PR has changes specific to Prefix on NICv6
CNS
--Consuming PrimaryIPv6, GatewayIPv6, MacAddress (DelegatedNIC) from NNC CRD because of dualstack NC
--IP allocation: Assign IPs of each IPFamily as part of RequestIPs api request (Currently when a pod is created, IPAM RequestsIPs from CNS where CNS picks one IP from each NC and hands it over to IPAM.

IPAM
--Change RequestIPs response parsing to read GatewayIPv6 and MacAddress
--Populates Interfaces with MacAddress which is used by CNI to plumb routes to send traffic

Design doc for pocv6 with sample NNC
Requirements:

Notes:

@Copilot Copilot AI review requested due to automatic review settings May 13, 2025 22:54
@NihaNallappagari NihaNallappagari requested review from a team as code owners May 13, 2025 22:54
@NihaNallappagari NihaNallappagari requested a review from csfmomo May 13, 2025 22:54
Copilot

This comment was marked as outdated.

@NihaNallappagari
Copy link
Author

NihaNallappagari commented May 13, 2025

@microsoft-github-policy-service agree company="Microsoft"

if gatewayIP != nil {
gatewaysIPs[i] = gatewayIP
} else {
logger.Printf("gatewayIP is nil or gateway ip parse failed for pod ip %s", resp.PodIPInfo[i].PodIPConfig.IPAddress)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we return error if gwip is nil?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated code to return error message

secondaryIPConfigs[addr.String()] = cns.SecondaryIPConfig{
IPAddress: addr.String(),
NCVersion: int(nc.Version),
// in the case of vnet prefix on swift v2 the primary IP is a /32 and should not be added to secondary IP configs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kadevu is primary ip /32 only on swiftv2 and not on swiftv1? I knew we had this discussion, sorry i forgot the reason behind this check

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kadevu is having issues to comment on the PR. His response "yes this is only for Swiftv2. This is basically the primary ip to assign the to the delegated nic and not assigned to any pod which is in line with the InterfaceCompartment published to pubsub. In case of Swiftv1, this field has /28 instead and all those ips are assigned to the pods"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we are not assigning any ip to delegated nic, why it has to be /32 and not /28 as like swiftv1?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the interfacecompartment schema enforces customeripaddress to be /32 and customeripprefix to be /28. For the customeripadderss we carve out a /28 and then assign a /32 from it. The rest of the IPs in the /28 are not programmed by nmagent which is why we cannot use these for the pods. To make sure there is no wastage of ips we use the other ips from the /28 from which we got the /32 to assign customeripaddress for other nodes in the nodepool.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the if check for single ip. We have some discussions going with Control plane SMEs. I will do a follow-up to change this

ncStatus := service.state.ContainerStatus[idx]
if mac := ncStatus.CreateNetworkContainerRequest.NetworkInterfaceInfo.MACAddress; mac != "" {
logger.Printf("NC %s has MAC address for prefix on nic swiftv2 scenario, skipping syncHostNCVersion", ncStatus.ID)
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we detect swiftv2 based on mac? why not add a config field in cns to specify swiftv2. i guess there is one for singularity which we can reuse

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. This is specific to swiftv2 NIC, we can make use of config fields. Is this the file you are talking about cns config. If so, I don't find a field specific to singularity. Could you share the field you are talking about? Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about this one?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the mac address check. Once the NMAgent exposes NC details, I will make the changes in a new PR

@tamilmani1989
Copy link
Member

@rbtr can you review this as well since it touches cns changes for aks swift too

@rbtr rbtr requested a review from Copilot June 6, 2025 15:56
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces changes to support dual-stack NIC functionality for Prefix on NICv6 scenarios by extending both CNS and IPAM components. Key updates include:

  • CNS: Populating the MACAddress in podIPInfo and skipping NC version sync when a MAC address is present.
  • IPAM: Adjusting IP allocation logic by mapping and handling multiple IP families and updating error messages accordingly.
  • Additional updates in tests, internal API, conversion, network container contract, and Azure IPAM integration to support IPv6 gateway addresses and related deduplication.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Show a summary per file
File Description
cns/restserver/util.go Added MACAddress population to podIPInfo
cns/restserver/ipam.go Adjusted IP assignment logic for dual-stack handling and introduced generateAssignedIPKey
cns/restserver/internalapi_test.go Added test to verify skipping NC version sync for prefix on NIC swift v2
cns/restserver/internalapi_linux.go Added IPv4 check in SNAT rule programming
cns/restserver/internalapi.go Skips sync for prefix on NIC when MAC address is set
cns/kubecontroller/nodenetworkconfig/conversion_linux.go Updated static NC request conversion to handle single IP prefix cases properly
cns/NetworkContainerContract.go Added IPv6 gateway field and IPFamily enum definition
azure-ipam/ipconfig/ipconfig.go Updated response processing to include IPv6 gateway parsing
azure-ipam/ipam.go Modified CmdAdd to integrate IPv6 gateway assignment and deduplicate interfaces
Comments suppressed due to low confidence (2)

cns/restserver/ipam.go:1029

  • Consider adding an inline comment to clarify why the slice size for podIPInfo is set to the number of IP families, emphasizing how dual-stack handling determines the number of IPs to assign.
numberOfIPs = numOfIPFamilies

azure-ipam/ipam.go:150

  • [nitpick] Consider adding a comment explaining that the deduplication of interfaces by MAC address is deliberate to avoid duplicate interface entries when multiple IP family entries point to the same network interface.
if podIPInfo.MacAddress == "" || seenInterfaces[podIPInfo.MacAddress] {

Copy link
Contributor

@rbtr rbtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there AzCNI changes I'm missing here or is this only for Cilium?

ncStatus := service.state.ContainerStatus[idx]
if mac := ncStatus.CreateNetworkContainerRequest.NetworkInterfaceInfo.MACAddress; mac != "" {
logger.Printf("NC %s has MAC address for prefix on nic swiftv2 scenario, skipping syncHostNCVersion", ncStatus.ID)
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about this one?

Comment on lines 1012 to 1022
ip := net.ParseIP(secIPConfig.IPAddress)
if ip == nil {
continue
}

if ip.To4() != nil {
ncIPFamilies[cns.IPv4] = struct{}{}
} else {
ncIPFamilies[cns.IPv6] = struct{}{}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use netip.ParseAddr to get a netip.Addr, then you can call Addr#Is4 or Addr#Is6 instead. prefer netip to net package in new code

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the changes mentioned. Thanks

Address PR comments

remove unnecessary logger

update synhost skip test scenario

fix linting error

updated lint error

fix linting error

.

update synhost skip test scenario

fix linting error

updated lint error

fix linting error

.

gatewayIP := net.ParseIP(gatewayStr)
if gatewayIP == nil {
return nil, nil, errors.Errorf("failed to parse gateway IP %q for pod ip %s", gatewayStr, podInfo.PodIPConfig.IPAddress)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if nil, why we log gatewayStr. will it cause any panic? did we add UT for this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gateway string is either nil or could not be parsed. Logging may help diagnose the issue. As we returning error, ipam will handle the error case and ignores the return values of pods and gatway ip

podIPNet, gatewayIP, err := ipconfig.ProcessIPConfigsResp(resp)
. Thanks for pointing out the missing test. I've added the appropriate test scenario to cover this case.

Comment on lines +154 to +165
infMac, err := net.ParseMAC(podIPInfo.MacAddress)
if err != nil {
p.logger.Error("Failed to parse interface MAC address", zap.Error(err), zap.String("macAddress", podIPInfo.MacAddress))
return cniTypes.NewError(cniTypes.ErrUnsupportedField, err.Error(), "failed to parse interface MAC address")
}

cniResult.Interfaces = append(cniResult.Interfaces, &types100.Interface{
Mac: infMac.String(),
})
seenInterfaces[podIPInfo.MacAddress] = true
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add UT for this case with dualstack ips?

Copy link
Author

@NihaNallappagari NihaNallappagari Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added UT for dual stack scenarios

Comment on lines 999 to 1027

// Map used to get the number of IPFamilies across all NCs
ncIPFamilies := map[cns.IPFamily]struct{}{}
// Gets the IPFamilies from all NCs and store them in a map. This will be used to determine the number of IPs to return
for ncID := range service.state.ContainerStatus {
if len(ncIPFamilies) == 2 {
break
}

for _, secIPConfig := range service.state.ContainerStatus[ncID].CreateNetworkContainerRequest.SecondaryIPConfigs {
if len(ncIPFamilies) == 2 {
break
}

addr, err := netip.ParseAddr(secIPConfig.IPAddress)
if err != nil {
continue
}

if addr.Is4() {
ncIPFamilies[cns.IPv4] = struct{}{}
} else if addr.Is6() {
ncIPFamilies[cns.IPv6] = struct{}{}
}
}
}
// Makes sure we have at least one IPFamily across all NCs
numOfIPFamilies := len(ncIPFamilies)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move this to separate function say getNumberofIPFamily

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add UT for this too

Copy link
Author

@NihaNallappagari NihaNallappagari Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test and moved chunk of code to new function

small fixes

fix lint errors

lint fix

.

Delete examples/imds_nc_lookup.go

Signed-off-by: NihaNallappagari <69693983+NihaNallappagari@users.noreply.github.com>

fix lint

.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants