Prefix Delegation: Merge from prefix delegation preview branch to master (#1516)

* [Preview] Prefix delegation feature development (#1434)

* ENABLE_PREFIX_DELEGATION knob

WARM_PREFIX_TARGET knob

cr https://code.amazon.com/reviews/CR-40610031

* PD changes - dev only

* Cooldown prefix IP

* minor fixes to support prefix count

* Code cleanup

* Handle few corner cases

* Nitro based check

* With custom networking, do not get prefix for primary ENI

* Code refactor

* Handle graceful upgrade/enable PD from disable PD

* code refactoring

* Code refactoring

* fix computing too low IPs

* UT for prefix store

* Fix UTs and handle CR comments

* Clean up SDK code and fix model code generation

* fix format and merge induced error

* Merge broke the code

* Fix Dockerfile.test

* Added IPAMD UTs and fixed removeENI total count

* Couple more IPAMD UTs for PD

* UTs for awsutils/imds

* Handle graceful PD enable to disable knob

* get prefix list for non-pd case

* Prevent reconcile of prefix IPs in IP datastore

* Handle disable scenario

* fix formatting

* clean up comment

* Remove unnecessary debugs

* Handle PR comments

* formatting fix

* Remodelled PD datastore

* Fix up UTs and fix Prefix nil

* formatting

* PR comments - minor cosmetic changes

* removed the sdk override from makefile

* Internal repo merge added these lines

* Update config file

* Handle wrapper of DescribeNetworkInterfacesWithContext to take one eni

* RemoveUnusedENIFromStore was not accounting for prefixes deleted

* Removed hardcoding of 16

* Code refactor - merge ENI's secondary and prefix store into single store of CIDRs  (#1471)

* Code refactor - merge to single DB

* remove few debugs

* remove prefix store files

* PR comments

* Fix up CR comments

* formatting

* Updated UT cases

* UT and formatting

* Minor fixes

* Minor comments

* Updated /32 store term

* remove unused code

* Multi-prefix and WARM/MIN IP targets support with PD (#1477)

* Multi-pd and WARM targets support

* cleanup

* Updated variable names

* Default prefix count to -1

* Get stats should be computed on the fly since CIDR pool can have /32 or /28

* Support for warm prefix 0

* code review comments

* PD test cases and readme update (#1478)

* Traffic test case and readme update

* Added testcases for warm ip/min ip with PD

* Testcases for prefix count

* Testcase for warm prefix along with warm ip/min ip

* Updated traffic test case while PD mode is flipped

* Fix minor comments

* pr comments

* added pods per eni

* fix up count

* Support mixed instances with PD (#1483)

* Support mixed instances with PD

* fix up the log

* Optimization for prefixes allocation (#1500)

* optimization for prefixes

Prefix store optimization

* pr comment

* Fixup eni allocation with warm targets (#1512)

* Fixup eni allocation with warm targets

* fixup cidr count

* code comments and warm prefix 0

* Default WARM_PREFIX_TARGET to 1 (#1515)

* Handle prefix target 0

* PR comments

* Fix up UTs, was failing because of vendor

* No need to commit this

* make format

* needed for UT workflow

* IMDS code refactor

* PR comments - v1

Error with merge

* PR comments v2

* PR comments - v3

* Update logs

* PR comments - v4
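The commit message says the ENI secondary-IP store and prefix store were merged into a single store of CIDRs (#1471), and that stats are computed on the fly because the pool can hold both /32 and /28 entries. The following is a minimal Go sketch of that idea. `cidrEntry`, `poolStats`, `addressCount` and `computeStats` are hypothetical names, not the datastore's actual types or functions.

```go
package main

import (
	"fmt"
	"net"
)

// cidrEntry is a hypothetical record for one CIDR attached to an ENI: either a
// /32 secondary IP or a /28 delegated prefix. usedIPs counts addresses
// currently assigned to pods.
type cidrEntry struct {
	cidr    *net.IPNet
	usedIPs int
}

// poolStats is a hypothetical summary derived from the pool on demand.
type poolStats struct {
	TotalIPs    int
	AssignedIPs int
}

// addressCount returns how many IPv4 addresses a CIDR covers (1 for /32, 16 for /28).
func addressCount(n *net.IPNet) int {
	ones, bits := n.Mask.Size()
	return 1 << uint(bits-ones)
}

// computeStats walks the mixed /32 and /28 pool and derives the totals on every
// call, instead of keeping counters that assume a single CIDR size.
func computeStats(pool []cidrEntry) poolStats {
	var s poolStats
	for _, e := range pool {
		s.TotalIPs += addressCount(e.cidr)
		s.AssignedIPs += e.usedIPs
	}
	return s
}

// mustCIDR parses a CIDR string and panics on malformed input (example data only).
func mustCIDR(s string) *net.IPNet {
	_, n, err := net.ParseCIDR(s)
	if err != nil {
		panic(err)
	}
	return n
}

func main() {
	pool := []cidrEntry{
		{cidr: mustCIDR("10.0.1.25/32"), usedIPs: 1}, // secondary IP
		{cidr: mustCIDR("10.0.2.16/28"), usedIPs: 3}, // delegated prefix
	}
	s := computeStats(pool)
	fmt.Printf("total=%d assigned=%d free=%d\n", s.TotalIPs, s.AssignedIPs, s.TotalIPs-s.AssignedIPs)
}
```

Deriving the numbers from the stored CIDRs means that a node switching between secondary-IP mode and prefix mode never has to reconcile separate per-mode counters.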
jayanthvn authored Jun 24, 2021
1 parent 93edc95 commit 968ae01
Showing 26 changed files with 3,797 additions and 966 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -12,4 +12,4 @@ grpc-health-probe
cni-metrics-helper
coverage.txt
build/
vendor
vendor
10 changes: 5 additions & 5 deletions Makefile
@@ -81,7 +81,7 @@ endif
# LDFLAGS is the set of flags used when building golang executables.
LDFLAGS = -X main.version=$(VERSION) -X pkg/awsutils/awssession.version=$(VERSION)
# ALLPKGS is the set of packages provided in source.
ALLPKGS = $(shell go list ./... | grep -v cmd/packet-verifier)
ALLPKGS = $(shell go list $(VENDOR_OVERRIDE_FLAG) ./... | grep -v cmd/packet-verifier)
# BINS is the set of built command executables.
BINS = aws-k8s-agent aws-cni grpc-health-probe cni-metrics-helper
# Plugin binaries
@@ -144,7 +144,7 @@ docker-func-test: docker ## Run the built CNI container image to use in func
# Run unit tests
unit-test: export AWS_VPC_K8S_CNI_LOG_FILE=stdout
unit-test: ## Run unit tests
go test -v -coverprofile=coverage.txt -covermode=atomic $(ALLPKGS)
go test -v $(VENDOR_OVERRIDE_FLAG) -coverprofile=coverage.txt -covermode=atomic ./pkg/...

# Run unit tests with race detection (can only be run natively)
unit-test-race: export AWS_VPC_K8S_CNI_LOG_FILE=stdout
@@ -207,7 +207,7 @@ generate:
# Generate eni-max-pods.txt file for EKS AMI
generate-limits: GOOS=
generate-limits: ## Generate limit file go code
go run scripts/gen_vpc_ip_limits.go
go run $(VENDOR_OVERRIDE_FLAG) scripts/gen_vpc_ip_limits.go

# Fetch the CNI plugins
plugins: FETCH_VERSION=0.9.0
@@ -253,8 +253,8 @@ helm-lint:
@${MAKEFILE_PATH}test/helm/helm-lint.sh

# Run go vet on source code.
vet: ## Run go vet on source code.
go vet $(ALLPKGS)
vet: setup-ec2-sdk-override ## Run go vet on source code.
go vet $(VENDOR_OVERRIDE_FLAG) $(ALLPKGS)


docker-vet: build-docker-test ## Run go vet inside of a container.
38 changes: 34 additions & 4 deletions README.md
@@ -236,10 +236,17 @@ Type: Integer

Default: None

Specifies the number of free IP addresses that the `ipamd` daemon should attempt to keep available for pod assignment on the node.
For example, if `WARM_IP_TARGET` is set to 5, then `ipamd` attempts to keep 5 free IP addresses available at all times. If the
Specifies the number of free IP addresses that the `ipamd` daemon should attempt to keep available for pod assignment on the node.
With `ENABLE_PREFIX_DELEGATION` set to `true`, the `ipamd` daemon checks whether the existing (/28) prefixes are enough to maintain
`WARM_IP_TARGET`; if they are not sufficient, more prefixes are attached.

For example,

1. If `WARM_IP_TARGET` is set to 5, then `ipamd` attempts to keep 5 free IP addresses available at all times. If the
elastic network interfaces on the node are unable to provide these free addresses, `ipamd` attempts to allocate more interfaces
until `WARM_IP_TARGET` free IP addresses are available.
until `WARM_IP_TARGET` free IP addresses are available.
2. If `ENABLE_PREFIX_DELEGATION` is set to `true` and `WARM_IP_TARGET` is 16, then initially one (/28) prefix is sufficient. Once a single pod is assigned an IP,
only 15 free IPs remain, so `ipamd` allocates one more prefix to maintain a `WARM_IP_TARGET` of 16.

**NOTE!** Avoid this setting for large clusters, or if the cluster has high pod churn. Setting it will cause additional calls to the
EC2 API and that might cause throttling of the requests. It is strongly suggested to set `MINIMUM_IP_TARGET` when using `WARM_IP_TARGET`.
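The second `WARM_IP_TARGET` example in the README change above reduces to rounding the required IP count up to whole /28 prefixes. The following is a minimal Go sketch of that arithmetic. It assumes that pods pack densely into prefixes and that `ipamd` has to satisfy both `WARM_IP_TARGET` (free IPs) and `MINIMUM_IP_TARGET` (total IPs). `prefixesNeeded` is a hypothetical helper, not ipamd's code.

```go
package main

import "fmt"

// ipsPerPrefix is the number of IPv4 addresses in a /28 prefix.
const ipsPerPrefix = 16

// prefixesNeeded sketches how many /28 prefixes a node needs so that at least
// warmIPTarget IPs stay free and at least minimumIPTarget IPs are held in total.
// Assumption: pods pack densely into prefixes; this is not ipamd's actual logic.
func prefixesNeeded(assignedIPs, warmIPTarget, minimumIPTarget int) int {
	required := assignedIPs + warmIPTarget
	if minimumIPTarget > required {
		required = minimumIPTarget
	}
	// Round up: a prefix that is only partly needed is still attached whole.
	return (required + ipsPerPrefix - 1) / ipsPerPrefix
}

func main() {
	// WARM_IP_TARGET=16 with no pods: one prefix (16 free IPs) is enough.
	fmt.Println(prefixesNeeded(0, 16, 0))
	// One pod assigned: only 15 IPs stay free, so a second prefix is needed.
	fmt.Println(prefixesNeeded(1, 16, 0))
}
```

The same rule also matches the single-ENI `WARM_IP_TARGET`/`MINIMUM_IP_TARGET` rows of the table in `docs/prefix-and-ip-target.md`. For example, 45 pods with a warm target of 5 and a minimum of 10 need 50 IPs, which means 4 prefixes.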
@@ -248,7 +255,8 @@ If both `WARM_IP_TARGET` and `MINIMUM_IP_TARGET` are set, `ipamd` will attempt t
This environment variable overrides `WARM_ENI_TARGET` behavior. For a detailed explanation, see
[`WARM_ENI_TARGET`, `WARM_IP_TARGET` and `MINIMUM_IP_TARGET`](https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/eni-and-ip-target.md).


When `ENABLE_PREFIX_DELEGATION` is set to `true`, this environment variable overrides `WARM_PREFIX_TARGET` behavior. For a detailed explanation, see
[`WARM_PREFIX_TARGET`, `WARM_IP_TARGET` and `MINIMUM_IP_TARGET`](https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/prefix-and-ip-target.md).
---

`MINIMUM_IP_TARGET` (Since v1.6.0)
@@ -450,6 +458,28 @@ You can use the below command to enable `DISABLE_TCP_EARLY_DEMUX` to `true` -
```
kubectl patch daemonset aws-node -n kube-system -p '{"spec": {"template": {"spec": {"initContainers": [{"env":[{"name":"DISABLE_TCP_EARLY_DEMUX","value":"true"}],"name":"aws-vpc-cni-init"}]}}}}'
```
---

`ENABLE_PREFIX_DELEGATION` (Since v1.9)

Type: Boolean as a String

Default: `false`

Enables IPv4 prefix delegation on Nitro instances. Setting `ENABLE_PREFIX_DELEGATION` to `true` makes `ipamd` allocate a /28 prefix
instead of a secondary IP address from the ENI's subnet. The total number of prefixes and private IP addresses will be less than the
limit on private IPs allowed by your instance. Enabling or disabling `ENABLE_PREFIX_DELEGATION` while pods are running or while ENIs are attached is supported, and new pods get IPs according to the mode `ipamd` is currently in. However, the kubelet max pods value should be updated, which requires either a kubelet restart or a node recycle.

---

`WARM_PREFIX_TARGET`

Type: Integer

Default: None

Specifies the number of free IPv4 (/28) prefixes that the `ipamd` daemon should attempt to keep available for pod assignment on the node.
This environment variable takes effect only when `ENABLE_PREFIX_DELEGATION` is set to `true`, and it is overridden when `WARM_IP_TARGET` and `MINIMUM_IP_TARGET` are configured.

### ENI tags related to Allocation

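The `ENABLE_PREFIX_DELEGATION` text above says that kubelet's max pods has to be updated when the mode changes. The following is a rough Go sketch of why the value grows. It makes two assumptions: each ENI's primary IP stays a plain address, and every other private-IP slot holds one /28 prefix. It also reuses the usual EKS max-pods formula for secondary-IP mode, where the first IP of each ENI is not used for pods and `+2` accounts for host-network pods. Custom networking, which skips the primary ENI, is not modeled. Both helpers are hypothetical, and the t3.small limits (3 ENIs, 4 IPv4 addresses per ENI) are assumed EC2 values.

```go
package main

import "fmt"

// ipsPerPrefix is the number of IPv4 addresses in one /28 prefix.
const ipsPerPrefix = 16

// maxPodsWithoutPrefixes mirrors the usual secondary-IP formula: every private
// IP except each ENI's primary one can go to a pod, plus 2 host-network pods.
func maxPodsWithoutPrefixes(maxENIs, ipsPerENI int) int {
	return maxENIs*(ipsPerENI-1) + 2
}

// maxPodsWithPrefixes is a hypothetical prefix-mode variant. Assumption: each
// non-primary private-IP slot holds one /28 prefix instead of a single address.
func maxPodsWithPrefixes(maxENIs, ipsPerENI int) int {
	return maxENIs*(ipsPerENI-1)*ipsPerPrefix + 2
}

func main() {
	// Assumed t3.small limits: 3 ENIs, 4 IPv4 addresses per ENI.
	fmt.Println("secondary IP mode:", maxPodsWithoutPrefixes(3, 4))
	fmt.Println("prefix mode:", maxPodsWithPrefixes(3, 4))
}
```

Under these assumptions a t3.small goes from 11 to 146 possible pod IPs. Flipping the knob without also changing kubelet's max pods setting therefore either leaves most of that capacity unused or, when going back, lets kubelet schedule more pods than secondary-IP mode can address.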
2 changes: 2 additions & 0 deletions config/master/aws-k8s-cni.yaml
@@ -147,6 +147,8 @@
"value": "false"
- "name": "ENABLE_POD_ENI"
"value": "false"
- "name": "ENABLE_PREFIX_DELEGATION"
"value": "false"
- "name": "MY_NODE_NAME"
"valueFrom":
"fieldRef":
1 change: 1 addition & 0 deletions config/master/manifests.jsonnet
@@ -174,6 +174,7 @@ local awsnode = {
DISABLE_INTROSPECTION: "false",
DISABLE_METRICS: "false",
ENABLE_POD_ENI: "false",
ENABLE_PREFIX_DELEGATION: "false",
MY_NODE_NAME: {
valueFrom: {
fieldRef: {fieldPath: "spec.nodeName"},
34 changes: 34 additions & 0 deletions docs/prefix-and-ip-target.md
@@ -0,0 +1,34 @@
## `WARM_PREFIX_TARGET`, `WARM_IP_TARGET` and `MINIMUM_IP_TARGET`

With `ENABLE_PREFIX_DELEGATION` set to `true`, IPAMD allocates (/28) prefixes to the ENIs. By default, IPAMD allocates one prefix to each allocated ENI, but the number of prefixes held in the warm pool can be tuned as needed by setting the `WARM_PREFIX_TARGET`, `WARM_IP_TARGET` and `MINIMUM_IP_TARGET` environment variables.

If set, `WARM_IP_TARGET` and `MINIMUM_IP_TARGET` override `WARM_PREFIX_TARGET`. With `WARM_PREFIX_TARGET`, IPAMD allocates one full (/28) prefix even if only a single IP from the existing prefix is consumed. If the ENI has no space for another prefix, a new ENI is created. Because prefixes are carved out of the ENI's subnet, use this setting only when needed, i.e. when pod density is high. `WARM_IP_TARGET` and `MINIMUM_IP_TARGET` give finer-grained control over the number of IPs, but if the existing prefixes are not sufficient to maintain the warm pool, IPAMD allocates more prefixes to the existing ENI, or creates a new ENI if the existing ENIs are running out of prefixes.

When a new ENI is allocated, IPAMD allocates either one prefix or the number of prefixes needed to satisfy the `WARM_PREFIX_TARGET`, `WARM_IP_TARGET` and `MINIMUM_IP_TARGET` settings. This avoids extra EC2 calls to allocate more prefixes, or to free surplus ones, when the ENI is brought up.


Some example cases:

| Instance type | `WARM_PREFIX_TARGET`| `WARM_IP_TARGET`| `MINIMUM_IP_TARGET` | Pods | ENIs | Pods per ENI | Attached Prefixes | Unused Prefixes | Prefixes per ENI | Unused IPs|
|---------------|:-------------------:|:---------------:|:-------------------:|:----:|:----:|:------------:|:-----------------:|:---------------:|:----------------:|:---------:|
| t3.small | 1 | - | - | 0 | 1 | 0 | 1 | 1 | 1 | 16 |
| t3.small | 1 | - | - | 5 | 3 | 1,2,2 | 4 | 1 | 2,1,1 | 59 |
| t3.small | 1 | - | - | 17 | 1 | 17 | 3 | 1 | 3 | 31 |
| | | | | | | | | | | |
| t3.small | - | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 16 |
| t3.small | - | 1 | 1 | 5 | 3 | 1,2,2 | 3 | 0 | 1,1,1 | 43 |
| t3.small | - | 1 | 1 | 17 | 1 | 17 | 2 | 0 | 2 | 15 |
| | | | | | | | | | | |
| t3.small | - | 2 | 10 | 0 | 1 | 0 | 1 | 1 | 1 | 16 |
| t3.small | - | 2 | 10 | 5 | 3 | 1,2,2 | 3 | 0 | 1,1,1 | 43 |
| t3.small | - | 2 | 10 | 17 | 1 | 17 | 2 | 0 | 2 | 15 |
| | | | | | | | | | | |
| p3dn.24xlarge | 1 | - | - | 0 | 1 | 0 | 1 | 1 | 1 | 16 |
| p3dn.24xlarge | 1 | - | - | 3 | 2 | 3,0 | 2 | 1 | 2,0 | 29 |
| p3dn.24xlarge | 1 | - | - | 95 | 3 | 95,0,0 | 7 | 1 | 7,0,0 | 17 |
| | | | | | | | | | | |
| p3dn.24xlarge | - | 5 | 10 | 0 | 1 | 0 | 1 | 1 | 1 | 16 |
| p3dn.24xlarge | - | 5 | 10 | 7 | 1 | 7 | 1 | 0 | 1 | 9 |
| p3dn.24xlarge | - | 5 | 10 | 15 | 1 | 15 | 2 | 1 | 2 | 17 |
| p3dn.24xlarge | - | 5 | 10 | 45 | 2 | 45,0 | 4 | 1 | 4,0 | 19 |
| | | | | | | | | | | |
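The `WARM_PREFIX_TARGET` rows of the table follow from simple arithmetic. Round the pod count up to whole /28 prefixes, then add the warm prefixes that must stay completely free. The following minimal Go sketch reproduces those rows, assuming that pods pack densely into prefixes. It does not model the mixed-instance rows (for example, the t3.small with 5 pods spread over 3 ENIs). Both helpers are hypothetical.

```go
package main

import "fmt"

// ipsPerPrefix is the number of IPv4 addresses in a /28 prefix.
const ipsPerPrefix = 16

// prefixesForWarmPrefixTarget sketches the WARM_PREFIX_TARGET rule: attach
// enough /28 prefixes to hold the assigned pod IPs, plus warmPrefixTarget
// prefixes that are entirely free. Assumption: pods pack densely into prefixes.
func prefixesForWarmPrefixTarget(pods, warmPrefixTarget int) int {
	used := (pods + ipsPerPrefix - 1) / ipsPerPrefix
	return used + warmPrefixTarget
}

// unusedIPs is the number of attached addresses not assigned to pods.
func unusedIPs(prefixes, pods int) int {
	return prefixes*ipsPerPrefix - pods
}

func main() {
	for _, pods := range []int{0, 3, 17, 95} {
		p := prefixesForWarmPrefixTarget(pods, 1)
		fmt.Printf("pods=%d prefixes=%d unused IPs=%d\n", pods, p, unusedIPs(p, pods))
	}
}
```

With `WARM_PREFIX_TARGET=1`, the sketch gives 1, 2, 3 and 7 attached prefixes, leaving 16, 29, 31 and 17 unused IPs. These values agree with the corresponding table rows. This also shows why the doc warns that this knob consumes subnet space faster than `WARM_IP_TARGET`: a whole free /28 is always held in reserve.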
11 changes: 4 additions & 7 deletions go.mod
@@ -10,8 +10,6 @@ require (
github.com/golang/mock v1.4.1
github.com/golang/protobuf v1.4.2
github.com/google/go-jsonnet v0.16.0
github.com/google/gopacket v1.1.18
github.com/gregjones/httpcache v0.0.0-20190212212710-3befbb6ad0cc // indirect
github.com/pkg/errors v0.9.1
github.com/prometheus/client_golang v1.0.0
github.com/prometheus/client_model v0.2.0
@@ -20,11 +18,10 @@ require (
github.com/stretchr/testify v1.5.1
github.com/vishvananda/netlink v1.1.1-0.20201029203352-d40f9887b852
go.uber.org/zap v1.15.0
golang.org/x/lint v0.0.0-20201208152925-83fdc39ff7b5 // indirect
golang.org/x/mod v0.4.0 // indirect
golang.org/x/net v0.0.0-20201110031124-69a78807bb2b
golang.org/x/sys v0.0.0-20201117170446-d9b008d0a637
golang.org/x/tools v0.0.0-20210113180300-f96436850f18 // indirect
golang.org/x/lint v0.0.0-20210508222113-6edffad5e616 // indirect
golang.org/x/net v0.0.0-20210405180319-a5a99cb37ef4
golang.org/x/sys v0.0.0-20210616094352-59db8d763f22
golang.org/x/tools v0.1.3 // indirect
google.golang.org/grpc v1.29.0
gopkg.in/natefinch/lumberjack.v2 v2.0.0
k8s.io/api v0.18.6