Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

entry point script migration to golang #1726

Closed
wants to merge 8 commits into from

Conversation

jayanthvn
Copy link
Contributor

@jayanthvn jayanthvn commented Nov 5, 2021

What type of PR is this?
Enhancement

Which issue does this PR fix:
N/A

What does this PR do / Why do we need it:
Rewrite the entry point shell scripts in golang

This is first part of the change to update to minimal distro image.

If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:
N/A

Testing done on this change:

Yes

Test Cases

Startup Aws-node


dev-dsk-varavaj-2b-fdf1da64 % kubectl logs aws-node-55brn -n kube-system
time="2021-11-02T22:12:16Z" level=info msg="Install CNI binaries.."
time="2021-11-02T22:12:16Z" level=info msg="Starting IPAM daemon ... "
time="2021-11-02T22:12:16Z" level=info msg="Checking for IPAM connectivity ... "
time="2021-11-02T22:12:17Z" level=info msg="Copying config file ... "
time="2021-11-02T22:12:17Z" level=info msg="Successfully copied CNI plugin binary and config file."

Startup Init-container

dev-dsk-varavaj-2b-fdf1da64 % kubectl logs aws-node-8qk9p -n kube-system -c aws-vpc-cni-init
Installed /host/opt/cni/bin/loopback
time="2021-11-02T22:24:23Z" level=info msg="Copying CNI plugin binaries ..."
Installed /host/opt/cni/bin/portmap
Installed /host/opt/cni/bin/bandwidth
Installed /host/opt/cni/bin/aws-cni-support.sh
time="2021-11-02T22:24:23Z" level=info msg="Copied all CNI plugin binaries to /host/opt/cni/bin\n"
time="2021-11-02T22:24:23Z" level=info msg="Found hostIP 192.168.12.6 and primaryMAC 0a:5e:a9:0e:e5:f3"
time="2021-11-02T22:24:23Z" level=info msg="Found primaryIF eth0"
time="2021-11-02T22:24:23Z" level=info msg="Updated entry for 2"
time="2021-11-02T22:24:23Z" level=info msg="Updated entry for 1"
time="2021-11-02T22:24:23Z" level=info msg="CNI init container done"

PD is Enable but WARM targets = 0

dev-dsk-varavaj-2b-fdf1da64 % kubectl logs aws-node-8nx4j -n kube-system
time="2021-11-02T21:51:30Z" level=error msg="Setting WARM_PREFIX_TARGET = 0 is not supported while WARM_IP_TARGET/MINIMUM_IP_TARGET is not set. Please configure either one of the WARM_{PREFIX/IP}_TARGET or MINIMUM_IP_TARGET env variables"

PD is enabled but Warm target = -1

dev-dsk-varavaj-2b-fdf1da64 % kubectl logs aws-node-d8vv4 -n kube-system
time="2021-11-02T21:53:19Z" level=error msg="Setting WARM_PREFIX_TARGET = 0 is not supported while WARM_IP_TARGET/MINIMUM_IP_TARGET is not set. Please configure either one of the WARM_{PREFIX/IP}_TARGET or MINIMUM_IP_TARGET env variables"

Plugin log file = stdout

dev-dsk-varavaj-2b-fdf1da64 % kubectl logs aws-node-7xtgh -n kube-system
time="2021-11-02T21:55:29Z" level=error msg="AWS_VPC_K8S_PLUGIN_LOG_FILE cannot be set to stdout"

Veth prefix > 4

kubectl logs aws-node-vb8q5 -n kube-system
time="2021-11-02T21:57:29Z" level=error msg="AWS_VPC_K8S_CNI_VETHPREFIX cannot be longer than 4 characters"

Veth prefix = lo

kubectl logs aws-node-xwjvv -n kube-system
time="2021-11-02T21:58:28Z" level=error msg="AWS_VPC_K8S_CNI_VETHPREFIX cannot be set to reserved values eth or vlan or lo"

Created aws.conf file and restarted aws-node

[root@ip-192-168-12-6 net.d]# touch aws.conf
[root@ip-192-168-12-6 net.d]# pwd
/etc/cni/net.d
[root@ip-192-168-12-6 net.d]# ls
10-aws.conflist  aws.conf

File is delete -

[root@ip-192-168-12-6 net.d]# ls
10-aws.conflist

Enable bw plugin -

cat /etc/cni/net.d/10-aws.conflist
{
  "cniVersion": "0.3.1",
  "name": "aws-cni",
  "plugins": [
    {
      "name": "aws-cni",
      "type": "aws-cni",
      "vethPrefix": "eni",
      "mtu": "9001",
      "pluginLogFile": "/host/var/log/aws-routed-eni/plugin.log",
      "pluginLogLevel": "DEBUG"
    },
    {
      "name": "egress-v4-cni",
      "type": "egress-v4-cni",
      "ipam": {
        "type": "host-local",
        "routes": [
          {
            "dst": "0.0.0.0/0"
          }
        ],
        "dataDir": "/run/cni/v6pd/egress-v4-ipam",
        "ranges": [
          [
            {
              "subnet": "169.254.172.0/22"
            }
          ]
        ]
      },
      "enabled": "false",
      "nodeIP": "192.168.12.6",
      "mtu": "9001",
      "pluginLogFile": "/var/log/aws-routed-eni/egress-v4-plugin.log",
      "pluginLogLevel": "DEBUG"
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    },
    {
      "type": "bandwidth",
      "capabilities": {
        "bandwidth": true
      }
    }
  ]
}

Disable bw plugin -

[root@ip-192-168-12-6 ec2-user]# cat /etc/cni/net.d/10-aws.conflist
{
  "cniVersion": "0.3.1",
  "name": "aws-cni",
  "plugins": [
    {
      "name": "aws-cni",
      "type": "aws-cni",
      "vethPrefix": "eni",
      "mtu": "9001",
      "pluginLogFile": "/host/var/log/aws-routed-eni/plugin.log",
      "pluginLogLevel": "DEBUG"
    },
    {
      "name": "egress-v4-cni",
      "type": "egress-v4-cni",
      "mtu": "9001",
      "enabled": "false",
      "nodeIP": "192.168.12.6",
      "ipam": {
         "type": "host-local",
         "ranges": [[{"subnet": "169.254.172.0/22"}]],
         "routes": [{"dst": "0.0.0.0/0"}],
         "dataDir": "/run/cni/v6pd/egress-v4-ipam"
      },
      "pluginLogFile": "/var/log/aws-routed-eni/egress-v4-plugin.log",
      "pluginLogLevel": "DEBUG"
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true},
      "snat": true
    }
  ]
}

Automation added to e2e:

Work in progress

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
No

Does this change require updates to the CNI daemonset config files to work?:

No

Does this PR introduce any user-facing change?:

No


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jayanthvn jayanthvn added this to the v1.10.1 milestone Nov 5, 2021
@srini-ram
Copy link
Contributor

Few high level comments before going into code details

  1. Can we build/test new distro based ARM aws-node binary on ARM instances.
  2. Can we move entry _point script changes to separate diff as this will be useful for AL2 and other images as well and not specific to minimal distro ?

@jayanthvn
Copy link
Contributor Author

Few high level comments before going into code details

  1. Can we build/test new distro based ARM aws-node binary on ARM instances.
  2. Can we move entry _point script changes to separate diff as this will be useful for AL2 and other images as well and not specific to minimal distro ?

Yes Srini will test on ARM nodes.

Reg keeping the changes separate is not possible since the minimal distro image doesn't have a shell to run the entry point script and also few packages like js which is needed for the script.

@srini-ram
Copy link
Contributor

srini-ram commented Nov 18, 2021

Few high level comments before going into code details

  1. Can we build/test new distro based ARM aws-node binary on ARM instances.
  2. Can we move entry _point script changes to separate diff as this will be useful for AL2 and other images as well and not specific to minimal distro ?

Yes Srini will test on ARM nodes.

Reg keeping the changes separate is not possible since the minimal distro image doesn't have a shell to run the entry point script and also few packages like js which is needed for the script.

Jayanth, I understand that script migration is a pre-req for minimal distro. But i was suggesting following order of changes

1/ Commit the script migration as a separate change so that AL2 can leverage this. [ This would require exhaustive testing and we can enhance our AL2 test suite as well] . This can potentially be bundled into CNI 10.1.2 release
2/ As a next step, we could bring in minimal distro changes and this can be bundled into CNI 10.1.3

This would help us isolate regression and unify API behavior across AL2/Minimal distro. As an example, AWS SDK attempts IMDSv2 and then falls back to IMDSv1. However, CNI 1.10.1 (AL2) entry point script doesn't do that.

@jayanthvn
Copy link
Contributor Author

Sure Srini I can separate it to 2 PRs.

@jayanthvn jayanthvn requested a review from a team as a code owner November 21, 2021 01:05
@jayanthvn jayanthvn changed the title Use miminal distro image entry point script migration to golang Nov 21, 2021
@jayanthvn
Copy link
Contributor Author

jayanthvn commented Nov 21, 2021

@srini-ram - Updated the PR to have the AL2 image. Will open a new PR for minimal distro once this is merged.

Automation will be added as a fast follow up. Will discuss about the test plan offline.

@jayanthvn jayanthvn modified the milestones: v1.11.0, v1.11.1 Mar 7, 2022
@github-actions
Copy link

github-actions bot commented May 7, 2022

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

@github-actions github-actions bot added the stale Issue or PR is stale label May 7, 2022
@jayanthvn
Copy link
Contributor Author

/not stale

@github-actions github-actions bot removed the stale Issue or PR is stale label May 10, 2022
@github-actions
Copy link

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

@github-actions github-actions bot added stale Issue or PR is stale and removed stale Issue or PR is stale labels Jul 10, 2022
@vikasmb vikasmb removed this from the v1.11.3 milestone Aug 4, 2022
@github-actions
Copy link

github-actions bot commented Oct 4, 2022

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

@github-actions github-actions bot added the stale Issue or PR is stale label Oct 4, 2022
@jayanthvn
Copy link
Contributor Author

/not stale

@jayanthvn jayanthvn removed the request for review from srini-ram October 12, 2022 15:03
@jdn5126 jdn5126 self-assigned this Oct 12, 2022
@github-actions github-actions bot removed the stale Issue or PR is stale label Oct 13, 2022
Copy link
Contributor

@jdn5126 jdn5126 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall content looks good to me, just a few nits. I can review further after the conflicts are resolved and jobs pass.

Quick question, any interest in pulling in go modules for cp and sysctl instead of implementing our own? I can see either side, I guess it depends on how specific this use is.

val, _ = sys.GetSysctl(entry)
log.Infof("Updated entry for %d", val)

enableIPv6 := getEnv(envEnableIPv6, "false")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the environment variable is not present, then we would be leaving default IPv6 settings. Should this explicitly disable IPv6 when env var is not present?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes false is the default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that default could change on AL2. I think we should call these sysctls either way to be safe

cmd/aws-vpc-cni-init/main.go Show resolved Hide resolved
entry = "net/ipv6/conf/all/disable_ipv6"
err = sys.SetSysctl(entry, 0)
if err != nil {
log.WithError(err).Fatalf("Failed to enable disable_ipv6\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe "Failed to set disable_ipv6 to 0"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can add it.

if outb.String() == "" {
return true
}
time.Sleep(1 * time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need a sleep here? Won't cmd already have a built-in grpc timeout that should be sufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we can remove it, this was based of the old code and it was removed recently.

https://github.com/aws/amazon-vpc-cni-k8s/blob/v1.10.3/scripts/entrypoint.sh#L120

cmd/aws-vpc-cni/main.go Show resolved Hide resolved
misc/10-aws.conflist Show resolved Hide resolved
return hostIP, nil
}

time.Sleep(1 * time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: seems like GetMetaData already has a timeout built in, so do we need this sleep?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there are 10 retries so after that I added this sleep for the next retry. But let me check if we can remove it, we can save a second.

@jayanthvn
Copy link
Contributor Author

Overall content looks good to me, just a few nits. I can review further after the conflicts are resolved and jobs pass.

Quick question, any interest in pulling in go modules for cp and sysctl instead of implementing our own? I can see either side, I guess it depends on how specific this use is.

There were some 3P implementation for cp and sysctl, hence I added utils similar to upstream K8s.

@jayanthvn
Copy link
Contributor Author

Closing this PR since including the changes in #2146

@jayanthvn jayanthvn closed this Nov 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants