Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for setting init container options and per-pod security groups #617

Merged
merged 3 commits into from
Nov 9, 2021
Merged

Support for setting init container options and per-pod security groups #617

merged 3 commits into from
Nov 9, 2021

Conversation

alex-hunt-materialize
Copy link
Contributor

@alex-hunt-materialize alex-hunt-materialize commented Sep 2, 2021

Proposed changes

Supports setting the init container image and DISABLE_TCP_EARLY_DEMUX
env var.

This allows users to upgrade their version of the CNI, and to support
per-pod security groups.

Related issues

Fixes #619
Fixes #620

Relates to, but doesn't solve: #622

Supports setting the init container image and DISABLE_TCP_EARLY_DEMUX
env var.

This allows users to upgrade their version of the CNI, and to support
per-pod security groups.
@alex-hunt-materialize
Copy link
Contributor Author

alex-hunt-materialize commented Sep 2, 2021

I unfortunately am unable to test this change, since I can't figure out how to install it and use it with a python virtualenv (all our pulumi uses python). Any advice on how to do that would be greatly appreciated.

EDIT: I got this working with the python module by a hacky/half-implementation of #595 and copying the binary and some remaining files into my PATH. This PR works in our environment.

The tests seem to run locally, but all get marked as failed due to being unable to write the results to S3. EDIT: that's a red herring, it's actually because I hit AWS limits for VPCs and EIPs. Running without parallelism seems to fix all failures except TestAccAwsProfile which requires a second AWS profile that I haven't configured.

I have no idea why local tests should write their output to S3. That seems like an unrelated bug, which I've filed at #621

Copy link
Contributor

@lukehoban lukehoban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice - thanks!

On test coverage - it would be really nice to add a simple test that uses a newer CNI than the default - just to lock in that that works. Perhaps alongside examples/cluster but configured with the new input parameter supported here?

},
"disableTcpEarlyDemux": {
TypeSpec: schema.TypeSpec{Type: "boolean"},
Description: "Allows the kubelet's liveness and readiness probes to connect via TCP when pod ENI is enabled. This will slightly increase local TCP connection latency.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little unfortunate, but for consistency, lets line wrap this similar to other descriptions in this file. (A little surprised we aren't getting a int failure on this).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"of secondary ENI IP addresses. If set to true, the SNAT iptables rule and off-VPC " +
"IP rule are not applied, and these rules are removed if they have already been " +
"applied.\n\nDefaults to false.",
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we deprecate this instead of removing it in case users are depending on it?

Probably not related to this PR - but the inconsistency that some of these are cni... and others aren't is a little odd. It actually feels like we most likely want to standardize on not including the cni prefix (since these are all properties of the VpcCni anyway. In which case, it might make sense to deprecate the other one instead? (And file an issue to track aligning the remaining too cni... property names with the naming scheme?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on deprecating instead. If you want to back this change out and open an issue about the duplication instead, we can address that separately and in the process standardize the properties as Luke suggested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've backed it out, and am filing a new issue for this.

@viveklak viveklak self-requested a review September 7, 2021 23:30
Copy link
Contributor

@viveklak viveklak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on! Looks great. Could you take a look at a couple of the suggestions add a test as Luke mentioned?

"of secondary ENI IP addresses. If set to true, the SNAT iptables rule and off-VPC " +
"IP rule are not applied, and these rules are removed if they have already been " +
"applied.\n\nDefaults to false.",
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on deprecating instead. If you want to back this change out and open an issue about the duplication instead, we can address that separately and in the process standardize the properties as Luke suggested.

add custom images to cluster example, and reduce line length in
disableTcpEarlyDemux description.
@alex-hunt-materialize
Copy link
Contributor Author

I've backed out the externalSnat change, added image and initImage VpcCniOptions to examples/cluster, and filed #624 to later fix the externalSnat issue.

vpcCniOptions: {
image: "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.9.0",
initImage: "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni-init:v1.9.0",
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to ensure the VPC CNI options were actually honored as well. The validation could be added here.

Copy link
Contributor

@viveklak viveklak Sep 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to ensure the VPC CNI options were actually honored as well. The validation could be added here.

You might have to export some additional information, e.g. the spec from index.ts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My typescript/javascript skills are pretty much non-existent. I'm unsure how to export these.
I've tried:

export const image2 = cluster2.core.vpcCni.image;
export const initImage2 = cluster2.core.vpcCni.initImage;

but I get:

    index.ts(36,45): error TS2339: Property 'image' does not exist on type 'VpcCni'.
    index.ts(37,49): error TS2339: Property 'initImage' does not exist on type 'VpcCni'.

@viveklak
Copy link
Contributor

viveklak commented Sep 8, 2021

This also fixes a duplicate argument, where externalSnat and cniExternalSnat would potentially fight over setting the AWS_VPC_K8S_CNI_EXTERNALSNAT environment variable. I can find no reason for both to exist, and cniExternalSnat seemed more complete, so I kept that one.

Could you remove this from the PR body since it was backed out?

@alex-hunt-materialize
Copy link
Contributor Author

This also fixes a duplicate argument, where externalSnat and cniExternalSnat would potentially fight over setting the AWS_VPC_K8S_CNI_EXTERNALSNAT environment variable. I can find no reason for both to exist, and cniExternalSnat seemed more complete, so I kept that one.

Could you remove this from the PR body since it was backed out?

Done

@benesch
Copy link

benesch commented Sep 10, 2021

So, @viveklak, I'm afraid we don't have the expertise in house at @MaterializeInc to get the test suite properly amended for this change. This package in particular is really tricky to develop on locally!

The good news is that I managed to ship a forked of pulumi-eks: https://github.com/MaterializeInc/pulumi-eks/commit/d4b05f9370462c1a64a0e582e36ef2c1c0a04e77#diff-5d43b9a26b3f3c15075a30aed416693a0ccd1d793cce198952bc4cf0693ca63c. That relieves the time pressure on our end. We'd still love to see this merged, because we don't want to be maintaining a fork long term, but I think we've probably pushed this PR as far as we can.

@viveklak
Copy link
Contributor

So, @viveklak, I'm afraid we don't have the expertise in house at @MaterializeInc to get the test suite properly amended for this change. This package in particular is really tricky to develop on locally!

The good news is that I managed to ship a forked of pulumi-eks: MaterializeInc@d4b05f9#diff-5d43b9a26b3f3c15075a30aed416693a0ccd1d793cce198952bc4cf0693ca63c. That relieves the time pressure on our end. We'd still love to see this merged, because we don't want to be maintaining a fork long term, but I think we've probably pushed this PR as far as we can.

No worries and glad this worked for you. I will verify this on my end tomorrow and merge, then follow up with any test related updates. Thanks for taking the time to make the change!

@TonyRippy
Copy link

Hi all, looks like there are some merge conflicts that need to be resolved before this can be merged. Does anyone with write access have some time to resolve the conflict?

@github-actions
Copy link

github-actions bot commented Oct 8, 2021

PR is now waiting for a maintainer to run the acceptance tests. This PR will only perform build and linting.
Note for the maintainer: To run the acceptance tests, please comment /run-acceptance-tests on the PR

@alex-hunt-materialize
Copy link
Contributor Author

alex-hunt-materialize commented Nov 1, 2021

@viveklak Any update on this? It's been working great for us on our fork for close to two months now. We'd love to point back at the upstream soon, but we can't until this is merged.

@viveklak
Copy link
Contributor

viveklak commented Nov 9, 2021

/run-acceptance-tests

@github-actions
Copy link

github-actions bot commented Nov 9, 2021

@viveklak
Copy link
Contributor

viveklak commented Nov 9, 2021

I have confirmed the changes work. However, the command dispatch workflow used to validate external PRs seems to be currently broken due to schema change detection not being able to get access to the originating workflow. We are looking to fix this but I would like to get this change in, so I opened #631 which pulls in these changes and adds a test to validate. I'd like to close this in favor of #631. Huge thanks for the contribution @alex-hunt-materialize and team!

@viveklak viveklak merged commit bd2bd47 into pulumi:master Nov 9, 2021
@lukehoban
Copy link
Contributor

Thanks @alex-hunt-materialize! 🚀

@alex-hunt-materialize
Copy link
Contributor Author

alex-hunt-materialize commented Nov 9, 2021

Thank you all for merging it :)

@viveklak
Copy link
Contributor

viveklak commented Nov 9, 2021

🙌🏾 - I will cut a release shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants