
Encrypt gRPC traffic between Contour and Envoy #862

Closed
scottslowe opened this issue Jan 25, 2019 · 14 comments · Fixed by #1203
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Milestone

Comments

@scottslowe

Feature we'd like to see added:
When using TLS termination for Ingress/IngressRoute objects with Contour, certificates have to be passed to Envoy over the gRPC connection. Given the sensitive nature of TLS certificates, some security-conscious customers want the gRPC connection between the Contour pods and the Envoy pods to be encrypted (cc @stevesloka).

Environment:
This would apply to potentially any environment running Contour.

  • Contour version: N/A
  • Kubernetes version: (use kubectl version): Any supported
  • Kubernetes installer & version: Any supported
  • Cloud provider or hardware configuration: Any supported
  • OS (e.g. from /etc/os-release): Any supported
@davecheney
Contributor

davecheney commented Jan 25, 2019 via email

@davecheney davecheney added kind/feature Categorizes issue or PR as related to a new feature. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. blocked/needs-design Categorizes the issue or PR as blocked because it needs a design document. labels Feb 5, 2019
@davecheney davecheney added this to the 0.11.0 milestone Feb 11, 2019
@vaamarnath
Contributor

I'd like to pick up this issue. Is this something a newcomer can work on?

From my understanding of the Contour architecture and the context provided here, the work involved is to add mTLS (or perhaps just TLS?) to the communication between Contour and Envoy. I can write up a design document if my understanding of the enhancement request is correct.

@davecheney
Contributor

davecheney commented Feb 18, 2019 via email

@vaamarnath
Contributor

Thanks for your reply @davecheney. Before I start putting my approach into a design document, I would like to run it by you at a high level. Here is what I am thinking now:

  1. Use mTLS between envoy and contour to have authorisation.
  2. TLS certs can be user-provided (is this disruptive to the deployment experience for a user?). Another option is to have these certificates generated by an init container as part of the bootstrapping process. The certificates can be signed by an intermediate CA which is itself signed by the Kubernetes CA.
  3. To rotate or invalidate these certificates, the idea is to make use of Kubernetes Jobs if the TLS certs are generated by the bootstrapping process. If the certificates are user provided, the user is responsible for rotating and invalidating the certificates.
  4. To secure these TLS certificates I am thinking to make use of Kubernetes secrets.
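If adopted, point 1 of the plan above would surface mostly in Envoy's bootstrap: the management cluster would gain an upstream TLS context. A hypothetical sketch against the v2 bootstrap API of the era (cluster name, address, port, and file paths are all illustrative, not Contour's actual configuration):

```yaml
static_resources:
  clusters:
  - name: contour              # the xDS management cluster
    type: STRICT_DNS
    connect_timeout: 5s
    http2_protocol_options: {}
    hosts:
    - socket_address:
        address: contour
        port_value: 8001
    tls_context:               # upstream mTLS back to Contour
      common_tls_context:
        tls_certificates:
        - certificate_chain: { filename: /certs/envoy.crt }
          private_key: { filename: /certs/envoy.key }
        validation_context:
          trusted_ca: { filename: /certs/ca.crt }
```

The client certificate and CA bundle here are what would need to be distributed to every Envoy, whether user-provided or minted by an init container.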

Does this approach sound sensible to you?

@davecheney
Contributor

  1. Use mTLS between envoy and contour to have authorisation.

Sounds good. What are the configuration changes we need to make in Envoy's bootstrap config (contour bootstrap /path) and Contour's various tables? Do we need to implement the SDS services in Contour? If so, is there an interim method to use static credentials then upgrade to SDS?

  2. TLS certs can be user provided (is this interruptive to deployment experience for an user?).

This isn't preferable. Ideally I'd like to keep what we have now -- the ease of deploying the Envoy and Contour containers colocated in the same pod. Having to get the user to create a CA and child certs before they can deploy Contour would have a negative effect on our new-user story.

Another option is to have these certificates generated by an init container as part of the bootstrapping process. The certificates can be signed by an intermediary CA which itself is signed by Kubernetes CA.

That would be better. Contour should be able to generate all this configuration on bootstrap.

  3. To rotate or invalidate these certificates, the idea is to make use of Kubernetes Jobs if the TLS certs are generated by the bootstrapping process. If the certificates are user provided, the user is responsible for rotating and invalidating the certificates.

I think if we go with static credentials that you refresh via a redeploy, that is acceptable for a first cut.

Would you please open an associated ticket to investigate implementing SDS, https://www.envoyproxy.io/docs/envoy/latest/configuration/secret#config-secret-discovery-service. I think we're going to need this to

  4. To secure these TLS certificates I am thinking to make use of Kubernetes secrets.

For static credentials, a secret mounted as a filesystem for both the contour and envoy containers sounds like a good first step. If they are in the same namespace then we can rely on RBAC to secure that key material.

See my comments above about SDS; I think we'll probably need to bring that into the design, especially as SDS is expected to be accessible over a unix domain socket or the loopback interface.

One thing that I should probably make clear is that this is a large feature, and I don't expect it to be delivered in one PR, or even one release cycle. I much prefer to see small, self-contained PRs that generally move the code forward, cleaning up and refactoring as you go.

@vaamarnath
Contributor

  1. Use mTLS between envoy and contour to have authorisation.

Sounds good. What are the configuration changes we need to make in Envoy's bootstrap config (contour bootstrap /path) and Contour's various tables? Do we need to implement the SDS services in Contour? If so, is there an interim method to use static credentials then upgrade to SDS?

I will spend some time figuring this out; I need to understand Contour's bootstrap process. I didn't know about SDS, and an initial reading gives me the impression that it will be necessary to implement in the long term.

  3. To rotate or invalidate these certificates, the idea is to make use of Kubernetes Jobs if the TLS certs are generated by the bootstrapping process. If the certificates are user provided, the user is responsible for rotating and invalidating the certificates.

I think if we go with static credentials that you refresh via a redeploy, that is acceptable for a first cut.

Okay, it makes sense to have static credentials for the first cut and then move towards having SDS.

Would you please open an associated ticket to investigate implementing SDS, https://www.envoyproxy.io/docs/envoy/latest/configuration/secret#config-secret-discovery-service. I think we're going to need this to

I will open an issue for this.

  4. To secure these TLS certificates I am thinking to make use of Kubernetes secrets.

For static credentials, a secret mounted as a filesystem for both the contour and envoy containers sounds like a good first step. If they are in the same namespace then we can rely on RBAC to secure that key material.

Agreed; as a starting point, we can rely on static credentials mounted into both the Contour and Envoy containers. When #881 is implemented, we may need to move towards using Kubernetes Secrets with RBAC as mentioned above.

One thing that I should probably make clear is that this is a large feature, and I don't expect it to be delivered in one PR, or even one release cycle. I much prefer to see small, self-contained PRs that generally move the code forward, cleaning up and refactoring as you go.

Yes, I understand this is a large feature. I will follow these guidelines.

@vaamarnath
Contributor

@davecheney I have put up a first draft of the design for review. I have an open question that I need to address in the design: whether any of Contour's tables need to be updated, as you commented earlier here. I couldn't figure that out; could you please provide me with more context?

@davecheney davecheney modified the milestones: 0.11.0, 0.12.0 Apr 8, 2019
@davecheney davecheney removed the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Apr 9, 2019
@timh timh removed this from the 0.12.0 milestone Apr 10, 2019
@davecheney
Contributor

davecheney commented Apr 10, 2019

Hello,

Currently, traffic between Contour and Envoy is not encrypted. This is generally OK because in the stock deployment Contour and Envoy are colocated in the same pod and the control plane traffic travels over the loopback interface. However, this happy state of affairs is undone by two competing desires.

  1. We want to separate the Contour control plane and the Envoy data plane, as they have different lifecycles. See #881 (Split Contour and Envoy containers into their own pods).
  2. For some deployments, notably heptio/gimbal, we already do this.

The problem is that Contour control plane traffic must then be served over the pod network, and anyone who can connect to Contour can ask for the LDS tables, which contain TLS secrets. This is bad.

The approach so far (see #901) focuses on encrypting the traffic between Envoy and Contour. This is difficult because Envoy wants to use TLS client certs (I guess this is what people call mTLS, but I'm not sure) for both encryption and authentication. So to implement this we're faced with the problem of distributing a client cert and CA to Envoys so they can connect back to Contour. There are lots of moving parts, and while we can probably get it working, I wonder if we're actually solving the problem at hand.

To define the problem: Contour serves LDS data, including certificate data, to anyone that asks. The solution we have pursued up to this point is to add encryption and authentication so that only authorised parties can make LDS queries and others cannot eavesdrop on the conversation. My alternative is that we stop delivering certificate data via LDS. Putting aside how this is going to work for a moment, this alternative reduces the need to encrypt Envoy-to-Contour communication because the remainder of the information in the RDS/EDS/LDS/CDS tables is not private -- it is derived from information already published in the API server. This is not entirely true: someone with access to the API server may not be permitted to read IngressRoute, Service, Ingress, or Endpoint objects from foreign namespaces, yet they would be able to ask Contour for its versions of those. It is not clear to me what attack surface this presents, but it feels like the acute problem -- a gRPC service that sends you any certificate's private key if you know its IP address -- has been mitigated.

I propose that we move forward with the following in stages, with a strong preference for doing point 1 in 0.12 and addressing the others in future releases.

  1. Implement an SDS sidecar process, probably a new contour subcommand (like serve), which presents an SDS endpoint over the loopback interface (or a unix domain socket; the details are not important). That process will run in the same pod as Envoy, in the same way as we deploy Envoy and Contour currently. Envoy will be configured to obtain secrets via SDS, thus via this local sidecar; the rest of the xDS APIs will be served from the Contour control plane (contour serve) as before. When the control plane references a secret it will do so by name, and Envoy will resolve that secret against its local SDS sidecar. This may mean that for 0.12 an Envoy pod includes three containers -- contour serve, contour sds, and envoy -- or we may add a new combined mode for serving xDS and SDS from a single contour process. This is all for 0.12.

  2. Do #881 (Split Contour and Envoy containers into their own pods); this will take a release or two to roll out.

  3. Look at encrypting traffic between Contour and Envoy. This may be by Contour generating a server cert, or we may hairpin via the API server as the Jetstack folks are doing. I'm tempted to do the latter instead, as it removes the need for Contour to manage its own TLS certs.
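The SDS indirection proposed in point 1 shows up in two places in Envoy's config: listeners name their secrets instead of embedding them, and the bootstrap defines where those names resolve. A hypothetical sketch against the v2 API of the era (secret name, cluster name, and socket path are invented for illustration):

```yaml
# A listener's TLS context names a secret rather than inlining key material:
tls_context:
  common_tls_context:
    tls_certificate_sds_secret_configs:
    - name: example-namespace/example-cert
      sds_config:
        api_config_source:
          api_type: GRPC
          grpc_services:
          - envoy_grpc:
              cluster_name: contour-sds

# ...and the bootstrap defines that cluster over a unix domain socket,
# served by the local sidecar rather than the remote control plane:
static_resources:
  clusters:
  - name: contour-sds
    connect_timeout: 1s
    http2_protocol_options: {}
    hosts:
    - pipe:
        path: /var/run/contour/sds.sock
```

Because the socket never leaves the pod, the private keys never cross the pod network, which is what makes point 3 less urgent.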

@davecheney davecheney added this to the 0.12.0 milestone Apr 10, 2019
@davecheney davecheney removed the blocked/needs-design Categorizes the issue or PR as blocked because it needs a design document. label Apr 10, 2019
@jbeda
Contributor

jbeda commented Apr 10, 2019

One of the goals (IMO) of splitting Envoy and Contour into separate pods is that we can then run Envoy with no service account and no line back to the API server. If that pod has an agent reading secrets and serving them to Envoy then we don't get that benefit.

FWIW — this whole thing is why I created SPIFFE ;)

I think it is worth looking at the state of the art with regard to webhooks from the API server. This includes creating and managing certificates. It is more manual than it should be. Kubernetes now has a built-in, API-driven CA that could be used for some of this.

This type of setup is possible but not fun. It is also difficult to rotate certificates. That said, running a split Contour is already an advanced move, so I don't think this is too troublesome.
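For reference, the built-in, API-driven CA mentioned here is the certificates.k8s.io CertificateSigningRequest API. A rough sketch of requesting a client cert through it (the object name and CSR contents are illustrative; the request must still be approved, e.g. with `kubectl certificate approve`, and whether and how it is signed depends on the cluster's signer configuration):

```yaml
apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
  name: envoy-client          # illustrative name
spec:
  # base64-encoded PKCS#10 CSR, e.g. generated with:
  #   openssl req -new -key envoy.key -subj "/CN=envoy" | base64 | tr -d '\n'
  request: <base64-encoded CSR>
  usages:
  - digital signature
  - key encipherment
  - client auth
```

This is the "more manual than it should be" flow: each component still has to generate its own key, submit the CSR, wait for approval, and fetch the issued certificate back out of the object's status.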

@davecheney davecheney added the blocked/needs-design Categorizes the issue or PR as blocked because it needs a design document. label Apr 10, 2019
@davecheney
Contributor

Thanks for your feedback Joe. Back to the drawing board!

@davecheney
Contributor

Without a viable design, I have to withdraw this from the 0.12 milestone. We're going to do some preparation work on #898, but that is not guaranteed for 0.12, hence it is not assigned to the milestone.

@davecheney davecheney added blocked/needs-design Categorizes the issue or PR as blocked because it needs a design document. and removed blocked/needs-design Categorizes the issue or PR as blocked because it needs a design document. labels Apr 16, 2019
@davecheney davecheney removed this from the 0.12.0 milestone Apr 16, 2019
@davecheney davecheney added this to the 0.14.0 milestone Jun 4, 2019
@youngnick youngnick self-assigned this Jun 18, 2019
youngnick pushed a commit to youngnick/contour that referenced this issue Jun 20, 2019
Updates projectcontour#881
Updates projectcontour#862

Signed-off-by: Nick Young <ynick@vmware.com>
@youngnick youngnick removed the blocked/needs-design Categorizes the issue or PR as blocked because it needs a design document. label Jun 21, 2019
@youngnick
Member

Okay, now that the design doc is merged and approved, I'll get started on doing the plumbing.

youngnick pushed a commit to youngnick/contour that referenced this issue Jun 25, 2019
Fixes projectcontour#862
Updates projectcontour#881

Signed-off-by: Nick Young <ynick@vmware.com>
@youngnick
Member

Progress so far:

  • I've got a working contour serve that takes the three TLS params (cafile, cert-file, and key-file) and serves gRPC over TLS using them.
  • I've got contour cli able to talk to contour using similar params to that command.
  • I've got contour bootstrap able to bootstrap an Envoy that will connect to contour and work.

Questions I have for interested parties as I get started on the below Todos:

  • I'm hardcoding the SNI for the two clients (cli and Envoy) to expect a ServerName of contour. This means the serving cert used by contour serve must contain a SAN of contour for the whole setup to work. That feels like a fair first-pass compromise to me. @davecheney, @stevesloka, anyone else, thoughts?
  • I'm going to add a make secure-local target to the Makefile that will expect certs to be in a certs directory (I'll add that to the .gitignore as well), and maybe a make certs target to generate the local certs for you, if I have time. Thoughts on this approach?

Todo:

  • Update 'separate deployment' documentation with instructions on creating certs and putting them into the right places.
  • Update 'separate deployment' YAMLs to expect those certs in the right places.

youngnick pushed a commit to youngnick/contour that referenced this issue Jun 28, 2019
Fixes projectcontour#862
Updates projectcontour#881

Signed-off-by: Nick Young <ynick@vmware.com>
@youngnick
Member

Oops, this issue was closed in error -- the code to support this is in place, but the documentation updates are not. Reopening until PR #1203 is done.
